#logging
3 posts

Distributed Logging Architecture in the Container Era

  • Logging is a cross-cutting concern in any application. For a distributed application, it's better to use the same logging technology across all services. Log aggregators suit polyglot systems because they provide connectors for most languages.
  • Logging infrastructure must be searchable. What's the point of logging everything if you can't answer questions like "Which service throws the most errors?"
  • Propagating a single correlation ID across services lets you filter the log messages from every source for one request. Debugging becomes much easier when you can see every code path that request touched.
  • Include plenty of context in each log message. Fields such as username, service name, and timestamp make it much quicker to scan through logs.
  • Network failures are inevitable when shipping logs to an aggregator. Ways to handle them include buffering logs on local disk or sending them to a fallback service.
  • With privacy regulations such as GDPR in full force, be careful not to log personally identifiable information (PII).
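
The correlation-ID, context, and PII points above can be combined in a few lines. The following is a minimal sketch using only Python's standard `logging`, `contextvars`, and `json` modules; the service name, the `context` extra field, the e-mail-only redaction rule, and `handle_request` are hypothetical choices for illustration, not taken from the post. A real deployment would typically receive the correlation ID in a request header and ship these JSON lines to an aggregator.

```python
import contextvars
import json
import logging
import re
import uuid
from datetime import datetime, timezone
from typing import Optional

# Hypothetical service name; normally read from configuration.
SERVICE_NAME = "checkout-service"

# Correlation ID of the request currently being handled ("-" when none).
correlation_id = contextvars.ContextVar("correlation_id", default="-")

# Illustrative PII rule: mask anything that looks like an e-mail address.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON line with shared context fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "service": SERVICE_NAME,
            "level": record.levelname,
            "correlation_id": correlation_id.get(),
            "message": EMAIL_RE.sub("[REDACTED]", record.getMessage()),
        }
        # Per-call business context passed as extra={"context": {...}}.
        # These fields are NOT redacted, so keep PII out of them.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


def get_logger() -> logging.Logger:
    logger = logging.getLogger(SERVICE_NAME)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


def handle_request(incoming_id: Optional[str] = None) -> str:
    # Reuse the upstream correlation ID if one was forwarded, else mint one.
    cid = incoming_id or str(uuid.uuid4())
    token = correlation_id.set(cid)
    try:
        get_logger().info(
            "payment authorised for alice@example.com",
            extra={"context": {"user_id": "u-123", "order_id": "o-42"}},
        )
        return cid
    finally:
        # Restore the previous value so the ID does not leak into other work.
        correlation_id.reset(token)
```

Calling `handle_request("req-7f3a")` prints one JSON line whose `correlation_id` is `req-7f3a` and whose message reads `payment authorised for [REDACTED]`. A `ContextVar` is used rather than a global so that concurrent requests (threads or asyncio tasks) each see their own ID.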

Full post here, 9 mins read

Do not log

Logging is an important aspect of any software system. This post focuses on the perils of logging and proposes better ways of achieving the same end results. Although the author's examples are in Haskell, the principles apply to any language.

  • Using logs to monitor production systems is a fallacy. Dedicated monitoring and error-tracking products such as Prometheus or Sentry, used to track business metrics, lead to better systems. If the business metric isn't affected, does it matter that an error occurred?
  • Logging is a side effect that can also fail. The juice is not worth the squeeze.
  • Storing and grepping through logs in a centralized location is a subsystem of its own, and one more point of failure in your architecture.
  • Any error log should include the complete business context of the object or activity that failed. Simply logging "Error occurred" is futile. Better error logging lets the engineering team reproduce the error.
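
The last point, logging the complete business context instead of a bare "Error occurred", might look like this in Python (the post's own examples are in Haskell). This is a minimal sketch, not the author's code: `OrderContext`, `PaymentDeclined`, `charge`, `describe_failure`, and `process` are hypothetical names, and `charge` stands in for a real payment call. The exception carries its business context, so whoever logs it can record everything needed to reproduce the failure.

```python
import logging
from dataclasses import asdict, dataclass

logger = logging.getLogger("orders")


@dataclass(frozen=True)
class OrderContext:
    """Hypothetical business context for the order being processed."""

    order_id: str
    customer_id: str
    amount_cents: int
    currency: str


class PaymentDeclined(Exception):
    """Failure that carries its business context, not just a message."""

    def __init__(self, reason: str, ctx: OrderContext) -> None:
        super().__init__(f"payment declined: {reason}")
        self.reason = reason
        self.ctx = ctx


def charge(ctx: OrderContext, balance_cents: int) -> None:
    # Stand-in for a real payment call.
    if ctx.amount_cents > balance_cents:
        raise PaymentDeclined("insufficient funds", ctx)


def describe_failure(err: PaymentDeclined) -> str:
    # What failed, why, and for which order, customer, and amount.
    return f"{err} | context={asdict(err.ctx)}"


def process(ctx: OrderContext, balance_cents: int) -> bool:
    try:
        charge(ctx, balance_cents)
        return True
    except PaymentDeclined as err:
        # Instead of logger.error("Error occurred").
        logger.error("%s", describe_failure(err))
        return False
```

A declined order then produces a line such as `payment declined: insufficient funds | context={'order_id': 'o-42', ...}`, which is enough to replay the failing case without guessing at inputs.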

I think logging is a very nuanced subject that teams take years to understand and agree on. Good logging principles are not born; they are cultivated within a team through years of trial and error.

Full post here, 11 mins read

What I talk about when I talk about logging

  • Analyzing logs is as important as logging, if not more so. Only log what you intend to analyze.
  • Keep logging (collection, handling, and archiving) separate from production, so that log analysis does not add load to production systems and logs are protected from attackers trying to cover their tracks.
  • Transport logs to a centralized log server with appropriate access rights and archiving policies. Keep the logs as raw as possible for later analysis rather than aggregating them early.
  • Before analyzing logs, build a clear understanding of your system’s baseline behavior. You will then know what to log and how long to retain it, and can choose flexible tools that help you analyze logs in any format quickly and effectively.
  • Once baselines and thresholds are set, enable automated reporting of events that cross them. That way, you are sure to look at the logs whenever something important happens.
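
The baseline-and-threshold idea in the last two points can be sketched as a simple statistical check. This is an illustrative sketch, not the post's method: it assumes the baseline is the mean and sample standard deviation of event counts from past time windows, and it flags a window whose count exceeds mean + k * stdev. The names `baseline` and `check` and the default `k = 3.0` are hypothetical choices.

```python
from statistics import mean, stdev
from typing import Optional, Sequence, Tuple


def baseline(history: Sequence[int]) -> Tuple[float, float]:
    """Mean and sample standard deviation of past per-window event counts."""
    if len(history) < 2:
        raise ValueError("need at least two windows to establish a baseline")
    return mean(history), stdev(history)


def check(history: Sequence[int], current: int, k: float = 3.0) -> Optional[str]:
    """Return an alert message if `current` exceeds mean + k * stdev."""
    mu, sigma = baseline(history)
    threshold = mu + k * sigma
    if current > threshold:
        return (
            f"ALERT: {current} events in this window, "
            f"baseline {mu:.1f}, threshold {threshold:.1f}"
        )
    return None
```

Here `history` might hold per-minute ERROR counts extracted from the raw logs on the central server; `k` should be tuned against the observed baseline, since a fixed multiple of the standard deviation is only a starting point for noisy or seasonal traffic.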

Full post here, 6 mins read