#Issue141

10 Tips for Debugging in Production

Heisenbugs
Courtesy: Geek and Poke

This is an interesting story of how one team solved a tricky bug, followed by a checklist we can all use when debugging production code.

  • Rule out the improbable scenarios first. To do this, it's perfectly fine to add temporary debug code to your production deployment.
    Just remember to remove it afterwards.
  • Don't be afraid to ask co-workers for advice. It's likely that they've seen something like this before. Experience trumps everything when you're stuck.
  • Read, read and then re-read the logs. In my personal experience, lots of people simply skim the logs and miss important details.
  • Don't ignore the libraries you depend on as potential sources of problems. After all, that is code written by another human.
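
The first tip, temporary debug code in production, can be sketched roughly as follows. This is only an illustration, not the approach from the original post: the `TEMP_DEBUG_ISSUE` environment variable, the `temp_debug` helper and the `TEMP-DEBUG` prefix are all hypothetical names chosen for this example.

```python
import logging
import os

# Hypothetical flag name; any switch you can flip without a redeploy works.
DEBUG_FLAG = "TEMP_DEBUG_ISSUE"

logger = logging.getLogger("prod_debug")


def temp_debug(message, **context):
    """Emit a temporary diagnostic line only while the flag is set.

    Returns True if a line was logged, which keeps the helper easy to test.
    """
    if os.environ.get(DEBUG_FLAG) != "1":
        return False
    # Sort the context so the same call always produces the same line.
    details = " ".join(f"{key}={value!r}" for key, value in sorted(context.items()))
    logger.warning("TEMP-DEBUG %s %s", message, details)
    return True
```

A call such as `temp_debug("cache miss", user_id=42, shard="eu-1")` stays silent until the flag is switched on. The fixed `TEMP-DEBUG` prefix makes every temporary line easy to grep for, both in the logs and in the codebase when it is time to remove them.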

Lastly, the fact that you had to debug in production means that you couldn't reproduce the error on your local machine. Once all the fires are out, spend time closing that gap so the next bug can be reproduced locally.

Full Post here, 15 mins read

Distributed Logging Architecture in the Container Era

  • Logging is a cross-cutting concern in any application. For a distributed application, it's better to use the same logging technology across all services. Log aggregators suit polyglot systems because they offer connectors for most languages.
  • Logging infrastructure must be searchable. What's the point of logging everything without being able to answer queries like "Which service throws the most errors?"
  • Using a single correlation ID across services lets you filter log messages from all sources. This makes debugging a lot easier because you can view every code path that a request touched.
  • Include plenty of context in each log message. Fields such as username, service name and timestamp make it much quicker to scan through logs.
  • Network failure is inevitable when dealing with log aggregators. Ways to handle it include logging to local disk or sending logs to a fallback service.
  • With privacy rules such as GDPR in full force, be careful not to log personally identifiable information.
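
Several of the points above (a shared correlation ID, context in every message, searchable output and keeping PII out of logs) can be combined in a small sketch using only Python's standard library. It is a minimal illustration under stated assumptions, not the architecture from the original post: the JSON field names, the `start_request`, `build_logger` and `errors_by_service` helpers, and the e-mail-only redaction rule are all hypothetical choices made for this example.

```python
import json
import logging
import re
import uuid
from collections import Counter
from contextvars import ContextVar

# Correlation ID of the request currently being handled (hypothetical name).
correlation_id = ContextVar("correlation_id", default="-")

# Assumption: only e-mail addresses are masked; real PII rules cover far more.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(text):
    """Mask e-mail addresses before the message leaves the process."""
    return EMAIL.sub("[redacted-email]", text)


def start_request(incoming_id=None):
    """Reuse the caller's correlation ID, or mint one at the system's edge."""
    request_id = incoming_id or str(uuid.uuid4())
    correlation_id.set(request_id)
    return request_id


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so an aggregator can index it."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "service": self.service,
            "correlation_id": correlation_id.get(),
            "message": redact(record.getMessage()),
        })


def build_logger(service, stream=None):
    """Return a logger for `service` that writes JSON lines to `stream`."""
    handler = logging.StreamHandler(stream)  # defaults to stderr
    handler.setFormatter(JsonFormatter(service))
    logger = logging.getLogger(service)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def errors_by_service(lines):
    """Answer "which service throws the most errors?" from JSON log lines."""
    counts = Counter()
    for line in lines:
        entry = json.loads(line)
        if entry["level"] == "ERROR":
            counts[entry["service"]] += 1
    return counts.most_common()
```

Each service would call `start_request` with the ID received from its caller (for example from a request header) so that every log line for one request carries the same value. In a real deployment the aggregator's query language would replace `errors_by_service`; the function is here only to show that structured lines make such questions easy to answer.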

Full Post here, 9 mins read