Lessons Learned while Working on Large-Scale Server Software

Working on large-scale software comes with its own set of challenges. Here is a set of tips to remember when faced with a mammoth one. The full article is a longer and better read, but these are the distilled points.

  • Plan for the worst. Creating a baseline for the worst thing that can happen is comforting. It also helps us plan for failure (because failure is inevitable).
  • Don’t trust the network. We often take the network for granted, but latency and flakiness can be a source of immense pain, because your production system’s behaviour will not match what you see on localhost.
  • Crash-first software. Big, loud crashes bring a developer’s attention to the problem faster, which helps them fix the bug sooner. Silent failures fester in systems far longer and crop up at the most inopportune moments.
  • People are the lynchpin of any system. Much of the success of large-scale software depends on how people react to failure. The problem is magnified when a senior engineer leaves the team or new members join. Default to tools and processes that keep tribal knowledge codified and persistent.
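
The network and crash-first tips can be combined in one small sketch. This is an illustration only, not code from the linked post: `TransientNetworkError`, `call_with_retries` and `make_flaky` are hypothetical names invented here, and the retry count and delays are arbitrary assumptions. A flaky call is retried a few times with exponential backoff, and if it still fails the error is re-raised loudly rather than swallowed.

```python
import time


class TransientNetworkError(Exception):
    """Hypothetical error standing in for a timeout or dropped connection."""


def call_with_retries(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Run operation, retrying transient failures with exponential backoff.

    If every attempt fails, the last error is re-raised so the failure is loud.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientNetworkError:
            if attempt == attempts:
                # Crash-first: do not swallow the error or return a default.
                raise
            # Back off before the next attempt: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))


def make_flaky(failures):
    """Return a fake remote call that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientNetworkError("simulated timeout")
        return "ok"

    return operation
```

Calling `call_with_retries(make_flaky(2))` succeeds on the third attempt, while `call_with_retries(make_flaky(5))` raises `TransientNetworkError` after three tries. Passing `sleep` as a parameter is a design choice that lets tests run without real delays.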

Full post here, 11-minute read