#devpractices
34 posts

Lesson learned while working on large-scale server software

  • Always plan for worst-case error conditions and find a general solution, such as automatically shutting down all operations and returning an error code that says when to retry or whom to contact.
  • Document your decision making and add idempotence wherever possible.
  • Approach debugging scientifically: first gather data, then form a hypothesis and design an experiment to test it, and only then apply your fix. Use tools that dig deep into traces and memory without stopping the system.
  • Impose a strict implementation of Postel’s Law: Be conservative in what you send, be liberal in what you accept.
  • Be wary of a major deployment that seems to go smoothly. Errors are inevitable, and the bigger and quieter an error is, the more dangerous it is. If you are not sure how to handle an error, let the system crash: a crash makes the error easy to catch and correct.
  • Be prepared to restart the entire system from a blank slate under heavy load.
  • Notice technical decisions and components that have global effects, not just global variables.
  • Build channels for persistent communication as new people are onboarded and leave teams. When building systems, do not assume operators will do things correctly and give them the tools to undo mistakes.
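
The advice to add idempotence can be made concrete with a small sketch. The `PaymentService` class, its `deposit` method, and the in-memory key store below are hypothetical names chosen for illustration, not something from the post. The idea shown is only one common approach: the caller sends an idempotency key, and a retried request with the same key gets the stored result back instead of being applied twice.

```python
from dataclasses import dataclass, field


@dataclass
class PaymentService:
    """Toy service that applies each request at most once per idempotency key."""

    balance: int = 0
    # Hypothetical in-memory store: idempotency key -> result of the first call.
    _seen: dict[str, int] = field(default_factory=dict)

    def deposit(self, idempotency_key: str, amount: int) -> int:
        # A retried request with a known key returns the stored result
        # instead of applying the deposit a second time.
        if idempotency_key in self._seen:
            return self._seen[idempotency_key]
        self.balance += amount
        self._seen[idempotency_key] = self.balance
        return self.balance


service = PaymentService()
first = service.deposit("req-1", 50)
retry = service.deposit("req-1", 50)  # same request, retried after a timeout
```

Both calls return the same balance and the deposit is applied once. A real server would presumably keep the keys in durable storage and expire them, which this sketch leaves out.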

Full post here, 10 mins read

Three types of risk: making decisions in the face of uncertainty

  • Fatal risks can kill a company, so you need to be extra careful to watch for them as your company grows and ages. It is easy to get complacent but there is no recovery from such decisions gone wrong.
  • Painful risks have lower but significant repercussions: missing a key goal or losing key people. But you can recover from them.
  • Embarrassment risks have no significant impact beyond just what the name states. You just need to acknowledge the mistake, change course and move on.
  • Another way to think about risk is Jeff Bezos's categorization of decisions into Type 1 (irreversible, so spend time on them) and Type 2 (reversible, as with painful and embarrassment risks, so make the decision quickly and move on).

Full post here, 4 mins read

Growing your tech stack: when to say no

  • Local developer utilities (one-person programs, testing tools) are very low-risk. They run on your own machine, though a whole team can adopt them, boosting productivity. If a utility is not widely used, switching environments is harder and you might compromise uniformity, but it is often a worthwhile trade-off.
  • Deployment infrastructure tools (monitoring, logging, provisioning, building executables, deploying) are a moderate risk. They automate tasks for the whole team, which means the whole team (and deployment) stops if one of them breaks. But they reduce risk in production and deployment compared to manual set-up and troubleshooting. They are a hot area of development, and you risk falling behind your competition without them.
  • A new programming language is also a moderate risk. Each language calls for new build tools, libraries, dependency management, packaging, test frameworks and internal DSLs. More than one person must learn them, or you get code no one but the developer understands. Getting your team on board fast becomes your responsibility. You can mitigate the risk by carefully isolating the experimental code so that it becomes replaceable. Before you select a language, consider the tooling and documentation available and whether other people have integrated it into a stack like yours (and written about it). The more languages you use, the greater the cognitive overhead when debugging.
  • Adding a new database is a serious risk. A stateful system of record is critical to your business: if it goes down, you cannot just rewrite it, and business stops until you can do a migration. In the worst-case scenario, you lose data. You can mitigate this risk with replication (backup and restore) and migration automation, which you should integrate and test before data enters the system. You need a dedicated team to maintain the database (monitoring, updating, re-provisioning) as load increases.

Full post here, 13 mins read

Avoid rewriting a legacy system from scratch, by strangling it

  • There comes a point when there is so much technical debt in your legacy project that you can no longer implement new features. Yet rewriting from scratch is risky and refactoring is expensive.
  • ‘Strangle’ the codebase instead: progressively delete the old codebase while building a new one in its place. This carries less risk of breaking things and is less work overall.
  • To strangle the codebase:
  1. Write new code that acts as a proxy for the old code: users come to the new system, which actually redirects to the old one behind the scenes.
  2. Build new modules to re-implement each of the legacy behaviors in the new codebase, such that there is no change from the user perspective.
  3. Progressively fade away the old code from use until users are entirely consuming the new modules.
  4. Delete the old, unused code.
  • Use techniques such as wrap classes to add new behaviours to the system without changing existing code at first, which also separates new and old responsibilities while integrating them. Or you can use domain-driven design (DDD): treat the legacy system as a black box and build a bubble context where you apply DDD, interacting with the black box through an anti-corruption layer. Roll out the rewrites progressively.
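
The four strangling steps above can be sketched as a small routing facade. This is a minimal illustration under stated assumptions: `StranglerFacade`, `legacy_system`, the route names and the string-based handlers are all hypothetical, and a real system would route HTTP requests or service calls rather than plain strings.

```python
from typing import Callable

Handler = Callable[[str], str]


def legacy_system(request: str) -> str:
    # Stand-in for the old codebase (hypothetical).
    return f"legacy:{request}"


class StranglerFacade:
    """Single entry point: forwards to legacy code unless a route was migrated."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._migrated: dict[str, Handler] = {}

    def migrate(self, route: str, handler: Handler) -> None:
        # Step 2: register a new module that re-implements one legacy behavior.
        self._migrated[route] = handler

    def handle(self, route: str, request: str) -> str:
        # Steps 1 and 3: callers only talk to the facade, which uses the new
        # module when one exists and falls back to the legacy code otherwise.
        handler = self._migrated.get(route, self._legacy)
        return handler(request)

    def fully_migrated(self, routes: list[str]) -> bool:
        # Step 4 precondition: the old code is unused once every route is migrated.
        return all(route in self._migrated for route in routes)


facade = StranglerFacade(legacy_system)
before = facade.handle("invoice", "42")
facade.migrate("invoice", lambda request: f"new:{request}")
after = facade.handle("invoice", "42")
untouched = facade.handle("report", "7")
```

The `legacy:` and `new:` prefixes exist only to make the routing visible here. In an actual migration the new module would return the same result as the old one, so users see no change.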

Full post here, 5 mins read

Reasons why you should be coding with interfaces

  • Interfaces enable you to code against pure abstractions, not implementations. Pure abstraction is the best thing for loose code coupling.
  • Interfaces also keep the coupling between classes very loose, which makes it easier to swap implementations at runtime. The result is pluggable implementations: when you think of a better way to do something, you can adopt it without radically changing your code.
  • Interfaces allow for good inter-module communication, which is especially useful when different teams work on different modules: you don’t need to know how the other team’s code works to interface with it.
  • Interfaces make your code more testable because you can readily substitute implementations, even a fake one. This lets you test using, for example, a fake database returning canned data instead of actually having to connect to and use an active, vulnerable database.
  • Interfaces are useful for implementing design patterns such as Model-View-Controller (MVC) and Model-View-ViewModel, and for dependency injection frameworks.
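
The testability point above can be illustrated with a short sketch. The names `UserStore`, `FakeUserStore` and `greeting` are hypothetical, and using `typing.Protocol` is one possible way to express an interface in Python (an abstract base class would work as well). The calling code depends only on the interface, so a fake returning canned data can stand in for a real database.

```python
from typing import Protocol


class UserStore(Protocol):
    """Interface: anything with a matching get_name method satisfies it."""

    def get_name(self, user_id: int) -> str: ...


class FakeUserStore:
    """Test double that returns canned data instead of querying a database."""

    def __init__(self, names: dict[int, str]) -> None:
        self._names = names

    def get_name(self, user_id: int) -> str:
        return self._names[user_id]


def greeting(store: UserStore, user_id: int) -> str:
    # Depends only on the UserStore interface, not on any concrete database.
    return f"Hello, {store.get_name(user_id)}!"


message = greeting(FakeUserStore({1: "Ada"}), 1)
```

A production implementation backed by a real database could be passed to `greeting` in the same way, with no change to the function itself.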

Full post here, 6 mins read