Lesson learned while working on large-scale server software

  • Always have a plan for worst-case error conditions, and find a general solution, such as automatically shutting down all operations and returning an error code that tells callers when to retry or whom to contact (see the sketch after this list).
  • Document your decision making and add idempotence wherever possible.
  • Approach debugging in a scientific way: first, gather data, form the right hypothesis and design an experiment to prove it, and then apply your fix; use tools to dig deep into traces and memory without stopping the system.
  • Impose a strict implementation of Postel’s Law: Be conservative in what you send, be liberal in what you accept.
  • Be wary of a major deployment that seems to go smoothly. Errors are inevitable, and the bigger and quieter the error, the more dangerous it is. If you are not sure how to handle an error, let the system crash; that makes errors easy to catch and correct.
  • Be prepared to restart the entire system from a blank slate under heavy load.
  • Notice technical decisions and components that have global effects, not just global variables.
  • Build channels for persistent communication as new people are onboarded and leave teams. When building systems, do not assume operators will do things correctly and give them the tools to undo mistakes.
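
A minimal sketch of the "shut down and tell callers when to retry" idea from the first point, assuming a hypothetical HTTP-style handler; the status code, retry window and contact address are illustrative only.

```python
import json

# Hypothetical global flag flipped by the worst-case/error-condition handler.
EMERGENCY_SHUTDOWN = False
RETRY_AFTER_SECONDS = 300
SUPPORT_CONTACT = "oncall@example.com"  # illustrative contact

def process(request: dict):
    # Placeholder for the real work; make it idempotent where possible,
    # e.g. by keying side effects on a client-supplied request ID.
    return {"echo": request}

def handle_request(request: dict) -> dict:
    """Return an error envelope with retry guidance while operations are halted."""
    if EMERGENCY_SHUTDOWN:
        return {
            "status": 503,  # service unavailable
            "headers": {"Retry-After": str(RETRY_AFTER_SECONDS)},
            "body": json.dumps({
                "error": "All operations are temporarily shut down.",
                "retry_after_seconds": RETRY_AFTER_SECONDS,
                "contact": SUPPORT_CONTACT,
            }),
        }
    return {"status": 200, "body": json.dumps({"result": process(request)})}
```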

Full post here, 10 mins read

Being a great engineering mentor

  • Be ready to listen. Your mentee should always feel free to ask you questions, especially in the first few weeks of onboarding.
  • Help anchor them socially in the new place. If possible, carve out a weekly team ritual.
  • Let them feel safe enough to pursue novel solutions by showing them how to evaluate risks within the context of your organization and team.
  • Explicitly inform your mentee of the technical safeguards in place, and provide non-technical assurance of help when things fail.
  • Understand their strengths and weaknesses, and encourage as well as challenge them technically with tasks that let them level up while having fun, gradually adding more unfamiliar or challenging jobs. Offer code hints but don’t unnecessarily prime them with warnings of difficulties ahead, and occasionally add stretch goals and praise their resolution to build confidence.
  • Have 1-on-1 discussions: reinforce when you notice they are doing a great job while challenging them to do even better, and share business context they might not readily see, as well as the engineering context of their project that predates their onboarding.

Full post here, 8 mins read

How to feel less overwhelmed as a developer

  • When you feel overwhelmed, either there is too much going on at once or you are overstimulated. Refocus and reprioritize.
  • Identify the specific problem - be it too steep a learning curve, too much information incoming, too many responsibilities, peer pressure or your own expectations of yourself.
  • Zero in on your key goals, set your boundaries, focus your ambitions, and recognize what’s not really relevant to you.
  • Find a process for self-education. Don’t try to memorize everything, learn where to find the right information. Make a list of what you don’t know, and add to it every time you come across a new idea or a skill you don’t have. Sift through the content and establish your key resources so you aren’t overwhelmed.
  • Beware of getting overwhelmed by other people. Make sure you have a balanced perspective on social pressure. Know that a lot of people write bad code, even great developers in great companies. People have different priorities and you don’t have to keep up with theirs. Remember that people will write about what is possible but you don’t need most of it on a day-to-day basis.
  • Work smarter by spending time on core skills like problem-solving, critical thinking and testing. Use proper project management tools to plan, manage tasks and track bugs. Take breaks for fresh air, exercise, and conversation so you don’t lose sight of the big picture. Ask your community for help with good resources, pointers or support with your workload.

Full post here, 9 mins read

Don’t be a jerk: write documentation

  • Documentation can be minimal and yet helpful. Take whatever you have in volatile formats (email, chat logs, shell I/O) and paste it into more durable ones (README files, diagrams, websites, FAQs, wikis).
  • Document based on your audience:
  1. For newcomers, focus on what it does, why you wrote it, who should be using it and for what, how it should be used (how to build, configure, run and get started), and where additional information may be found; do not display the source.
  2. Regular users could do with a reference manual, examples, EDoc/JavaDoc/Doc, wiki or website, and API descriptions.
  3. Contributors should have access to the source repository, as well as project structure and architecture (where to find out about specific functionalities and where new features should go), the project principles, tests, issues, and a roadmap.
  • An easy way to get started is to imagine you are talking to a new user. Then vary the user and add different scenarios (of their business context and experience/knowledge, budgets) to expand the documentation until it is comprehensive. Finally, you might split it up into categories for different people or reimagine/format it for different media/platforms.
  • Another approach is to find a problem a user might face and show them how to solve it.

Full post here, 11 mins read

Three types of risk: making decisions in the face of uncertainty

  • Fatal risks can kill a company, so you need to be extra careful to watch for them as your company grows and ages. It is easy to get complacent but there is no recovery from such decisions gone wrong.
  • Painful risks have lower but significant repercussions: missing a key goal or losing key people. But you can recover from them.
  • Embarrassment risks have no significant impact beyond just what the name states. You just need to acknowledge the mistake, change course and move on.
  • Another way to think about risk is Jeff Bezos’ categorization of decision-making into Type 1 (irreversible, so spend time on them) and Type 2 (reversible, as with painful and embarrassing risks, so make the decision quickly and move on).

Full post here, 4 mins read

Four magic numbers for measuring software delivery

  • Lead time: the time to validate, design, implement and ship a new valuable thing. Consider two types of lead time: (a) feature lead time, the time to move an item from high-level requirements to feature release, and (b) deployment lead time, the time from merging to master to the component being in production. (A rough calculation of all four numbers is sketched after this list.)
  • Deployment frequency, understood as number of times per developer per day you ship to production. Your deployment frequency may ebb and flow, from 5-10 on a busy day to none the rest of the week. Establish a baseline over 4 weeks: say, 1 production deployment per day on average.
  • Change failure percentage, i.e., the proportion of red deployments, bugs, alerts, etc. Defining change failure rate as bugs per production deployment over a given period, aim for about 10% or less, counting bugs with medium- or high-priority customer impact.
  • Mean time to recovery/resolution. For mean time to resolution, aim for less than a working week.
  • Considering feature lead time as a key performance indicator can help you break features up and deliver faster. This might also decrease lead time per deployment.
  • Convert generic support tickets into bugs with customer impact implications so they can be tracked better. Making bugs more visible can also make bottlenecks in the resolution process more apparent.
  • Count red production deployments and production alerts as a change failure.
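
A rough illustration of how the four numbers can be computed from simple deployment records; the field names and sample data below are made up for the example.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical records: one entry per production deployment over a 4-week window.
deployments = [
    {"merged": datetime(2020, 3, 2, 10), "deployed": datetime(2020, 3, 2, 14), "failed": False},
    {"merged": datetime(2020, 3, 3, 9),  "deployed": datetime(2020, 3, 3, 11), "failed": True},
    {"merged": datetime(2020, 3, 5, 16), "deployed": datetime(2020, 3, 6, 10), "failed": False},
]
incidents = [timedelta(hours=5)]  # time from detection to resolution, per failure

window_days, developers = 28, 4

deployment_lead_time_h = mean(
    (d["deployed"] - d["merged"]).total_seconds() / 3600 for d in deployments
)  # hours from merge to production
deployment_frequency = len(deployments) / window_days / developers  # per developer per day
change_failure_pct = 100 * sum(d["failed"] for d in deployments) / len(deployments)
mttr_hours = mean(i.total_seconds() / 3600 for i in incidents)

print(deployment_lead_time_h, deployment_frequency, change_failure_pct, mttr_hours)
```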

Full post here, 9 mins read

Inefficient efficiency

  • Latency (measured in time units) is the time between a stimulus and a response, while throughput (measured in deliveries per time unit) is the rate at which the system meets its goals.
  • Sometimes latency and throughput interfere with each other. Do you field a request and respond before taking the next request (low latency for the first customer but overall lower throughput), or do you accept the second request while processing the first for higher throughput?
  • You make latency/throughput tradeoffs every day and most of us are biased towards throughput. For example - you carefully plan all foreseeable architectural improvements instead of initiating the first profitable change you come across.
  • Instead, consider optimizing for latency. For example, if preferences are likely to change between requests (a high rate of change), responding to one request before taking on the next lets you adapt to that change, which is less wasteful.
  • To decide between throughput and latency, consider the cost of delay, whether you might learn something that changes your approach subsequently, and whether external factors might force a new approach.

Full post here, 4 mins read

Tips on API monitoring

  • Track your functional uptime with comprehensive, end-to-end testing for both functionality and performance. Simple ping tests are usually not enough to meet your service level agreements (SLAs).
  • Since 95% of API vulnerabilities are due to human error, add monitoring at 5-minute intervals for breaches and downtime (a minimal check is sketched after this list). Integrate automated testing into every step of your CI/CD pipeline to filter out human errors and make sure you have load-testing capabilities too.
  • But you should beware of

- tools that perform ‘synthetic testing’ and cannot reproduce actual consumer flows.

- tools that use third-party clouds, adding another layer of insecurity to your API (have internal APIs use on-premise tools instead).

- having separate testing and monitoring solutions.

- tests that are not detailed enough for intelligent results.
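
A minimal sketch of the kind of end-to-end check run at 5-minute intervals, assuming the requests library is available; the URL, expected payload and latency budget are placeholders.

```python
import time
import requests  # assumed available; any HTTP client works

API_URL = "https://api.example.com/v1/orders/health"  # hypothetical endpoint
LATENCY_BUDGET_S = 1.0

def check_once() -> bool:
    """One functional + performance probe: correct payload AND fast enough."""
    started = time.monotonic()
    try:
        resp = requests.get(API_URL, timeout=5)
        body = resp.json()
    except (requests.RequestException, ValueError):
        return False
    elapsed = time.monotonic() - started
    return resp.status_code == 200 and body.get("status") == "ok" and elapsed < LATENCY_BUDGET_S

if __name__ == "__main__":
    while True:
        if not check_once():
            print("ALERT: health check failed")  # hook into your alerting instead
        time.sleep(300)  # 5-minute interval
```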

Full post here, 4 mins read

Five big myths surrounding technical debt

  • Myth #1 is that tech debt is amorphous, when in actuality it is usually a collection of many discrete problems. This means you can quantify tech debt (frequency, impact, cost to fix) on a spreadsheet or tracker and then prioritize based on impact.
  • Myth #2 is that all tech debt has a single cause. In fact, there are three major sets of causes (or enablers):

a) Intentional tech debt, where the team knowingly took shortcuts to meet deadlines.

b) Unintentional, where poor code results from inexperience or incompetence or misunderstood requirements.

c) From age, where incremental enhancements and bug fixes cause bit rot.

  • Myth #3 is that you should never intentionally take on tech debt. But there are three situations where it may be justified: you are building a prototype or demo module to solicit feedback on a new feature or product; time to market is critical; or you are building based on guesswork and anticipate client requirements will change sooner rather than later.
  • Myth #4 is it only hurts engineers. In fact, it affects developers, product managers, customer support teams and customers, technical operations, and any executive team you may set up.
  • Myth #5 is that only a full rewrite can repay the debt, whereas more often, a detailed plan of incremental fixes will solve it.

Full post here, 8 mins read

Indirect benefits of building API-first

  • A major benefit of building API-first is agility, both digital and strategic. You are not constrained to a single usage pattern and have the basis for a variety of applications with use cases you can’t even imagine yet, offering both flexibility and reusability.
  • Your end-user has a better experience since APIs encourage interactivity across internal and external applications, creating a stickier, unified experience for consumers.
  • Building your organization around APIs creates a potential for partnerships, big and small, whether using the marketplace model (your platform allows others to build and distribute custom add-ons, in turn extending your own platform) or one where partner APIs share data with a common, collaborative goal.
  • Outward-facing APIs enable the growth and inception of self-sustaining developer communities, providing you with awareness, API support, product suggestions and more.

Full post here, 4 mins read

Feedback is not a dirty word

  • Without a good feedback system, pockets of disagreement can grow into resentment, distrust and eventually organizational failure. Feedback is also the only way to achieve true personal growth.
  • For critical feedback to be effective, give it in private and with positive intentions, be specific and factual, and use a non-violent communication format: ‘when you do [X], I feel [Y] because the story in my head is [Z]’.
  • Set up review sessions regularly (ideally weekly) at an expected time.
  • Be receptive to feedback and sidestep your ego’s fight or flight response by evaluating ideas objectively and viewing feedback as a gift.
  • Once you have heard the feedback, repeat what you heard, request confirmation and keep asking ‘Is that all?’ until you are sure the other person is done. Finally, think objectively about the feedback and suggest an action to resolve it.
  • Ensure that feedback is a part of your culture and an expectation from managers. Hold weekly one-on-one meetings, with structured time at the end for mutual feedback, and put it all on record.
  • Publicly seek feedback from your team and discuss it. As a leader, publish written feedback from other leaders for the entire company and discuss ways you are trying to improve.

Full post here, 5 mins read

Absolute truths I unlearned as a junior developer

  • The title of ‘senior developer’: Don’t be led by job titles alone. Value collaborative experience, being reviewed and having a mentor. Avoid an early position where you have to work alone.
  • Everyone writes tests: Loads of companies have little or no testing, because they have either never felt the pain of not having tests or felt the pain of having legacy tests.
  • We’re far behind everyone else: Beware of ‘tech FOMO’ as academic settings and conferences often cover proof of concepts rather than real-world scenarios. Dealing with legacy is normal.
  • Code quality matters most: Often, good enough code that works and is maintainable is good enough. Overarching architecture is more important than nitpicking.
  • Technical debt is bad: Disorganized or messy code is not the same as technical debt. Technical debt actually slows you down or causes errors or makes changes difficult. A certain amount of tech debt is healthy because it assures delivery.
  • Seniority means being the best programmer: Senior engineers need many skills, from communication and dependency management to estimation and project management. And the truth is that we all remain junior in some areas.

Full post here, 14 mins read

Growing your tech stack: when to say no

  • Local developer utilities (one-person programs, testing tools) are very low-risk. They run on your own machine, though a whole team can adopt them, boosting productivity. If not widely used, switching environments is harder and you might compromise uniformity, but it is often a worthwhile trade-off.
  • Deployment infrastructure tools (monitoring, logging, provisioning, building executables, deploying) are a moderate risk. They automate tasks for the whole team, which means the whole team (and deployment) stops if it breaks. But they reduce risk in production/deployment compared to manual set-up and troubleshooting. They constitute a hot area for development and you risk falling behind your competition without them.
  • A new programming language is also a moderate risk. Each language calls for new build tools, libraries, dependency management, packaging, test frameworks, internal DSLs. More than one person must learn them, or you get code no one but the developer understands. Getting your team on board fast becomes your responsibility. You can mitigate the risk by carefully isolating the experimental code so that it becomes replaceable. Consider the tooling and documentation available before you select a language and whether other people have integrated it into a stack like yours (and written about it). The more languages you use, the greater the cognitive overhead when debugging.
  • Adding a new database is a serious risk. A stateful system of record is critical to your business: if it goes down, you cannot just rewrite it, business stops until you can do a migration. In the worst-case scenario, you lose data. You can mitigate this risk with replication (backup and restore) and migration automation, which you should integrate and test before data enters the system. You need a dedicated team to maintain the database (monitoring, updating, re-provisioning) as load increases.

Full post here, 13 mins read

How to choose the right database

  • Relational (SQL-based) databases - such as MySQL, PostgreSQL, Oracle, and MS SQL Server - can be queried with SQL-like languages. They allow indexing and faster updating, but total querying time can be slow for large tables.
  • Their simple structure matches most data types and they support atomic transactions, but not OOP-based objects. They scale vertically (to around 10-100TB), which requires downtime when adding resources to machines.
  • NoSQL databases typically use JSON records with no common schema. There are four main types of NoSQL databases.
  • Document-oriented databases make each document into a JSON and allow indexing by field (so all records must have that field); they support big data analytics using parallel computations (examples are MongoDB, CouchDB, and DocumentDB).
  • Columnar databases store data by the column (querying by subsets of a column is fast), allowing better data compression. They are commonly used for data modeling and logging (an example is Cassandra).
  • Key-value databases allow only key-based querying, which is fast, and each record can carry a ‘time to live’ (TTL) field deciding when it gets deleted. They are great for caching but, because they use RAM storage, are expensive (examples include Redis and Memcached); a brief sketch follows this list.
  • Graph databases use nodes to represent entities and edges to represent their relationship; they are good for knowledge graphs and social networks (examples include Neo4J or InfiniteGraph).
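
As an illustration of the key-value point above, a small sketch using the redis-py client (one possible client); the host, key names and TTL are placeholders.

```python
import redis  # pip install redis; assumes a reachable Redis instance

r = redis.Redis(host="localhost", port=6379)

# Key-based write with a time-to-live: the record expires automatically.
r.set("session:42", "user-profile-blob", ex=3600)  # ex = TTL in seconds

# Key-based reads are fast; anything else (e.g. "all sessions for a user")
# requires maintaining your own secondary keys.
value = r.get("session:42")
remaining = r.ttl("session:42")
print(value, remaining)
```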

Full post here, 7 mins read

Principles for growth as an engineer

  • Understand how your work is valuable to your company and make decisions that support quality, feature-richness, and speed.
  • If you find your path blocked, find a way forward by persuasion, escalation or technical creativity.
  • Think about what needs doing beyond your immediate task. Advocate for what the company or team mission can benefit from.
  • Write crisply and concisely to not just inform, but persuade and teach.
  • Understand dependencies, ensure key components have owners, summarize plans and status, and proactively inform stakeholders of progress (including obstacles).
  • Own your education and pursue constant growth. Make learning a daily task.
  • Master your tools: Mastering the editor, debugger, compiler, IDE, database, network tools, and Unix commands increases your development speed.
  • Communicate proactively, regularly and in an organized way to earn collaborators’ confidence and goodwill.
  • Find opportunities to collaborate, especially on cross-functional projects, to grow in terms of visibility and skills.
  • Be professional and reliable: come to meetings prepared and on time, and pay attention. Deliver what you promise or proactively communicate if things go wrong. Disagree respectfully and show your colleagues appreciation. Avoid complaining and help people stay upbeat.

Full post here, 4 mins read

How to be a good software engineer mentor

  • As a mid-level or senior developer helping a junior developer grow, show them things they can only get from experience, not just coding.
  • Explain business-side decision making. Have your junior sit in on meetings with clients or production managers (or explain later), so they have context and understand where their tasks come from.
  • Do a demo and discuss the business logic of the project they are working on, so they can see the user perspective and know why certain decisions were made.
  • Cover best practices in code reviews: discuss better solutions if they have poorly written code even if it works. Decode jargon, especially for core concepts, and explain the team’s code decisions so they can see how applications grow.
  • Hear them out before you tell them the ‘right’ way. Resist rushing them towards the end of a sprint and let them learn to think fast and critically and to commit to decisions. Listen and then explain why (if) you need to make a different choice.
  • Talk them through your thought process while coding, then reverse roles: hearing them think lets you notice when they aren’t using a concept correctly and makes them more aware of their own process.

Full post here, 5 mins read

Effective learning for software engineers

Apply these five strategies for effective learning:

  • Preparation: Select the materials you will learn from, choosing the best available sources per online reviews, experts’ recommendations or personal experience.
  • Exploration: Familiarize yourself with the material as quickly as possible, without going into detail. Feel free to skim through the hardest parts, make quick notes to summarize sections and write down any questions you have.
  • Practice: Apply what you have learned so far to sufficiently challenging exercises (but not too demotivating) and highlight what you do not understand so you can immerse yourself in it later. Continue this longest, most focused effort until the end of the fifth stage.
  • Immersion: Apply the Feynman technique to immerse yourself in all the concepts you found harder - collect all the essential information and break it up into chunks if you need to, write down the concept or visualize it with a drawing if it helps, restate the concept in simpler terms and few sentences as though teaching a child, and add analogies that help connect it with concepts you already understand.
  • Repetition: Transform your notes on the hard concepts to flashcards and review them every day for 10-15 minutes, then make it less frequent (once every few months).

Full post here, 5 mins read

The art of unlearning

  • The most useful learning is unlearning something false or unhelpful.
  • The first challenge of unlearning is that you are likely to dismiss something that contradicts your current understanding (confirmation bias).
  • Unlearning is a deep dive into strangeness underlying what we think we know, acknowledging that convenient approximations guide our actual lives but the accurate picture is stranger and more interesting.
  • To unlearn things, seek additive information in familiar areas and use it to modify old knowledge - this is difficult and requires you to have patience with theoretical and academic learning.
  • You should seek other people’s experiences, and use them as a touchstone to understand your own patterns of thinking (travel is a good example if it involves you actually talking to local people, not just sightseeing).
  • Be bold and varied in experimenting - this is randomness, but avoid obvious risks. However, being able to do this typically requires you to have had positive experiences with venturing outside your comfort zone in the past.
  • Become comfortable with mystery and encourage yourself in an open-ended inquiry. You can thus condition yourself to be comfortable with what starts as aversive (like more activities involving heights to rid yourself of a fear of heights), what psychologists call progressive exposure.

Full post here, 15 mins read

Peacetime productivity, wartime productivity

  • In business terms, in ‘peacetime’, your company has a large advantage over the competition in its core market and its market share is growing. You focus on expanding your market and reinforcing strengths.
  • In ‘wartime’, your company is fending off an existential threat: competition, dramatic macroeconomic change, market changes, supply chain issues, etc.
  • This distinction impacts your productivity (at an individual level) and strategy (at a team level).
  • In wartime, you may have a website down, a product malfunctioning, a furious customer or a new piece of information that requires overhauling your strategy. You either know what to do but lack time/resources, or you don’t know what to do, and are making important decisions under time pressure (hours or minutes) while being constantly ‘available’.
  • Some strategies for wartime:
  1. Star emails that need urgent responses and get to them asap.
  2. Use an incident command protocol or similar best practices.
  3. If you can help in a specific way/area, pitch in. If you don’t know where to start, start anywhere.
  4. Ignore your usual task manager app and start an essentials-only task list.
  5. Skip weekly reviews or call for an abbreviated version that is fast and focused.
  6. Leave complicated work till peacetime. Focus on controlling the chaos.

Full post here, 6 mins read

How to discover your unknown knowns

  • Unknown knowns are the things you don’t know you know - matters of instinct, intuition or other factors you consider too trivial to notice.
  • Ignoring unknown knowns can lead to dissonance: imposter syndrome, comprehension gaps when you use jargon or fail to provide context, and underestimating your growth curve because you ignore soft skills or trickier, smaller ‘hard skills’.
  • To explore your unknown knowns, expand the implicit into explicit awareness.
  • Write more about the decisions you make till you can see your seemingly simple ideas contain something more profound.
  • Pursue public speaking and share what you have learned. Don’t assume what you say is too brief or basic, it needn’t even be a big event - it could be just your team and a few minutes on a ‘trivial’ topic.
  • Mentor others, show them how you analyze situations or think around an obstacle, and you will find proof that you don’t give yourself enough credit.
  • Track what you learn every week or month in writing, even if it initially appears trivial. In fact, pay attention to smaller accomplishments (a new shortcut, or a strategy to write better emails).

Full post here, 12 mins read

Reasons why you should be coding with interfaces

  • Interfaces enable you to code against pure abstractions, not implementations. Pure abstraction is the best thing for loose code coupling.
  • Interfaces also keep the coupling between classes loose, which lets you alter your implementations at runtime more easily and create very pluggable implementations, so that when you think of a better way to do something you don’t have to change your code radically.
  • Interfaces allow for good inter-module communications, especially useful when you have different teams working on different modules - this way, you don’t need to know how the other team’s code works to interface with it.
  • Interfaces make your code more testable because you can readily substitute implementations, even a fake one. This lets you test using, for example, a fake database returning canned data instead of having to connect to an active, vulnerable database (see the sketch after this list).
  • Interfaces are useful for implementing design patterns such as Model-View-Controller (MVC) and Model-View-ViewModel, and for dependency injection frameworks.
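
A small Python sketch of the testability point: code against an abstraction (here typing.Protocol stands in for an interface) and swap in a fake repository that returns canned data. The class and function names are illustrative.

```python
from typing import Optional, Protocol

class UserRepository(Protocol):          # the "interface": pure abstraction
    def find_email(self, user_id: int) -> Optional[str]: ...

class PostgresUserRepository:            # real implementation (details omitted)
    def find_email(self, user_id: int) -> Optional[str]:
        raise NotImplementedError("talks to a real database")

class FakeUserRepository:                # canned data for tests, no database needed
    def find_email(self, user_id: int) -> Optional[str]:
        return {1: "ada@example.com"}.get(user_id)

def greeting(repo: UserRepository, user_id: int) -> str:
    email = repo.find_email(user_id)
    return f"Hello {email}" if email else "Hello stranger"

# Test code depends only on the abstraction, so the fake is a drop-in substitute.
assert greeting(FakeUserRepository(), 1) == "Hello ada@example.com"
```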

Full post here, 6 mins read

Avoid rewriting a legacy system from scratch, by strangling it

  • There comes a point when there is so much technical debt in your legacy project that you can no longer implement new features. Yet rewriting from scratch is risky and refactoring is expensive.
  • ‘Strangle’ the codebase instead, progressively deleting the old codebase in favor of building a new one. It has less risk of breaking things and is less work overall.
  • To strangle the codebase (a minimal proxy sketch follows this list):
  1. Write new code that acts as a proxy for the old code: users come to the new system, which actually redirects to the old one behind the scenes.
  2. Build new modules to re-implement each of the legacy behaviors in the new codebase, such that there is no change from the user perspective.
  3. Progressively phase the old code out of use until users are entirely consuming the new modules.
  4. Delete the old, unused code.
  • Use techniques such as wrapper classes to add new behaviours to the system without changing existing code at first, which also separates new and old responsibilities while integrating them. Or you can use domain-driven design (DDD), treating the legacy system as a black box and building a bubble context where you apply DDD, interacting with the black box through an anti-corruption layer. Roll out the rewrites progressively.
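
A minimal sketch of the proxy step, with a routing table deciding per feature whether the legacy code or the new module handles the call; the module and function names are hypothetical.

```python
# Hypothetical legacy and new implementations of the same behaviour.
def legacy_generate_invoice(order):
    return {"engine": "legacy", "order": order}

def new_generate_invoice(order):
    return {"engine": "new", "order": order}

# The strangler proxy: users always call this, and features are flipped to the
# new codebase one by one until the legacy branch can be deleted.
STRANGLED_FEATURES = {"invoicing"}   # grows over time

def generate_invoice(order):
    if "invoicing" in STRANGLED_FEATURES:
        return new_generate_invoice(order)
    return legacy_generate_invoice(order)

print(generate_invoice({"id": 7}))   # behaviour is unchanged from the user's view
```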

Full post here, 5 mins read

How to test serverless apps

  • Most of what goes wrong in serverless architectures lies in the configuration of functions: event sources, timeouts, memory, IAM permissions, etc. With functions being stateless, the number of integration points also increases, so you need more integration tests than unit or end-to-end tests.
  • The first stage of testing should be local tests, for which you can: run the Node.js function inside a wrapper; invoke functions locally using tools such as the Serverless Framework or AWS SAM Local; use docker-lambda to simulate an AWS Lambda environment locally; or use LocalStack to simulate AWS services locally. However, none of these simulate IAM permissions or API authentication.
  • The second stage is unit tests. If you have a complex piece of business logic, you should encapsulate it into a module and test it as a unit.
  • Use integration testing to test code against external services you depend on, such as DynamoDB or S3. Run these tests against real DynamoDB tables or S3 buckets, not mocks and stubs, and keep the same assumptions as the code (see the sketch after this list).
  • Once the local tests have checked your code, move to acceptance testing: whether functions have the right permissions, timeout settings, memory, API Gateway event sourcing, etc. Do this after deploying.
  • Finally, if your serverless application is used by a UI client directly or indirectly, make sure your changes are compatible with the client - you can have a QA team test this manually or use an automated test framework.
  • Once deployed, you should still use robust monitoring and error reporting tools for issues developing in production.
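
A sketch of one such integration test against a real (test) DynamoDB table using boto3; the table and attribute names are placeholders, and the test assumes the table already exists and credentials are configured.

```python
import boto3  # assumes AWS credentials and a pre-created test table

def test_order_can_be_saved_and_read_back():
    table = boto3.resource("dynamodb").Table("orders-integration-test")  # hypothetical name

    # Exercise the same assumptions the production code makes about the schema.
    table.put_item(Item={"order_id": "o-123", "status": "NEW"})
    item = table.get_item(Key={"order_id": "o-123"})["Item"]

    assert item["status"] == "NEW"
```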

Full post here, 6 mins read

What is legacy code? Is it code without tests?

  • The definition of legacy code as code without tests comes from Michael Feathers’ Working Effectively with Legacy Code.
  • An alternative definition is valuable code that is difficult to change (you are afraid you might break existing behavior) because you struggle to understand it.
  • You overestimate the complexity of unfamiliar code or code you don’t remember why you wrote. Sometimes it gets better after working with it a few months.
  • Having tests is not enough: good tests make you confident about changing unfamiliar code, but poor tests don’t.
  • A recipe for legacy code is multiple people working on a codebase, over a long period, with conflicting requirements, under time pressure. This happens because knowledge is always imperfect and you take shortcuts to meet deadlines until every move introduces a bug and new features take ages to implement.
  • Finally, ‘legacy code’ is a matter of personal perspective: it depends on your understanding of the code and your feelings about changing it.

Full post here, 4 mins read

Production-oriented development

  • Let the engineers who write the code also be responsible for operating it in production: deploying, instrumenting and on-call for monitoring.
  • If you can avoid building something new, do so. Writing code is the most expensive way to solve any problem that doesn’t address a core business need, especially when there are open-source and hosted solutions for small/medium companies that deal with git repository hosting, observability tooling, managed databases, alerting etc., and even infrastructure for Kubernetes clusters and load balancers.
  • Make deployment frequent and unexciting: engineers should be able to deploy with minimal manual steps and you should minimize code freezes or blackout periods like ‘don’t deploy on Fridays’.
  • Switch from manual QA gates to automated testing for the deployment pipeline. Have a team for continuous testing in production, and do away with pre-production environments.
  • Choose ‘boring’ technology over bleeding edge tech. It is least likely to be unpredictable and is backed by the most expertise when you do have to debug, unlike ‘unique’ systems.

Full post here, 11 mins read

Data-oriented architecture

  • In data-oriented architecture (DOA), systems are still organized around small, loosely coupled components, as in service-oriented architecture with microservices. However, the components are always stateless, and component-to-component interaction is minimized so that they interact through a data layer instead (a toy sketch follows this list).
  • Common examples that approximate DOA include those using data monoliths where all or most data is persisted in a single store, such as knowledge graphs or GraphQL.
  • If you are worrying about scaling your architecture horizontally, you can consider a data monolith at its center and base your decision on these factors:
  1. Service-to-service APIs are not necessarily ad hoc, but it is easier to pass parameters in direct component-to-component communication.
  2. Beware if your system has integration as a bottleneck, as it will affect growth.
  3. Think hard about data ownership: multiple writers might have to modify the same record so you need to partition write ownership.
  4. Carefully plan the shared global schema because inter-component APIs are encoded in the data.
  5. If services call others with a direct address (like an IP) and know where to reach a particular service from command-line parameters, you need to wrap those better to construct the right flags, depending on the environment (say, structuring each service under a path routed by one server).
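
A toy sketch of the first bullet: two stateless components that never call each other and interact only through a shared data layer (a plain dict stands in for the central store); the names are illustrative.

```python
# The shared data layer (in reality a data monolith, knowledge graph or GraphQL store).
data_layer = {}

def ingest_component(order_id: str, amount: float) -> None:
    """Stateless writer: records an order, knows nothing about other components."""
    data_layer[order_id] = {"amount": amount, "invoiced": False}

def billing_component() -> None:
    """Stateless reader: picks up work from the data layer, not from a direct call."""
    for order_id, record in data_layer.items():
        if not record["invoiced"]:
            record["invoiced"] = True  # owns the write to this field only

ingest_component("o-1", 40.0)
billing_component()
print(data_layer)
```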

Full post here, 10 mins read

Simple systems have less downtime

  • Simplicity while building a system leads to less downtime because you don’t need to wait for a specifically proficient person to do or help with anything; anybody on the team can take over troubleshooting without a huge learning curve or training.
  • Troubleshooting, therefore, takes less time, because learning the system and then identifying and resolving the problem is almost intuitive.
  • When each part of the system has a clear function, it is easier for you to find several alternative solutions.
  • Follow these principles to build simpler systems:
  1. Features don’t justify the complexity. Choose tools that are easy to operate rather than the most feature-rich option.
  2. Complex ideas lead to complex implementations. Pare down your ideas so they can be explained fast.
  3. Try modifications before additions. Most people rush to add new layers, steps or integrations for new requirements. Instead, first check whether the core system can be modified.

Full post here, 6 mins read

Avoiding vulnerabilities in software development

Impose proper input validation (a minimal sketch follows this list):
1. Apply the zero trust principle and assume all input is unsafe until proven otherwise. Whitelist validated environmental variables, queries, files, databases and API calls.
2. Realize that attackers may be able to access hidden form fields.
3. Validate input for content, as well as length. Evaluate type, syntax, and conformance to logic (semantic sense).
4. Perform both client-side and server-side checks.
5. Validate inputs again after any data combination or conversion.
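
A minimal sketch of points 1 and 3: zero-trust, whitelist-style validation that checks type, length, content and semantic sense before the input is used; the field names and limits are made up.

```python
import re

ALLOWED_ROLES = {"viewer", "editor"}            # whitelist, not a blacklist
USERNAME_RE = re.compile(r"^[a-z0-9_]{3,32}$")  # content and length in one rule

def validate_signup(form: dict) -> dict:
    """Treat all input as unsafe until every check passes."""
    username = str(form.get("username", ""))
    role = str(form.get("role", ""))
    age = form.get("age")

    if not USERNAME_RE.fullmatch(username):
        raise ValueError("invalid username")
    if role not in ALLOWED_ROLES:               # covers 'hidden' form fields too
        raise ValueError("invalid role")
    if not isinstance(age, int) or not (13 <= age <= 120):  # semantic sense
        raise ValueError("invalid age")
    return {"username": username, "role": role, "age": age}

# Repeat equivalent checks server-side even if the client already validated.
print(validate_signup({"username": "ada_l", "role": "viewer", "age": 36}))
```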

Beware of information exposure:
1. Frame your error messages so that they do not give away the full path of a file or program, or expose a user in the database.
2. Contain sensitive information to areas with explicit trust boundaries. Use access controls to secure and restrict connections between ‘safe’ areas and endpoints.
3. Keep sensitive information out of URLs and communication headers. Obscure path names and API keys.

Ensure proper authentication to assign privileges:
1. Make sure temporary privilege escalations are easily reversed, and soon.
2. Assign privileges through whitelisting, starting with a universal base of least privilege, rather than restricting them through blacklisting.
3. Never allow a lower privilege level to affect a higher privileged user.
4. Restrict log-in attempts and impose session limits.
5. Separate higher-level privileges into different roles to limit ‘power users’.
6. Apply multi-factor authentication.

Full post here, 6 mins read

AWS Lambda - how best to manage shared code

For functions that are highly cohesive, organized into the same repository, share code via a module inside the repository. To share code between functions across service boundaries in general, you can use shared libraries (perhaps published as private NPM packages only available to your team) or encapsulate the business logic into a service. To choose, consider:

  • Visibility: Dependency is explicitly declared in a library but often not declared in a service, so you need logging or explicit tracing.
  • Deployment: With a shared library, you rely on consumers to update when you publish a new version. With a service, you decide when to deploy and can control deployment better.
  • Versioning: There will be times when multiple versions of the library are active. With services, you control when and how to run multiple versions.
  • Backward compatibility: With a shared library, you communicate compatibility with semantic versioning (a major update signals a breaking change). With a service, it’s your choice.
  • Isolation: You expose more of the internal workings with a shared library. With a service, you exercise more control.
  • Failure: When a library fails, you know your code has failed and stack traces show what’s wrong. With a service, it may be an actual failure or a timeout (the consumer cannot distinguish between the service being down and being slow), which can be a problem if the action is not idempotent, and partial failures require elaborate rollbacks.
  • Latency: You get significantly higher network latency with a service.

Full post here, 9 mins read

How can we apply the principles of chaos engineering to AWS Lambda

  • Identify weaknesses before they manifest in system-wide aberrant behaviours: improper fallback settings when a service is unavailable, retry storms from poorly tuned timeouts, outages when a downstream dependency gets too much traffic, cascading failures, etc.
  • Lambda functions have specific vulnerabilities. There are many more functions than services, and you need to harden boundaries around every function and not just the services. There are more intermediary services with their own failure modes (Kinesis, SNS, API Gateway) and more configurations to get right (timeout, IAM permissions).
  1. Apply stricter timeout settings for intermediate services than those at the edge.
  2. Check for missing error handling that allows exceptions from downstream services to escape.
  3. Check for missing fallbacks when a downstream service is unavailable or experiences an outage (a minimal fallback sketch follows this list).
  • Monitor metrics carefully, especially client-side, which shows how user experience is affected.
  • Design controlled experiments to probe the limits of your system.
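
A small sketch of points 1-3 above: a function handler that calls a downstream service with a strict timeout, handles the exception instead of letting it escape, and returns a fallback; the URL and fallback payload are placeholders.

```python
import requests  # assumed HTTP client bundled with the function

RECOMMENDATIONS_URL = "https://internal.example.com/recommendations"  # hypothetical
FALLBACK = {"items": [], "source": "fallback"}  # acceptable degraded response

def handler(event, context):
    try:
        # The intermediate call gets a stricter timeout than the function's own limit.
        resp = requests.get(RECOMMENDATIONS_URL, timeout=1.5)
        resp.raise_for_status()
        return {"items": resp.json()["items"], "source": "live"}
    except requests.RequestException:
        # Without this handler, the downstream outage would escape and fail the function.
        return FALLBACK
```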

Full post here, 4 mins read

How to build a remote team that will last

Some tips to do remote working right:

  • Define and refine your company’s culture and put the company values and culture docs in a virtual handbook for new employees (and old) to access easily at all times.
  • Introduce remote workers to all their colleagues and ensure they have access to all the same tools and resources as they would have when in office.
  • Communicate using a variety of tools, apps, and media. Use chat apps as a sort of virtual water cooler and allow off-topic, informal conversations. For specifically work-related conversations, use a collaboration suite. Add video check-ins through Skype and Zoom, use these for weekly/bi-weekly meetings as well.
  • Remote workers are mostly invisible to you except through telecommunication, so regularly measure their engagement and happiness.
  • Build healthy, rewarding habits for your whole team as part of your company’s culture and values: encourage free interaction with and by new members, send positive messages through shoutouts for appreciation, send direct communication along the most productive channel, and send non-urgent messages through Slack.
  • Encourage remote workers to switch off communication channels for blocks of time for focused work and also to take regular breaks.

Full post here, 5 mins read

Maximize your team: How I created an engineering roadmap

  • It takes considerable effort from you as a leader for your team to be successful. Draw up a roadmap for the team: Identify a long-term focus. Evaluate previous efforts. Allow visibility into what the team is focusing on. Ensure planning efforts and workloads are easy to timebox and monitor so the team knows when they are ahead or behind. Collaborate with other teams and use your business needs to prioritize value over cool factor.
  • Set the roadmap at a meeting and brainstorm together to visualize the destination you want to arrive at, and when you want to get there. Use a value system to filter or prioritize the ideas generated - nice to have, important but not urgent, critical for efficiency, debugging needed right away, important and urgent.
  • Create a roadmap document:

a) Summarize team and business goals.

b) List responsibilities and desired outcomes for the team. Include a list of all the things the team is managing and what is likely to be phased out or removed to help prioritize.

c) Review achievements for the previous year both for tracking goals and for motivation.

d) List this year’s goals, with 4-5 overarching objectives with 3-5 subheads each. Organize each section in a similar way to the main roadmap document.

  • Avoid cluttering the document with too much detail of plans and solutions to issues targeted - that’s a different exercise, which you should restrict to the relevant team and not the entire larger team.

Full post here, 6 mins read

Distributing operational knowledge across a team

  • Knowledge management isn’t just a concern for larger companies. Small teams can and should adopt knowledge management practices and tools right from the beginning. It helps in building a scalable team and effective onboarding of new team members.
  • Start building your knowledge repositories. These are for information needed in the long term by many different team members.
  • To begin with, use simple tools for task management and collaboration and you can always move to a more feature-rich tool if and when you need to. To make sure people actually use a tool, make the tools usable and relevant and offer any necessary training needed for your team to get comfortable with the tool. Define expectations around particular tools for particular tasks, so that work or concerns are not addressed unless logged with that tool.
  • Find a comfortable signal-to-noise ratio of communication: start with more communication and gradually filter out what you realize you don’t need, rather than missing out on what you didn’t know you needed to know.
  • Remember the importance of face-to-face contact as well, in person or via VoIP, for clarifications, resolving long-standing issues and building morale. For teams across time zones, make these meetings asynchronous or organize several meetings at different times to allow a fair spread for participation.

Full post here, 9 mins read

Things to remember before you say yes to automation testing

  • Not all tests can or should be automated. Know which tests, if automated, will stop finding bugs, and keep them out of the automation list.
  • Work on well thought out and well-defined test cases before starting to build test automation.
  • Use the programming language your testers are familiar with, so the learning curve isn’t too steep.
  • If test automation gives ambiguous results, don’t rely on them. There is probably a problem with the test script or test plan; look at solving that problem first.
  • Break your test suite into smaller chunks of independent test cases that don’t affect the results of other test cases.

Full post here, 6 mins read

How to avoid cascading failures in distributed systems

  • Cascading failures in distributed systems typically involve a feedback loop where an event causes a reduction in capacity, an increase in latency, or a spike in errors which then becomes a vicious cycle due to the responses of other parts of the system. You need to design your system thoughtfully to avoid them.
  • Set a limit on incoming requests for each instance of your service, along with load shedding at the load balancer, so that the client receives a fast failure and retry, or an error message early on.
  • Moderate client requests to limit dangerous retry behaviours: impose an exponentially increasing backoff between retries and add a little jitter, making the number of retries and wait times application-specific (see the sketch after this list). User-facing applications should degrade or fail fast, while batch or asynchronous processing can take longer. Also, use a circuit breaker design to track failures and successes so that a sequence of failed calls to an external service trips the breaker.
  • Ensure bad input does not become a query of death, crashing the service: write your program to quit only if the internal state seems incorrect. Use fuzz testing to help detect programs that crash from malformed input.
  • Avoid making failover plans based on proximity, where a failure of a data center or zone pushes the load onto the next closest resource, which will then likely cause a domino effect since this second one is likely to be just as busy. Balance the load geographically instead, pushing the load to the data centers with the most available capacity.
  • Reduce, limit or delay work that your server does in response to a failure, such as data replication, with a token bucket algorithm and wait a while to see if the system can recover.
  • Reduce startup times that come from reading or caching a lot of data to begin with; slow startup makes autoscaling difficult and may mean the problem isn’t handled by the time new instances come up, and recovery will take equally long if you need to restart.
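
A minimal sketch of the retry guidance above: exponentially increasing backoff, capped, with jitter; the exception type and limits are application-specific placeholders.

```python
import random
import time

class TransientError(Exception):
    """Stands in for whatever your client raises on a retryable failure."""

def call_with_backoff(operation, max_attempts=5, base_delay=0.2, max_delay=10.0):
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise                      # fail fast once retries are exhausted
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))  # jitter avoids synchronized retry storms
```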

Full post here, 14 mins read

The API security maturity model

  • The API Security Maturity Model is a corollary to the Richardson Maturity Model associated with RESTful API design, describing four levels of REST compliance. It describes cumulative levels of security, complexity, and efficiency.
  • Level 0 uses API keys and basic authentication, which is fundamentally insecure as it assumes whoever has the key is the rightful owner of it. There is basically no separate authorization process.
  • Level 1 uses token-based authentication but still conflates authentication and authorization, or produces quasi-authentication where the token acts as an ID card but is vulnerable to malicious use, as you assume that possession of the token is itself a guarantee against malicious intent.
  • Level 2 uses token-based authorization: authentication tokens allow entry, but access and privileges are regulated by a system such as OAuth, with permissions designed to match a token’s lifespan and purpose, or set so that tokens age out of use. However, these systems are designed to be authoritative, so you need to ask whether you can trust the system the token comes from. Also consider the reliability of data in transit: tokens can collect more data and alter it as they pass through the system, so you need to monitor who adds data and what sort.
  • Level 3 uses claims for a centralized trust system, which gathers context and verifies information about the subject rather than simply trusting the caller, API gateway or token issuer; to achieve this, you need an asserting party you trust to verify the context and subject attributes for each claim with signed tokens (using private and public keys).
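
A rough sketch of level-3-style verification using the PyJWT library. The issuer, audience and scope names are illustrative assumptions, and the exact claims you check will depend on the asserting party you trust.

```python
import jwt  # pip install pyjwt

TRUSTED_ISSUER = "https://idp.example.com"  # assumed values for illustration
EXPECTED_AUDIENCE = "orders-api"

def verify_claims(token, public_key):
    """Verify the signature, issuer and audience, then return the claims."""
    claims = jwt.decode(
        token,
        public_key,
        algorithms=["RS256"],
        audience=EXPECTED_AUDIENCE,
        issuer=TRUSTED_ISSUER,
    )
    # Beyond the standard checks, verify the subject attributes this
    # endpoint actually cares about instead of trusting them blindly.
    if "orders:write" not in claims.get("scope", "").split():
        raise PermissionError("token lacks the required scope")
    return claims
```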

Full post here, 10 mins read

Tips for running scalable workloads on Kubernetes

You must set resource requests & limits so the Kubernetes scheduler can ensure workloads are spread across nodes evenly.
Read more

Tips for running scalable workloads on Kubernetes

  • You must set resource requests & limits so the Kubernetes scheduler can ensure workloads are spread across nodes evenly.
  • The scheduler can use configured affinities & anti-affinities as another hint about which node to assign your pod to.
  • In Kubernetes, the readinessProbe tells the control plane that a pod is ready to start receiving requests, and the livenessProbe tells it that the pod is still running as expected. Setting these ensures that requests to a service always go to a container that can process them (a minimal sketch follows this list).
  • It is common for nodes in Kubernetes to disappear, so configure a pod disruption budget to ensure you always have a minimum number of ready pods for your deployment.
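
A sketch of the resource requests/limits and probes mentioned above, expressed with the official Kubernetes Python client. The image name, paths, ports and numbers are illustrative assumptions; the same settings are more commonly written as YAML in a Deployment manifest.

```python
from kubernetes import client  # pip install kubernetes

container = client.V1Container(
    name="web",
    image="example.com/web:1.0",  # hypothetical image
    # Requests and limits give the scheduler what it needs to spread
    # workloads across nodes and make fair decisions under pressure.
    resources=client.V1ResourceRequirements(
        requests={"cpu": "250m", "memory": "256Mi"},
        limits={"cpu": "500m", "memory": "512Mi"},
    ),
    # Readiness: only route traffic to pods that can serve requests.
    readiness_probe=client.V1Probe(
        http_get=client.V1HTTPGetAction(path="/ready", port=8080),
        period_seconds=5,
    ),
    # Liveness: restart the container if it stops behaving as expected.
    liveness_probe=client.V1Probe(
        http_get=client.V1HTTPGetAction(path="/healthz", port=8080),
        initial_delay_seconds=10,
        period_seconds=10,
    ),
)
```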

Full post here, 13 mins read

Non-functional requirements: Quality

To keep code clean, always activate compiler warnings, no matter what language you use; define (and monitor) naming, line length, and API documentation conventions like Javadoc;
Read more

Non-functional requirements: Quality

  • To develop quality code, you should focus on clean code structure, the correctness of implementation and maintainability.
  • Define a company-wide META architecture for every project to follow, allowing more reuse of code. Use the KISS principle and avoid excessive detailing or heavy customizing.
  • To keep code clean, always activate compiler warnings, no matter what language you use; define (and monitor) naming, line length, and API documentation conventions like Javadoc; and maintain test coverage of at least 85% with not only unit tests but also automated acceptance tests and regular manual code inspections, especially when working with external vendors.
  • Rather than using pull requests as a quality gate, use a simpler model such as release branch lines, and automate the visualization of coupling between classes and modules. Loose coupling is easier to maintain and lets you reuse the module/class more readily.
  • Avoid third-party libraries, and try to solve any given problem with only one library/technology to reduce security risks.

Full post here, 10 mins read

The paradox of scale

Gall’s Law: A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.
Read more

The paradox of scale

Gall’s Law: A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.

  • What works for small, simple systems is significantly different from what works for large, complicated systems.
  • Things that seem crucial for building high-quality production systems, like the ones large tech companies use, are mostly not required when you are starting out building new products.
  • To build large systems, you need to start with small steps to build small systems. ‘Think big, start small, find a foothold and start journeying’ is how any large system starts getting built.
  • Looking at what the giants do and trying to copy it is a good way to fail. Instead, learn from how they started and what they did along their journeys (and why they did it).

Full post here, 4 mins read

The cracking monolith: the forces that call for microservices

Overweight monoliths exhibit degrading system performance and stability, or slow development cycles, or both.
Read more

The cracking monolith: the forces that call for microservices

  • Overweight monoliths exhibit degrading system performance and stability, or slow development cycles, or both.
  • Single points of failure are typical of large monolithic apps and when they come under pressure, your team spends too much time solving technical issues instead of on development. For example - outages in non-critical data processing that bring down the whole system. Or all time-intensive tasks grouped into the background and becoming so unstable as to need a dedicated team. Or changing one part of the system affects others that should logically be unrelated.
  • If shipping a hotfix takes weeks or months, you will have a problem with slow development. To know when it is a good time to break the monolith, you should watch out for:
  • CI builds that take longer than 10 minutes (though a good CI tool should help it by splitting or auto-sequencing tasks).
  • Slow deployment, due to many dependencies and multiple app instances, especially when containerized.
  • Slow onboarding of new staff as it may take them months to get comfortable enough to make a non-trivial change to the codebase. Veteran team members becoming a bottleneck as reviewers because too many developers are waiting for their inputs.
  • New use cases and problems are not easily addressed with existing tools, and software updates are being put off, indicating you are dependent on outdated technology.

Full post here, 6 mins read

Severe truth about serverless security and ways to mitigate major risks

Cloud providers may secure your databases, operating systems, virtual machines, the network, and other cloud components, but you must still protect your application layer against cyber attacks.
Read more

Severe truth about serverless security and ways to mitigate major risks

  • Cloud providers may secure your databases, operating systems, virtual machines, the network, and other cloud components, but you must still protect your application layer (code, business logic, data and cloud service configurations) against cyber attacks.
  • Traditional web application firewalls only protect functions called through an API gateway. So, apply perimeter security to each function, incorporate whitelist validation, monitor updates to functions, and add runtime defense solutions.
  • Be wary of third-party dependencies. Derive components from reliable official sources via secure links. For Node.js applications, use package locks or NPM shrinkwrap to restrict updates to code until you review them. Identify and fix vulnerabilities with automated dependency scanners.
  • Ensure all credentials that invoke third-party services or cross-account integrations are temporary or encrypted and use a cryptographic key management solution. Set strict constraints on input/output messages passing through the API gateway.
  • Address the downside of autoscaling, DoW (denial of wallet) attacks: set budget limits with alarms, limit the number of API requests in a given time window, use DDOS protection tools, and try to make API gateways internal and private.

Full post here, 7 mins read

“Let’s use Kubernetes!” Now you have problems

If you need to scale, you need at least 3-4 virtual machines, and that means twice as many actual machines at a minimum.
Read more

“Let’s use Kubernetes!” Now you have problems

  • If yours is a small team, Kubernetes may bring a lot of pain and not enough benefits for you.
  • If you need to scale, you need at least 3-4 virtual machines, and that means twice as many actual machines at a minimum.
  • The codebase is heavy: 580,000 lines of Go code at its heart as of March 2020, and large sections have minimal documentation and lots of dependencies.
  • Setting up and deploying Kubernetes is complex - architecturally, operationally, conceptually and in terms of configurations, compounded by confusing default settings, missing operational controls and implicitly defined security parameters.
  • Your application becomes hard to run locally: you need VMs or nested Docker containers to begin with, plus staging environments, proxying a local process into a cluster or a remote process onto a local machine, and so on.
  • You are tempted to write lots of microservices but distributed applications are hard to write correctly and hard to debug. If you have more services written than the number of developers on each, you are doing it wrong.

Full post here, 6 mins read

Serverless pitfalls: issues with running a startup on AWS Lambda

Hosting your backend behind an API gateway can result in latency issues. If you want <50ms response times, you need dedicated infrastructure.
Read more

Serverless pitfalls: issues with running a startup on AWS Lambda

  • Functions with less RAM have slower CPU speed. Take both CPU and RAM into account (as well as the related costs: you save no money once execution time drops below 100ms) when allocating resources to Lambda functions.
  • Hosting your backend behind an API gateway can result in latency issues. If you want <50ms response times, you need dedicated infrastructure.
  • Lambdas in a VPC cannot connect to outside services like S3 unless you install a NAT gateway, with the associated charges. It’s best to run either completely inside or outside a VPC, even if it means less security and more latency and bandwidth costs.
  • Due to their distributed queues, AWS can execute a Lambda more than once per request, so design your functions to be idempotent.
  • You’ll find it hard to identify and debug functions that hang or get deadlocked, because they are killed silently once they hit the timeout limit; you’ll need to look to CloudWatch to find them.
  • Executing a Lambda from another Lambda is slow. Either launch a task specifically to launch a large number of other tasks or use threads to launch multiple tasks simultaneously.
  • To work around the dreaded cold start, you can move user-facing front-end requests off Lambda and try containerizing instead.

Full post here, 10 mins read

Break a monolith to microservices - best practices and design principles

Figure out how to segregate the data storage according to the constituent microservices, using a CQRS (command and query responsibility segregation) architecture so that data is not shared between microservices and is accessed only via APIs.
Read more

Break a monolith to microservices - best practices and design principles

  • Figure out how to segregate the data storage according to the constituent microservices, using a CQRS (command and query responsibility segregation) architecture so that data is not shared between microservices and is accessed only via APIs.
  • Break down the migration into steps, applying domain-driven design, rather than overhauling all repositories, deployment, monitoring, and other complex tasks at once. First, build new capabilities as microservices, then break down the monolith, starting with transforming any known pain points and troublesome gaps.
  • Allocate dedicated teams to every microservice to scale linearly and efficiently, as each team will be familiar with the nuances of its own service. Recognize this is as much a cultural shift as an operational one.
  • Pair the right technology with the right microservice for maintainability, fault tolerance, scalability, economy, and ease of deployment, and choose languages based on the team’s existing skillset.
  • Use ‘build and release’ automation to independently deploy each microservice.
  • Use a REST API so you need not install additional software or libraries and can handle multiple types of calls and data formats.
  • Isolate runtime processes with distributed computing - containerization, event architectures, HTTP management approaches, service meshes, circuit breakers, etc.
  • Distinguish between dedicated and on-demand resources, moving between them to reduce response time and deliver a superior customer experience; also reduce dependency on open-source tools.

Full post here, 8 mins read

Eventual vs strong consistency in distributed databases

Ensure you replicate data for storage - in the case of databases, redundancy introduces reliability.
Read more

Eventual vs strong consistency in distributed databases

  • Ensure you replicate data for storage - in the case of databases, redundancy introduces reliability.
  • For consistency across multiple database replicas, a write request to any node should trigger write requests for all replicas.
  • In an ‘eventual consistency’ model, you can achieve low latency for read requests by delaying the updates to replicas, but you will risk returning stale data to read requests from some nodes if the update has not reached them yet.
  • With a ‘strong consistency’ model, write requests to replicas are triggered immediately; however, they delay subsequent read/write requests to any of the databases until consistency is reached (a minimal sketch contrasting the two models follows this list).
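
A toy, in-memory sketch of the two write strategies; real databases use replication logs, quorums and failure handling that are elided here.

```python
import threading

class ReplicatedStore:
    """Toy model of a handful of replicas, for illustration only."""

    def __init__(self, replica_count=3):
        self.replicas = [dict() for _ in range(replica_count)]

    def write_strong(self, key, value):
        # Strong consistency: return only after every replica has applied
        # the write, so any subsequent read sees the new value.
        for replica in self.replicas:
            replica[key] = value

    def write_eventual(self, key, value):
        # Eventual consistency: acknowledge after one replica is updated
        # and propagate in the background; a read that lands on a replica
        # the update has not reached yet returns stale data for a while.
        self.replicas[0][key] = value

        def propagate():
            for replica in self.replicas[1:]:
                replica[key] = value

        threading.Thread(target=propagate, daemon=True).start()

    def read(self, key, replica_index=0):
        return self.replicas[replica_index].get(key)
```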

Full post here, 4 mins read

Modern data practice and the SQL tradition

Create more features at the query level to gain flexibility with different feature vectors, so that model selection and evaluation are quicker.
Read more

Modern data practice and the SQL tradition

  • Beware the schemaless nature of NoSQL systems, which can easily lead to sloppy data modeling at the outset. Start with an RDBMS in the first place, preferably with a JSON data type and indices on expressions, so you can have a single database for both structured and unstructured data and maintain ACID compliance.
  • Bring ETL closer to the data and be wary of decentralized data cleaning transformation. Push data cleaning to the database level wherever possible - use type definitions, set a timestamp with timezone policy to enable ‘fail fast, fairly early’, use modern data types such as date algebra or geo algebra instead of leaving that for Pandas and Lambda functions, employ triggers and stored procedures.
  • Create more features at the query level to gain flexibility with different feature vectors, so that model selection and evaluation are quicker.
  • Distributed systems like MongoDB and ElasticSearch can be money-hungry (both in terms of technology and human resources), and deployment is harder to get right with NoSQL databases. Relational databases are cheaper, especially for transactional and read-heavy data, more stable and perform better out of the box.
  • Be very meticulous as debugging is quite difficult for SQL, given its declarative nature. Also, be mindful of clean code and maintainability.

Full post here, 13 mins read

Working as a software developer

Modularize the software into subsystems, layers or modules based on small chunks of functionality. Develop in small iterations and apply repeatable unit tests to ensure they work as expected and stay decoupled.
Read more

Working as a software developer

  • Realize that software is never done: customers find more uses for it and request more features, so the code keeps getting bigger and more complex. It is also never done by one developer alone, and the aggregation of many hands results in complexity and, in turn, bugs, so plan for failures and build in issue tracking, logging and error handling.
  • Develop the skill of reading code to understand what it does and how. Write code that is easier to read and hence to modify.
  • Modularize the software into subsystems, layers or modules based on small chunks of functionality. Develop in small iterations and apply repeatable unit tests to ensure they work as expected and stay decoupled.
  • Write for people first and the computer second. It’s better to be clear than to be clever. Impose good version control.

Full post here, 8 mins read

Best practices for user account, authorization, and password management

Store a cryptographically strong, irreversible hash of the password, salting it with a value unique to that specific login credential.
Read more

Best practices for user account, authorization, and password management

  • Store a cryptographically strong, irreversible hash of the password, salting it with a value unique to that specific login credential (a minimal sketch follows this list).
  • Separate user identity from the user account, designing your user management system for low coupling and high cohesion between different parts of a user’s profile. Allow users to change usernames and link multiple identities to a single user account.
  • Keep username rules reasonable, remain case-insensitive and avoid restricting length and character set. Also, allow as long and complex a password as a user wants (your hashing will condense it anyway).
  • Consciously decide on thresholds for session length, and re-verify authentication after certain events: password resets, critical profile changes, logins from new devices or too many devices, or a sensitive action with financial implications. Offer users the option of increased security when alerting them to such events, and ensure even unsaved activity prior to re-authentication is preserved.
  • Build a secure authorization system, with password reset and not retrieval, detailed activity logging, rate-limiting of login attempts, locking out users after several unsuccessful attempts, and 2-factor re-authentication for new devices or long-idle accounts.
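
A minimal sketch of salted, irreversible password hashing using only the Python standard library; the iteration count is an illustrative choice, and a dedicated scheme such as bcrypt, scrypt or Argon2 is generally preferable in production.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative; tune to your hardware and latency budget

def hash_password(password):
    """Return (salt, digest) using a random salt unique to this credential."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, expected_digest):
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(candidate, expected_digest)
```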

Full post here, 9 mins read

Breaking down a monolith into microservices - an integration journey

Avoid too much decoupling as a first step; you can always break things down further later on. Enable logging across the board for observation and monitoring.
Read more

Breaking down a monolith into microservices - an integration journey

  • Before transitioning, identify the biggest pain points and boundaries in the monolithic codebase and decouple them into separate services. Rather than the size of code chunks, focus on ensuring these services can handle their business logic within the boundaries.
  • Split developers into two teams: one that continues to work on the old monolith, which is still running and even growing, and another that works on the new codebase.
  • Avoid too much decoupling as a first step; you can always break things down further later on. Enable logging across the board for observation and monitoring.
  • Enforce security between microservices with mutual TLS, to restrict access by unauthorized clients even within the architecture, and an OAuth2-based security service.
  • For external clients, use an API gateway for authentication and authorization, and firewalls and/or tokens based on the type of client.
  • Secure any middleware you use, as most come without credentials or with only a default credential. Automate security testing in your microservices deployment procedure.

Full post here, 5 mins read

Handling distributed transactions in the microservices world

One solution is a two-phase commit. This method splits transactions into a prepare and a commit phase, with a transaction coordinator to maintain the lifecycle of the transaction
Read more

Handling distributed transactions in the microservices world

  • In the microservices context, a distributed transaction is distributed to multiple services, called in a sequence, to complete the single transaction.
  • The ACID (atomicity, consistency, isolation, durability) test is challenging for distributed transactions with microservices because you need to be able to roll back the entire sequence if a microservice later in the sequence returns a failure, but atomicity implies the transaction should complete in entirety or fail in entirety. Also, when handling concurrent requests, you might have an object from one microservice being persisted into the DB even as the second reads that same object. This challenges both consistency and isolation.
  • One solution is a two-phase commit. This method splits transactions into a prepare and a commit phase, with a transaction coordinator maintaining the lifecycle of the transaction: first, all microservices involved prepare for a commit and notify the coordinator when ready; then the coordinator issues either a commit or a rollback command to all the microservices (a minimal coordinator sketch follows this list).
  • It guarantees atomicity, allows read/write isolation (no changes to objects until the commit), and makes a synchronous call to notify the client of success or failure. However, it is slow, and it locks database rows, which can become a bottleneck and can let two transactions deadlock.
  • Another solution is to use asynchronous local transactions for related microservices, which communicate through an event bus, guided by a separate choreographer system that listens for successes and failures from the bus and chases a rollback up the sequence with ‘compensating transactions’. This makes each microservice atomic for its own transaction, so the operation is faster, no database locks are needed and the system is highly scalable. However, it lacks read isolation, and with many microservices it is also harder to debug and maintain.
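
A toy coordinator sketch of the two-phase commit flow described above; the Participant methods are placeholders standing in for real microservice calls and durable logging.

```python
class Participant:
    """A service taking part in a distributed transaction (toy model)."""

    def prepare(self, txn_id):
        # Validate the change, reserve resources, write a durable log entry.
        return True

    def commit(self, txn_id):
        pass  # make the prepared change permanent

    def rollback(self, txn_id):
        pass  # release whatever was reserved during prepare

def two_phase_commit(txn_id, participants):
    """Phase 1: ask everyone to prepare. Phase 2: commit or roll back all."""
    prepared = []
    for p in participants:
        if p.prepare(txn_id):
            prepared.append(p)
        else:
            # Any refusal aborts the whole transaction.
            for q in prepared:
                q.rollback(txn_id)
            return False
    for p in participants:
        p.commit(txn_id)
    return True
```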

Full post here, 7 mins read

The 5 levers to address ‘org smells’ and ship higher-quality software (faster)

Clarify roles and responsibilities for both teams and individuals, including where these overlap, to help everyone understand what they should be doing, who to approach with questions on a given area and what is a shared endeavor.
Read more

The 5 levers to address ‘org smells’ and ship higher-quality software (faster)

  • Clarify roles and responsibilities for both teams and individuals, including where these overlap, to help everyone understand what they should be doing, who to approach with questions on a given area and what is a shared endeavor.
  • Create living product documentation to share insights and stay aligned. Regularly update product development processes and key product documents such as strategic objectives and roadmaps.
  • Hold productive and engaging meetings, with separate people facilitating and leading each of them. As a facilitator, share enough context beforehand, lay out an agenda, track time, keep people focused and maintain minutes of the meeting. As a leader, decide what needs to be done synchronously at the meeting and what can be achieved asynchronously. Set expectations for the how, when and where of communication within and across teams.
  • Develop good relationships with your team members so you are able to have difficult conversations with them with ease. Remember people crave BICEPS - belonging, improvement/progress, choice/autonomy, equality/fairness, predictability, and status.
  • Share context and critical milestones of progress through stages widely with the entire organization. Explain the reasoning and history behind decisions made.

Full post here, 14 mins read

3 research-backed principles that help scale your engineering org

Circumvent Brooks’s Law (aka the Mythical Man-Month), which says adding manpower to a late project makes it later. If you must add team members to long, large projects, look for people who already have hands-on experience with the codebase
Read more

3 research-backed principles that help scale your engineering org

  • Dunbar’s research says that the most evolved part of the human brain can maintain a maximum social group size of about 150. Heed Dunbar’s number: cap a group at 150 people, and that group should own an entirely standalone system. Ideally, have 10 people or fewer per team. Add system-level interfaces, roadmaps, and tools like Jira once you exceed 35 members. Institute monthly cross-team demos to share knowledge, and make time for relationship building at a personal, not just a professional, level.
  • Use Conway’s Law to your advantage by strategically building your organizational structure to reflect your desired software architecture, since you will typically find the systems you design anyway mirror your company’s communication structure. This inverse Conway maneuver also means merging teams building similar systems so that duplicate systems converge.
  • Circumvent Brooks’s Law (aka the Mythical Man-Month), which says adding manpower to a late project makes it later. If you must add team members to long, large projects, look for people who already have hands-on experience with the codebase, or consider shuffling people across teams (in consultation with their managers). Complement this by factoring in the time needed to onboard new people and train existing ones when committing to a schedule.

Full post here, 5 mins read

Forget monoliths versus microservices. Cognitive load is what matters

You should aim to minimize intrinsic load (with training, technologies, hiring, pair programming, etc.) and eliminate the extraneous load of boring or superfluous tasks to allow more space for germane cognitive load, or value-added thinking.
Read more

Forget monoliths versus microservices. Cognitive load is what matters

  • As psychologist John Sweller describes it, cognitive load is the total mental effort in working memory. It can be intrinsic (aspects fundamental to the task), extraneous (relating to the environment the task is done in), and germane (aspects needing special attention for learning or higher performance).
  • You should aim to minimize intrinsic load (with training, technologies, hiring, pair programming, etc.) and eliminate the extraneous load of boring or superfluous tasks to allow more space for germane cognitive load, or value-added thinking.
  • Prevent a software system from growing beyond the cognitive load of the team responsible for it. Explicitly define platforms and components to reduce extraneous load too.
  • Create well-defined interaction patterns among the team. Also, minimize inter-team dependencies.
  • Rather than small, cross-functional product or feature teams, build independent stream-aligned teams (based on a line of business, or market segment or specific geography, for example) that can analyze, test, build, release and monitor changes largely without affecting other teams.
  • Build the thinnest viable platform for each team, with the smallest set of APIs, documentation, and tools to accelerate their work.

Full post here, 9 mins read

Adopting microservices at Netflix: lessons for architectural design

Create a separate data store for each microservice and let the responsible team choose the DB that best suits the service.
Read more

Adopting microservices at Netflix: lessons for architectural design

  • Create a separate data store for each microservice and let the responsible team choose the DB that best suits the service. To keep different DBs in sync and consistent, add a master data management tool to find and fix inconsistencies in the background.
  • Use the immutable infrastructure principle to keep all code in a given microservice at a similar level of maturity and stability. So, if you need to add or rewrite code for a service, it is best to create a new microservice, iterate and test it until bug-free and efficient, and then merge back once it is as stable as the original.
  • You want introducing a new microservice, file or function to be easy, not dangerous. Do a separate build for each microservice, so that it can pull in component files from the repository at the appropriate revision level. This means careful checking before decommissioning old versions in the codebase, as different microservices may pull similar files at different revision levels.
  • Treat servers, especially those running customer-facing code, as stateless and interchangeable members of a group for easy scaling. Avoid ‘snowflake’ systems where you depend on individual servers for specialized functions.

Full post here, 7 mins read

Designing resilient systems beyond retries: architecture patterns and engineering chaos

Incorporate idempotency: an idempotent endpoint returns the same result given the same parameters with no side effects or any side effects are only executed once (this makes retries safer).
Read more

Designing resilient systems beyond retries: architecture patterns and engineering chaos

  • Incorporate idempotency: an idempotent endpoint returns the same result given the same parameters, with no side effects or with any side effects executed only once, which makes retries safer. If an operation has side effects but cannot distinguish unique calls, add an idempotency key parameter that the client must supply for a safe retry (without the key, the retry is rejected); see the sketch after this list.
  • Use asynchronous responses for ‘deferrable work’: instead of relying on a successful call to a dependency that might fail, return a successful or partial response to the client from the service itself. This ensures downstream errors don’t affect the endpoint and reduces the risk of latency and resource use, with retries happening in the background.
  • Apply chaos engineering to test resiliency as a best practice: deliberately introduce latency or simulate outages in parts of the system so it fails and you can improve on it. However, minimize the ‘blast radius’ of chaos experiments in production - in action, it should be the opposite of chaotic:
  1. Define a steady state. Your hypothesis is that the steady state will not change during the experiment.
  2. Pick an experiment that mirrors real-world situations: a server shutting down, a lost network connection to a DB, auto-scaling events, a hardware switch.
  3. Pick a control group (which does not change) and an experiment group from the backend servers.
  4. Introduce a failure in an aspect or component of the system and attempt to disprove the hypothesis by analyzing metrics between control and experiment groups.
  5. If the hypothesis is disproved, the affected parts are in need of improvement. After making changes, repeat your experiment until confidence is achieved.
  6. Automate your chaos experiments, including automatically disabling the experiment if it exceeds the acceptable blast radius.
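
A minimal sketch of the idempotency-key approach from the first bullet; the charge_card function and in-memory store are illustrative assumptions, and a real service would persist results in a durable store shared by all instances.

```python
# In-memory map of results keyed by idempotency key; a real service would
# use a durable store shared by every instance behind the load balancer.
_results_by_key = {}

def charge_card(amount_cents, idempotency_key):
    """Apply the side effect at most once per idempotency key.

    A retried request carrying the same key gets the original result back
    instead of charging the card a second time.
    """
    if idempotency_key in _results_by_key:
        return _results_by_key[idempotency_key]

    result = {"status": "charged", "amount_cents": amount_cents}  # side effect happens here
    _results_by_key[idempotency_key] = result
    return result
```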

Full post here, 6 mins read

Continuous testing - creating a testable CI/CD pipeline

For continuous testing, focus on confidence, implementation, maintainability, monitoring and speed.
Read more

Continuous testing - creating a testable CI/CD pipeline

For continuous testing, focus on confidence, implementation, maintainability, monitoring and speed (CIMMS):

  1. For greater confidence, pair testers with developers as they write code to review unit tests for coverage and to add service tests for business logic and error handling.
  2. To implement, use tools that support rapid feedback from fast running of repeatable tests. For service-level tests, inject specific responses/inputs into Docker containers or pass stubbed responses from integration points. For integration tests, run both services in paired Docker containers within the same network. Limit full-environment tests.
  3. Ensure tests are maintained and up to date. Create tests with human-readable logging, meaningful naming and commented descriptions.
  4. To monitor, use testing tools that integrate into CI/CD pipeline tools to make failures/successes visible and even send emails out automatically. In production, labeling logs to trace a user’s path and capturing system details of the user environment allows easier debugging.
  5. For speed, keep the test suite minimal. Let each test focus on only one thing and split tests to run in parallel if need be. Segregate to test only for changed areas and ignore those with no cross-dependencies.
  • Avoid automating everything. Run manual exploratory tests at each stage to understand new behaviours and determine which of those need automated tests.
  • When pushing to a new environment, test environmental rollback. Reversing changes should not impact users or affect data integrity. Test the rollout process itself for production and run smoke tests. Continue to monitor by running known error conditions and ensure monitoring captures those with sufficient information for easy debugging.

Full post here, 7 mins read

Great code reviews - the superpower your team needs

As a reviewer, take co-ownership of the code, frame comments as suggestions rather than instructions, and also offer positive feedback. As an author, assume the best intention rather than taking comments personally.
Read more

Great code reviews - the superpower your team needs

Lessons from Shopify

  • Keep your pull requests small (200-300 lines, or one single concern) so the reviewer won’t procrastinate or be interrupted and can dive deeper, yet respond to you faster. This way, you will also get more independent review units and therefore better quality review, affect fewer people and need to call on fewer domains of expertise, make it easier to identify problems for rollbacks, and separate the easy stuff from the hard.
  • Use work-in-progress (WIP) or draft pull requests (PR) to elicit early feedback that validates your choice of algorithm, design, API, etc., to ensure you are heading in the right direction before you start to build. This way you don’t waste time on documentation and details if you got it wrong.
  • When you are the reviewer, follow the principle of strong opinions loosely held, and focus on the code rather than the person. As a reviewer, take co-ownership of the code, frame comments as suggestions rather than instructions, and also offer positive feedback. As an author, assume the best intention rather than taking comments personally.
  • Pick the right person as reviewer: someone with context on the feature or component you are building, someone with strong skills in the language, framework or tool you use, someone with opinions on the subject, someone who cares about the outcome of your project, someone who needs to learn about this (in case of a junior reviewing a senior’s code).
  • Use a PR template where the description includes why the PR is necessary, who benefits from this, what can go wrong, what other approaches you considered, why you settled on this solution, and what other systems it can affect.

Full post here, 10 mins read

Why do so many developers get DRY wrong?

DRY has come to mean ‘don’t cut and paste’ but the original idea behind ‘don’t repeat yourself’ had to do with knowledge, and not code. The name is itself a leaky abstraction.
Read more

Why do so many developers get DRY wrong?

  • Humans are good at pattern matching, and developers are especially good at it. The more code you read and write, the more patterns you see coming up, and it is tempting to take those patterns, name them properly and keep referring back to them.
  • It feels really good to DRY up code; it is about the lowest common form of refactoring. There was a time you copy-pasted code because you did not have the hang of subroutines/functions, and now that you have the experience and skill to condense that code, you go ahead and do it because it suddenly feels simpler and cleaner.
  • ‘It was repetitive’ can be a trap. DRY is commonly misunderstood and leads to misguided refactoring.
  • DRY has come to mean ‘don’t cut and paste’ but the original idea behind ‘don’t repeat yourself’ had to do with knowledge, and not code. The name is itself a leaky abstraction.

Full post here, 3 mins read

The wall of technical debt

Use a physical wall to visualize your tech debt on sticky notes. It is easy to start and maintain, yet it can usefully influence choices that add, pay back or ignore technical debt.
Read more

The wall of technical debt

A method to make technical debt visible and negotiable:

  • Use a physical wall to visualize your tech debt on sticky notes. It is easy to start and maintain, yet it can usefully influence choices that add, pay back or ignore technical debt. Pick a central location with high visibility. Make the display dramatic.
  • Decide on some sort of tally mark to represent costs (time or money), so it is not a matter of opinion only. Conversely, if it does not have a cost but only looks awkward, don’t log it as debt.
  • Build a working habit of adding notes to the wall, and stay honest.
  • Keep your notes short but easy to understand - what made code difficult to understand, what slowed it down, why a bug was hard to find, what should have been better documented or tested. Estimate the opportunity cost as well as time to fix the issues. If your team is using an issue tracker, add the ID to the sticky notes.
  • Negotiate tradeoffs based on this visualization. Discuss as a team whenever someone needs to add a note or time notation whether it is faster or cheaper to fix the debt. If it is, fix it right there and then. If not, add it to the wall. Give away control to managers to decide what to focus on.
  • Beware of starting with a complete debt audit - it can become its own bottleneck as it calls for buy-in and tends to get put off indefinitely.

Full post here, 7 mins read

Scaling to 100k users

When you first build an application, API, DB and client may reside on one machine/server. As you scale up, you can split out the DB layer into a managed service.
Read more

Scaling to 100k users

  • When you first build an application, API, DB and client may reside on one machine/server. As you scale up, you can split out the DB layer into a managed service.
  • Consider the client as a separate entity from the API as you grow further and build for multiple platforms: web, mobile web, Android, iOS, desktop apps, third-party services, etc.
  • As you grow to about 1000 users, you might add a load balancer in front of the API to allow for horizontal scaling.
  • As serving and uploading resources start overloading servers, at say 10,000 users, move to a CDN, which you can get with a cloud storage service for static content so the API no longer needs to handle this load.
  • At around 100,000 users, you might scale out the data layer, with relational database systems such as PostgreSQL, MySQL, etc.
  • You might also add a cache layer using an in-memory key-value store like Redis or Memcached, so that repeated hits to the DB can be served from cached data. Cache services are also easier to scale out than DBs themselves (a cache-aside sketch follows this list).
  • Finally, you might split out services to scale them independently, with say a load balancer exclusively for the web socket service; or you might need to partition and shard the DB, depending on your service; you might also want to install monitoring services.
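
A cache-aside sketch using the redis-py client; the key naming, TTL and the load_user_from_db helper are illustrative assumptions rather than anything from the original post.

```python
import json
import redis  # pip install redis

cache = redis.Redis(host="localhost", port=6379)

def load_user_from_db(user_id):
    return {"id": user_id, "name": "example"}  # stand-in for a real query

def get_user(user_id):
    """Cache-aside read: try Redis first, fall back to the database."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)

    user = load_user_from_db(user_id)
    cache.setex(key, 300, json.dumps(user))  # expire after 5 minutes
    return user
```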

Full post here, 8 mins read

Evolving systems for new products

A common anti-pattern to avoid is preemptively optimizing systems for the future while still trying to establish the current market fit. It leads to slower iterations between product experiments.
Read more

Evolving systems for new products

  • A common anti-pattern to avoid is preemptively optimizing systems for the future while still trying to establish the current market fit. It leads to slower iterations between product experiments.
  • You can expedite development by reusing existing systems and tooling.
  • Evaluate the business logic performed at read time to identify what data is shared with an application; this enables better data modeling and quick, small product changes.
  • Be alert to scaling challenges and set new goals accordingly. For example, you might perform nightly load tests to catch issues and decide to reduce the complexity of a system to quickly develop on the backend in response to new feature requests.
  • For a design that lasts into the future, you might relate data in what used to be disparate stores so that a single request suffices instead of a string of orchestrated calls by a read service. You might need to modify your ETL pipeline to consolidate data for sharing and passing downstream, which can be complex and risky. While you migrate, you may want to dual write to the old and new databases; this reduces dependencies and makes it easier to triage issues.

Full post here, 8 mins read

Improving resiliency and stability of a large-scale monolithic API service

The results of microclustering include the ability to limit downstream failures and bugs to a single vertical, and each cluster can be tuned independently of the others for better capacity planning, monitoring and granular control over deployment.
Read more

Improving resiliency and stability of a large-scale monolithic API service

Lessons from the API layer service used by LinkedIn:

  • They chose a cross-platform design (with all platforms using the same API and same endpoints for the same features) and an all-encompassing design (one API service calls all product verticals), to allow for high code reuse.
  • They reused data-schema definitions and endpoints to make it easier for engineers to collaborate but it led to issues at scale, when extended to deployment architecture. It was addressed by microclustering rather than breaking the monolith into microservices: Endpoints of the services were partitioned without breaking the code, routing traffic for each partition to a dedicated cluster of servers. Data from monitoring systems were used to identify which verticals had enough traffic to justify a partition.
  • For each vertical, the build system was modified to create an additional deployable named after the vertical, with configuration inherited from the shared service and extended. Traffic from the vertical’s endpoints was examined to estimate the number of servers needed in the new cluster.
  • While deploying, capacity testing was carried out - when there was enough traffic to overload at least three servers, servers were slowly taken down to observe latencies and error rates, revealing how many queries-per-second each server could process without incident. This information was used for capacity planning, to fine-tune resource allocation.
  • The results of microclustering include the ability to limit downstream failures and bugs to a single vertical, and each cluster can be tuned independently of the others for better capacity planning, monitoring and granular control over deployment.

Full post here, 5 mins read

The differences between gateway, microgateway and service mesh

An API gateway is a central interface for all external communications. It typically works by invoking multiple microservices and aggregating results to determine the best path.
Read more

The differences between gateway, microgateway and service mesh

  • An API gateway is a central interface for all external communications. It typically works by invoking multiple microservices and aggregating results to determine the best path.
  • It may also handle authentication, input validation and filtering, and metric collection, as well as transforming requests and/or results. For the microservices network, it offers lower latency, better efficiency and higher security, as well as easier isolation of single sources of failure.
  • API microgateways are proxies sitting close to microservices for internal communication between microservices; they allow better governance, discovery, observability, and stability for developers, and expose the policy enforcement point and security controls to operators.
  • They are a more granular solution than a single API gateway due to the control of exposure. They offer low latency and a small footprint, as requests don’t need to wait their turn. This does imply code duplication across multiple microservice instances, which can be inefficient if code is not intelligently structured.
  • A service mesh is a layer between microservices for all service-to-service communication that replaces direct communication between services. It will often have in-built support for resiliency, error checking, and service discovery. It is similar to a microgateway but with network communications entirely abstracted from business logic, allowing developers to focus entirely on the latter.

Full post here, 8 mins read

To create an evolvable API, stop thinking about URLs

To build an evolvable API, instead of forcing clients to have prior knowledge of URLs, fields and HTTP methods, you should let the client ask the server what is required to complete an operation and indicate the preferred host and path.
Read more

To create an evolvable API, stop thinking about URLs

  • To build an evolvable API, instead of forcing clients to have prior knowledge of URLs, fields and HTTP methods, you should let the client ask the server what is required to complete an operation and indicate the preferred host and path.
  • Critical aspects of an evolvable API include:

a) The state of the conversation being stored in the network - not sourced from either client or server.

b) No versioning needed - when you add or remove data from a response, clients should know how to react. If they don’t know how to react to a new feature, they should be able to ignore it and work in the old way.

c) The server owns actions, which contain values for URLs, methods, and fields, so that it controls where clients go to continue the conversation, with only the entry point hardcoded in the client (a sketch of such a response follows this list).

  • With control of URLs in the server, it can run A/B testing and direct clients to different servers running the same instance of the application. The server can also implement a polling functionality to track the status of requests.
  • Model communication on how people actually operate. Think not only about a generic language but developing a shared domain vocabulary.
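
A sketch of what a response carrying server-owned actions might look like; the field names, URLs and the find_action helper are illustrative assumptions, not a specific hypermedia format from the post.

```python
# What a response with server-owned actions might look like. The client
# follows the action descriptions instead of hardcoding URLs and methods,
# and ignores any action it does not recognise.
order_response = {
    "id": "order-123",
    "status": "pending",
    "actions": [
        {
            "name": "cancel",
            "method": "POST",
            "href": "https://api.example.com/orders/order-123/cancel",
            "fields": [],
        },
        {
            "name": "update-shipping-address",
            "method": "PUT",
            "href": "https://api.example.com/orders/order-123/address",
            "fields": ["street", "city", "postal_code"],
        },
    ],
}

def find_action(response, name):
    """Return the named action, or None if the server no longer offers it."""
    return next((a for a in response["actions"] if a["name"] == name), None)
```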

Full post here, 10 mins read

An overview of caching methods

The most common caching methods are browser caching, application caching and key-value caching. Browser caching is a collaboration between the browser and the web server and you don’t have to write any extra code.
Read more

An overview of caching methods

  • The most common caching methods are browser caching, application caching and key-value caching.
  • Browser caching is a collaboration between the browser and the web server, and you don’t have to write any extra code. For example, when you reload a page in Chrome that you have visited before, the date in the Expires response header determines whether the browser loads resources directly from cache (from your first visit) or requests them again from the server. The server uses the headers sent by the browser (such as If-Modified-Since, If-None-Match or ETag) to decide whether to send the resources afresh or tell the browser to load them from its cache.
  • Application-level caching is also called memoization, and it is useful if your program is very slow - think of cases where you are reading a file and extracting data from it, or requesting data from an API. The result of any slow method is placed in an instance variable and returned on subsequent calls to the method, which speeds it up (a minimal sketch follows this list). The downsides are that you lose the cache on restarting the application and you cannot share the cache between multiple servers.
  • Key-value data caching takes memoization a step further with dedicated stores like Memcached or Redis. This allows cached data to persist across user requests (allowing data sharing) and application reboots, but it introduces a dependency in your application and adds another component to monitor.
  • To determine the best method for you, start with browser caching as the baseline. Then identify your hotspots with an application profiling tool before choosing which method to grow with to add a second layer of caching.
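
Two minimal memoization sketches in Python: a functools.lru_cache decorator and a hand-rolled instance-variable cache. The file-counting and weather-API examples are purely illustrative.

```python
import functools

@functools.lru_cache(maxsize=256)
def line_count(path):
    """Stand-in for a slow, file-crunching operation: the decorator caches
    the result per argument, so repeat calls with the same path are free."""
    with open(path) as f:
        return sum(1 for _ in f)

class WeatherClient:
    def __init__(self):
        self._cache = {}

    def forecast(self, city):
        # Manual memoization in an instance variable: the cache lives and
        # dies with this process and cannot be shared between servers.
        if city not in self._cache:
            self._cache[city] = self._fetch_from_api(city)
        return self._cache[city]

    def _fetch_from_api(self, city):
        return {"city": city, "temp_c": 21}  # stand-in for a slow API call
```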

Full post here, 7 mins read

Designing resilient systems beyond retries: rate limiting

You can limit requests by client or user account or by endpoints. These can be combined to use different levels of thresholds together, in a specific order, culminating in a server-wide threshold possibly.
Read more

Designing resilient systems beyond retries: rate limiting

  • In distributed systems, a circuit-breaker pattern and retries are commonly used to improve resiliency. A retry ‘storm’ is a common risk if the server cannot handle the increased number of requests, and a circuit-breaker can prevent it.
  • In a large organization with hundreds of microservices, coordinating and maintaining all the circuit-breakers is difficult and rate-limiting or throttling can be a second line of defense.
  • You can limit requests by client or user account (say, 1,000 requests per hour each, rejecting further requests until the time window resets) or by endpoint (benchmarked to server capabilities so that the limit applies across all clients). These can be combined at different threshold levels, applied in a specific order, possibly culminating in a server-wide threshold (a token-bucket sketch follows this list).
  • Consider global versus local rate-limiting. The former is especially useful in microservices architecture because bottlenecks may not be tied to individual servers but to exhausted downstream resources such as a database, third-party service, or another microservice.
  • Take care to ensure the rate-limiting service does not become a single point of failure, nor should it add significant latency. The system must function even if the rate-limiter experiences problems, and perhaps fall back to its local limit strategy.
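
A minimal local token-bucket sketch; the per-client rate and capacity are illustrative assumptions, and a global limiter would keep this state in a shared store instead of process memory.

```python
import time

class TokenBucket:
    """Local rate limiter: allow `rate` requests per second with bursts
    up to `capacity`; callers whose bucket is empty get rejected."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per client identifier approximates a per-account limit.
buckets = {}

def is_allowed(client_id):
    bucket = buckets.setdefault(client_id, TokenBucket(rate=0.3, capacity=10))
    return bucket.allow()
```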

Full post here, 11 mins read

Distributed systems learnings

Building a new distributed system is easier than migrating the old system over to it. Migrating an old system is more time-consuming and just as challenging as writing one from scratch.
Read more

Distributed systems learnings

  • Building a new distributed system is easier than migrating an old system over to it. Migrating an old system is more time-consuming and just as challenging as writing one from scratch: you tend to underestimate the amount of custom monitoring needed to ensure both behave the same way, and although the new system is more elegant, you need to decide whether to accommodate or drop edge cases from the legacy system.
  • To improve reliability, start simple, measure, report and repeat: establish simple service-level objectives (SLOs) and a low bar for reliability (say 99.9%), measure it weekly, fix systemic issues at the root of the failure to hit it, and once confident, move to stricter definitions and targets.
  • Treat idempotency, consistency and durability changes as breaking changes, even if technically not, in terms of communication, rollouts, and API versioning.
  • Give more weight to the financial and end-user impact of outages than to the systems themselves. Talk to the relevant teams, use appropriate metrics, and use these to put a price tag on preventive measures.
  • To determine who owns a service, check who owns the oncall (the operation of the system); the rest - code ownership, understanding of the system - follows from there. This means that shared oncall between multiple teams is not a healthy practice but a band-aid solution.

Full post here, 6 mins read

An introduction to load testing

Common parameters to test should include server resources (CPU, memory, etc) for handling anticipated loads; quickness of response for the user; efficiency of application; the need for scaling up hardware or scaling out to multiple servers; and maximum requests per second.
Read more

An introduction to load testing

  • Load testing is done by running the software on one machine (or cluster of machines) to generate a large number of requests to the webserver on a second machine (or cluster).
  • Common parameters to test should include server resources (CPU, memory, etc) for handling anticipated loads; quickness of response for the user; efficiency of application; the need for scaling up hardware or scaling out to multiple servers; particularly resource-intensive pages or API calls; and maximum requests per second.
  • In general, a higher number of requests implies higher latency. But it is a good practice in real life to test multiple times at different request rates. Though a website can load in 2-5 seconds, web server latency should typically be around 50-200 milliseconds. Remember that even ‘imperceptible’ improvements add up in the aggregate for a better UX.
  • As a first step, monitor resources - mostly CPU load and free memory.
  • Next, find the maximum response rate of your web server by setting desired concurrency (100 is a safe default but check settings like MaxClients, MaxThreads, etc for your server) and test duration in any load testing tool. If your software only handles one URL at a time, run the test with a few different URLs with varying resource requirements. This should push the CPU idle time to 0% and raise response times beyond real-world expectations.
  • Dial back the load and test again to see how your server performs when not pushed to its absolute limit: specify exact requests per second, starting at half the maximum from the earlier step, then step the rate up or down by half the previous increment each time until you reach the maximum rate that still meets your latency target (measured at the 99th or even 99.999th percentile).
  • Some load-testing tools you can explore: ab (ApacheBench), JMeter, Siege, Locust, and wrk2 (a minimal Locust sketch follows this list).
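
A minimal Locust sketch, assuming hypothetical endpoints; the task weights and wait times are illustrative and should be tuned to mirror real traffic.

```python
# locustfile.py -- run with: locust -f locustfile.py --host https://example.com
from locust import HttpUser, task, between

class WebsiteUser(HttpUser):
    # Simulated users wait 1-3 seconds between requests.
    wait_time = between(1, 3)

    @task(3)
    def homepage(self):
        self.client.get("/")

    @task(1)
    def search(self):
        # A heavier endpoint mixed in alongside the cheap one, since the
        # post recommends testing URLs with varying resource requirements.
        self.client.get("/search?q=example")
```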

Full post here, 13 mins read

How to continuously profile tens of thousands of production servers

Some lessons & solutions from the Salesforce team that can be useful for other engineers too.
Read more

How to continuously profile tens of thousands of production servers

Some lessons & solutions from the Salesforce team that can be useful for other engineers too.

  • Ensure scalability: If writes or data are too voluminous for a single network or storage solution to handle, distribute the load across multiple data centers, coordinating retrieval from a centralized hub for investigating engineers, who can specify which clusters of hosts they may want data from.
  • Design for fault-tolerance: In a crisis where memory and CPU are overwhelmed or network connectivity lost, profiling data can be lost too. Build resilience in your buffering and pass the data to permanent storage, while allowing data to persist in batches.
  • Provide language-agnostic runtime support: If users might be working in different languages, capture and represent profiling and observability data in a way that works regardless of the underlying language. Attach the language as metadata to profiling data points so that users can query by language and ensure data structures for stack traces and metadata are generic enough to support multiple languages and environments.
  • Allow debugging engineers to access domain-specific contexts to drive their investigations to a speedier resolution. You can do a deep search of traces to match a regular expression, which is particularly useful to developers debugging the issue at hand.

Full post here, 9 mins read

Benefits of dependency injection

It improves code maintainability. Stand-alone classes are easier to fix than complicated and tightly coupled classes.
Read more

Benefits of dependency injection

  • It improves code maintainability. Stand-alone classes are easier to fix than complicated and tightly coupled classes.
  • It improves code quality by improving testability. If code is easy to test, it will get tested more often, which leads to a higher-quality codebase.
  • Dependency injection makes code more readable as the classes used are small, to the point, compact and more clearly defined.
  • When you use dependency injection, you get loosely coupled code that is more flexible. Small classes that do one thing can more easily be reassembled and reused, which in turn saves time and money.
  • It leads to a more extendable class structure.
  • It facilitates team development: after defining abstractions, teams working together can write their code against those abstractions even before the implementations are written (a minimal constructor-injection sketch follows this list).
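
A minimal constructor-injection sketch in Python; the store classes and GreetingService are illustrative assumptions showing how the same service can be wired with a production or a test dependency.

```python
class PostgresUserStore:
    """Production dependency (stand-in for real database access)."""
    def find(self, user_id):
        return {"id": user_id}

class InMemoryUserStore:
    """Test double that satisfies the same informal interface."""
    def __init__(self, users):
        self.users = users
    def find(self, user_id):
        return self.users[user_id]

class GreetingService:
    def __init__(self, user_store):
        # The dependency is injected rather than constructed inside the
        # class, keeping the service small, loosely coupled and testable.
        self.user_store = user_store

    def greet(self, user_id):
        user = self.user_store.find(user_id)
        return f"Hello, user {user['id']}!"

# Production and test wiring differ only in what gets injected.
service = GreetingService(InMemoryUserStore({42: {"id": 42}}))
assert service.greet(42) == "Hello, user 42!"
```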

Full post here, 4 mins read

How to improve a legacy codebase

Begin with a backup - copy everything to a safe place in read-only mode. And check if the codebase has a build process and that it actually produces what runs in production.
Read more

How to improve a legacy codebase

  • Begin with a backup - copy everything to a safe place in read-only mode. And check if the codebase has a build process and that it actually produces what runs in production.
  • Freeze the DB schema until you are done with the first level of improvements. It allows you to compare the effect your new business logic code has compared to the old business logic code.
  • When adding instrumentation to the old platform, put it in a completely new database table: add a simple counter for every event and a single function to increment these counters based on the name of the event (a minimal sketch follows this list).
  • Never try to improve the maintainability of the code or the platform it runs on at the same time as adding new features or fixing bugs; it will invalidate some of the tests you made earlier.
  • When migrating to a new platform, all business logic and interdependencies should remain exactly the same as before.
  • Test the new DB with the new code and all the tests in place to ensure your migration goes off without a hitch.
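
A minimal sketch of the event-counter instrumentation described above, using a separate SQLite table so nothing in the legacy schema changes (the table and event names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect("instrumentation.db")  # new, separate database
conn.execute(
    "CREATE TABLE IF NOT EXISTS event_counters ("
    "  name TEXT PRIMARY KEY,"
    "  count INTEGER NOT NULL DEFAULT 0)"
)


def bump(event_name: str) -> None:
    """Single entry point: increment a named counter by one."""
    conn.execute(
        "INSERT INTO event_counters (name, count) VALUES (?, 1) "
        "ON CONFLICT(name) DO UPDATE SET count = count + 1",
        (event_name,),
    )
    conn.commit()


# Sprinkle calls like these through the old code paths you care about:
bump("order_created")
bump("invoice_emailed")
bump("order_created")

print(dict(conn.execute("SELECT name, count FROM event_counters")))
# {'order_created': 2, 'invoice_emailed': 1}
```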

Full post here, 11 mins read

Words are hard - an essay on communicating with non-programmers

When communicating with people outside of the programming world, stay away from using technical jargon. It can make people feel excluded and it can confuse them.
Read more

Words are hard - an essay on communicating with non-programmers

  • Remember communication is complete when you are understood and not when you have said what you had to say. When communicating with people outside of the programming world, stay away from using technical jargon. It can make people feel excluded and it can confuse them.
  • It’s often necessary to simplify things. Be respectful, and never condescending, while doing that. Try explaining things in a step-by-step fashion, mention what edge cases you are considering with an example or two, and present a generalised gist.
  • Before an important meeting with non-technical stakeholders, work with a non-technical person as a sounding board to help you see if you’re making sense to your audience.
  • Try to personify or use analogies when explaining yourself. Use visual analogies - draw diagrams, tables, charts, flowcharts and the like to help illustrate your points.
  • When explaining performance metrics, find a middle ground between accuracy and understandability.

Full post here, 14 mins read

The Joel Spolsky Test: 12 Steps to Better Code

Do you use source control? Can you make a build in one step? Do you fix bugs before writing new code?
Read more

The Joel Spolsky Test: 12 Steps to Better Code

You get 1 point for each “yes”. Score of 12 is perfect, 11 is tolerable, but 10 or lower and you’ve got serious problems:

  1. Do you use source control? Source control makes it easier for programmers to work together.
  2. Can you make a build in one step? If the process takes any more than one step, it is prone to errors.
  3. Do you make daily builds? It makes sure no breakage goes unnoticed.
  4. Do you have a bug database? If you haven’t been listing all known bugs in the code, you will ship low-quality code.
  5. Do you fix bugs before writing new code? The longer you wait before fixing a bug, the costlier - both in time and in money - it is to fix.
  6. Do you have an up-to-date schedule? The only way to factor in planning decisions is to have a schedule and to keep it up to date.
  7. Do you have a spec? Once the code is written, the cost of fixing problems is dramatically higher.
  8. Do programmers have quiet working conditions? Knowledge workers work best when they can get into ‘flow’, or ‘the zone’.
  9. Do you use the best tools money can buy? Getting the best machines will stop programmers from getting bored while the compiler runs.
  10. Do you have testers? Skimping on testers is a false economy.
  11. Do new candidates write code during their interview? Would you hire a caterer for your wedding without tasting their food?
  12. Do you do hallway usability testing? Grab the next person who passes by in the hallway and ask them to try to use the code you just wrote.

Full post here, 16 mins read

How to refactor a monolithic codebase over time

If source code is not stored under version control, get it under version control. Check for static code analysis. If there is no test suite, build that before you refactor.
Read more

How to refactor a monolithic codebase over time

  • Understand the application thoroughly. Talk to previous developers, business owners, PMs, etc. Read through code comments, commit messages, README files, documentation and wikis. Check the bug-reporting tool used as well. Put all the learning together.
  • If source code is not stored under version control, get it under version control. Check for static code analysis.
  • If there is no test suite, build one before you refactor. If there is a test suite, understand the level of code coverage. Check how long the suite takes to complete, whether it exhausts available memory, how many tests fail, are skipped or are outdated, whether there is a good mix of unit, integration and functional tests, and whether sections of the codebase escape coverage. Also, look for comments indicating something that was to be fixed later.
  • Begin refactoring, keeping in mind that your code will never be perfect. So know when to move on.
  • To clean up a complex codebase, sometimes you need to make seemingly trivial changes. But most of the changes should have a clear, effective purpose, such as creating a new feature or fixing a bug or defect.
  • Work to a project plan with a series of achievable tasks, a realistic timeline, resource requirements, and sufficient staff to work on the tasks in isolation or in parallel with other projects.

Full post here, 8 mins read

The best ways to test your serverless applications

For the serverless functions you write, test for each of the following risks: configuration (databases, tables, access rights), technical workflow (parsing and using incoming requests, handling of successful responses and errors), business logic and integration.
Read more

The best ways to test your serverless applications

  • For the serverless functions you write, test for each of the following risks: configuration (databases, tables, access rights), technical workflow (parsing and using incoming requests, handling of successful responses and errors), business logic and integration (reading incoming request structures, storage order in databases).
  • Break up functions into hexagonal architecture (ports and adapters) with separation of concerns through layers of responsibility.
  • For unit tests, use a local or mock adapter to test the function’s business layer in isolation (see the sketch after this list).
  • Use simulation adapters to test integration with third-party services. Save memory and time by testing file-storage integration with an in-memory adapter rather than running full end-to-end integration.
  • For proper monitoring of integrations, use back-end tools such as IOpipe, Thundra, Dashbird, Epsagon, etc., and front-end tools such as Sentry or Rollbar. You can also use an open-source error tracking app such as Desole that you install in your AWS account.
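
A minimal sketch of the ports-and-adapters split for a serverless function, with an in-memory adapter standing in for real storage during unit tests (the function and adapter names are hypothetical):

```python
class InMemoryStorage:
    """Test adapter: exposes the same interface as the real storage adapter."""
    def __init__(self) -> None:
        self.items: dict[str, dict] = {}

    def put(self, key: str, record: dict) -> None:
        self.items[key] = record


def register_user(event: dict, storage) -> dict:
    """Business layer: no AWS or HTTP details, only the injected storage port."""
    email = event.get("email")
    if not email:
        return {"status": 400, "error": "email is required"}
    storage.put(email, {"email": email, "source": event.get("source", "api")})
    return {"status": 201}


# Unit test in isolation, no cloud resources needed:
storage = InMemoryStorage()
assert register_user({"email": "ada@example.com"}, storage) == {"status": 201}
assert "ada@example.com" in storage.items
assert register_user({}, storage)["status"] == 400
```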

Full post here, 10 mins read

How to measure the reliability of your software throughout the CI/CD workflow

Look beyond log files and testing to determine quality of code: set up advanced quality gates to block problematic code from passing to the next stage and use feedback loops to inform more comprehensive testing.
Read more

How to measure the reliability of your software throughout the CI/CD workflow

  • In addition to CI/CD, you should consider incorporating continuous reliability in the workflow. This may mean more focus on troubleshooting than on writing code.
  • Consider whether to automate every step, or even whether some steps should be automated more than others.
  • Look beyond log files and testing to determine quality of code: set up advanced quality gates to block problematic code from passing to the next stage and use feedback loops to inform more comprehensive testing.
  • In addition to log aggregators and performance monitoring tools, get a more granular understanding of app quality by ensuring you can access the source code, variable state and stack trace at the time of an error. Aggregate this data across the app, library, class, deployment or another boundary for an insight into the functional quality of the code.
  • Based on this data, you can categorise known, reintroduced and unknown errors, classify events, and understand frequency and failure rates, enabling you to write more comprehensive tests for development and pre-production environments alike, driving higher code quality.

Full post here, 6 mins read

Our adventures in scaling

Handling sudden activity spikes poses different challenges than scaling a rapidly growing user base. Check whether databases are resource-constrained and hence slowing down. Check hardware metrics during spikes to check on CPU, disk i/o and memory.
Read more

Our adventures in scaling

  • Handling sudden activity spikes poses different challenges than scaling a rapidly growing user base.
  • Check whether databases are resource-constrained and hence slowing down. Check hardware metrics during spikes to check on CPU, disk i/o and memory.
  • If there are no spikes in those metrics, look higher up the infrastructure stack at service resources for increased resource acquisition times. Also, check the garbage collection activity, which indicates whether JVM heap and threads are the bottlenecks.
  • Check network metrics next to look for a constraint in the network between services and databases - for example, if the services’ database connection pools are consistently reaching size limits.
  • To collect more metrics, log the latency of all transactions and keep those that exceed a defined threshold (see the sketch after this list); analyse them across daily usage to determine whether removing the identified bottleneck would make a significant difference.
  • Some bottlenecks may be code-related - for example, inefficient queries, a resource-starved service, or inconsistencies in the database responses themselves - so look for metrics on higher-level functioning and not just low-level system components.
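
A minimal sketch of logging only those transactions that exceed a latency threshold, as described above (the threshold value and function names are placeholders):

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("latency")

SLOW_THRESHOLD_SECONDS = 0.5  # placeholder value; tune per service


def log_if_slow(func):
    """Record a transaction's latency only when it crosses the threshold."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            if elapsed > SLOW_THRESHOLD_SECONDS:
                log.info("slow transaction %s took %.3fs", func.__name__, elapsed)
    return wrapper


@log_if_slow
def fetch_report(report_id: str) -> str:
    time.sleep(0.6)  # stand-in for a slow query
    return f"report-{report_id}"


fetch_report("42")
```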

Full post here, 6 mins read

Migrating functionality between large-scale production systems seamlessly

Incorporate shadowing to forward production traffic to the new system for observation, making sure there would be no regressions. This lets you gather performance stats as well.
Read more

Migrating functionality between large-scale production systems seamlessly

Lessons from Uber’s migration of its large and complex systems to a new production environment:

  • Incorporate shadowing to forward production traffic to the new system for observation, making sure there would be no regressions. This lets you gather performance stats as well.
  • Use this opportunity to settle any technical debt incurred over the years, so the team can move faster in the future and your productivity rises.
  • Carry out validation on a trial and error basis. Don’t assume it will be a one-time effort and plan for multiple iterations before you get it right.
  • Have a data analyst in your team to find issues early, especially if your system involves payments.
  • Once confident in your validation metrics, you can roll out to production. Uber chose to start with a test plan with a couple of employees dedicated to testing various success and failure cases, followed by a rollout to all Uber employees, and finally incremental rollout to cohorts of external users.
  • Push for a quick final migration, as options for a rollback are often misused, preventing complete migration.

Full post here, 6 mins read

How to optimize your website speed by improving the backend

Normalize relational databases at the design stage itself and ensure effective indexing so the indexes don’t slow down your website. In some cases, denormalization is more effective though - where there are many table joins, adding an extra field to one table may be better
Read more

How to optimize your website speed by improving the backend

  • The N+1 query problem slows down many apps when a separate query is issued for each linked record in a database. In Rails, the ActiveRecord ORM’s eager loading fetches all associated elements with a single query and helps solve this problem.
  • Normalize relational databases at the design stage itself and ensure effective indexing so the indexes don’t slow down your website. In some cases, though, denormalization is more effective: where there are many table joins, adding an extra field to one table may be better, and adding frequently needed calculated values to a table can help if you often execute complicated calculations.
  • Cache carefully to speed up your site. For SQL caching in Rails, use low-level caching to store query results for a longer time. In general, prefer fragment caching of page blocks for dynamic web apps; use page caching in Rails with the actionpack-page_caching gem, but avoid it if your app has frequently updated content like news feeds. For authentication actions and error messages, use the actionpack-action_caching gem.
  • Use a content delivery network (CDN) of edge servers to cache static content like images, JavaScript, and CSS files for reduced latency across geographies, reduced operational costs compared to handling your own servers, stability and scalability.

Full post here, 11 mins read

Simple Java performance tuning tips

Use primitive types rather than wrapper classes wherever possible to minimize overheads as they are stored to the stack and not the heap.
Read more

Simple Java performance tuning tips

  • To start optimizing your app, use a profiler to find the real bottlenecks in the code and then create a performance test suite for the whole application based on that information. Run your tests before and after every attempt at optimization.
  • Use primitive types rather than wrapper classes wherever possible to minimize overhead, as primitives are stored on the stack rather than the heap. Avoid BigInteger and BigDecimal as they dramatically slow down calculations and use a lot of memory.
  • If your app uses a lot of replace operations and you aren’t on the latest version of Java, consider Apache Commons’ StringUtils.replace method rather than String.replace. You can make the change easily by adding a Maven dependency for Apache Commons Lang to your app’s pom.xml and replacing all instances.
  • Cache your more expensive resources and most-used snippets of code, such as database connections or the valueOf method of the Integer class. However, caching creates overhead and you may need to manage the cache to keep it accessible and remove outdated information, so be sure the tradeoff is worthwhile.

Full post here, 9 mins read

Tips for 10x application performance

Cache both static and dynamic content to reduce the load on application servers. Use established compression standards to reduce file sizes for photos, videos, and music. Avoid leaving text data, including HTML, CSS, and JavaScript uncompressed.
Read more

Tips for 10x application performance

  • Accelerate and secure applications with a reverse proxy server to free up the application server from waiting for users to interact with it. It is also a prerequisite for many other performance increasing capabilities - load balancing, caching static files, and for better security & scalability too.
  • Apply load balancing to protocols such as HTTP, HTTPS, SPDY, HTTP/2, WebSocket, FastCGI, SCGI, uwsgi, memcached, TCP-based applications, Layer 4 protocols etc.
  • Cache both static and dynamic content to reduce the load on application servers.
  • Use established compression standards to reduce file sizes for photos, videos, and music. Avoid leaving text data, including HTML, CSS, and JavaScript uncompressed as their compression can have a large effect especially over slow or otherwise constrained connections. If you use SSL, compression reduces the amount of data to be SSL-encoded, saving time.
  • Monitor real-world performance closely, in real-time, both within specific devices and across your web infrastructure. You should use global application performance monitoring tools to check page load times remotely and also monitor the delivery side.

Full post here, 20 mins read

What is clean code?

Put in the extra effort today to refactor and test your code, in order to save yourself (and others) pains tomorrow. Poorly crafted code unravels fast, while high-quality code saves money and builds customer loyalty while reducing technical debt.
Read more

What is clean code?

Few key ideas from Robert C. Martin’s Clean Code:

  • Even though perfect code is an illusion, understand that craftsmanship matters because high-quality (thoughtful, flexible, maintainable) code results in long-term payoffs for the business. Code that ‘just works’ may only offer short-term success.
  • Put in the extra effort today to refactor and test your code, in order to save yourself (and others) pains tomorrow. Poorly crafted code unravels fast, while high-quality code saves money and builds customer loyalty while reducing technical debt.
  • Take pride in your work but recognize that your code is not your own and must be used and understood easily by others. So, avoid clever hacks and sleights of hand no matter how much fun they seem while coding. Focus on code that’s simple in implementation, readable (with consistent naming conventions, spacing, structure, and flow), considerate of future consumers and professionals, tested, relentlessly factored and SOLID (following the principles for longevity, maintainability, and flexibility).
  • Remember that the best way to write clean code is to do it all the time so that you retain and refresh the skill continually - even on your personal projects.

Full post here, 6 mins read

How to write good code documentation

There is no such thing as self-documenting code but your code needs to be self-explanatory. Invest time to review your work and write clean, structured code. It is less likely to contain bugs and will save time in the long run. Ensure you practice continuous documentation within your development process.
Read more

How to write good code documentation

  • There is no such thing as self-documenting code but your code needs to be self-explanatory. Invest time to review your work and write clean, structured code. It is less likely to contain bugs and will save time in the long run.
  • Ensure you practice continuous documentation within your development process so that it is appropriately prioritized and is written, reviewed and delivered on time with the code.
  • You might write low-level tests for specific scenarios before the code, and leave architecture, user and support documentation to the end of the release cycle when all information is known and frozen. But store it all in one place and keep it up to date.
  • Get feedback on your documentation too - both as a health check and for clarity and comprehensiveness - and incorporate it. One of the best ways to do this is Jason Freedman’s 30/90 feedback method: at 30% completion, ask a reviewer to look at the broad outline, flow, and structure of the document; at 90%, have them go over it with a fine-toothed comb. Have peer, user and expert sessions in between.

Full post here, 6 mins read

Embracing the chaos of chaos engineering

You start with a hypothesis and you make an educated guess of what will happen in various scenarios, including deciding on your steady-state. Then you introduce real-world events to test your guesswork.
Read more

Embracing the chaos of chaos engineering

  • You start with a hypothesis and you make an educated guess of what will happen in various scenarios, including deciding on your steady-state.
  • Then you introduce real-world events to test your guesswork for hardware/VM failure, state inconsistency, running out of processing power or memory or time, dependency issues, rare conditions, traffic spikes, and service unavailability.
  • After that come test runs in production on the properly pretested codebase (be cautious of doing this to safety-critical systems such as banking); then evaluate your hypothesis based on how real-world events affect your steady state.
  • You should communicate your results not only to engineers but also to support staff and community managers who face the public.
  • Use different tools to run your experiments and ensure you have alerts & reporting systems in place to minimize potential damage. Abort quickly if needed.
  • Once you have defined ideal metrics and potential effects, increase the scope of testing by changing the parameters or events you test each time, applying fixes as you go till you find the points where the system really starts to break down.

Full post here, 6 mins read

Sampling in observability

Subcomponents of a system may need different sampling strategies, and the decision can be quite subjective.
Read more

Sampling in observability

  • You can use sampling APIs by way of instrumentation libraries that let you set sampling strategies or rates. For example, Go’s runtime.SetCPUProfileRate lets you set the CPU profiling rate.
  • Subcomponents of a system may need different sampling strategies, and the decision can be quite subjective: for a low-traffic background job, you might sample every task but for a handler with low latency tolerance, you may need to aggressively downsample if traffic is high, or you might sample only when certain conditions are met.
  • Consider making the sampling strategy dynamically configurable, as this can be useful for troubleshooting.
  • If collected data tracks a system end to end and the collection spans more than one process, like distributed traces or events, you might want to propagate the sampling decision from parent to child process through a header passed down (see the sketch after this list).
  • If collecting data is inexpensive but transferring or storing it is costly, you can collect 100% of the data and apply a filter later, minimizing volume while preserving diversity in the sample and retaining edge cases specifically for debugging.
  • Never trust a sampling decision propagated from an external source; it could be a DOS attack.
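
A minimal sketch of a per-request sampling decision that honours a propagated parent decision when one is present and otherwise samples probabilistically; the header name and sampling rate are assumptions, not taken from the post:

```python
import random

SAMPLE_RATE = 0.01  # assumed default: keep 1% of traces


def should_sample(headers: dict) -> bool:
    """Prefer the parent's decision if one was propagated; otherwise roll the dice.
    Only trust this header from internal callers, never from the public edge."""
    parent = headers.get("x-trace-sampled")  # hypothetical header name
    if parent is not None:
        return parent == "1"
    return random.random() < SAMPLE_RATE


def downstream_headers(sampled: bool) -> dict:
    """Propagate the decision so child services stay consistent."""
    return {"x-trace-sampled": "1" if sampled else "0"}


sampled = should_sample({})                           # root service decides
child = should_sample(downstream_headers(sampled))    # child follows the parent
assert child == sampled
```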

Full post here, 4 mins read

The 3 myths of observability

A myth is that getting an observability tool is a good strategy - having an observability platform is not sufficient on its own. Unless observability becomes core to your engineering efforts and your company culture, no tool can help.
Read more

The 3 myths of observability

  • Myth #1 is that you will experience fewer incidents if you implement an observability strategy - Just implementing a strategy has no impact on the number of event occurrences but having it in place means that when a problem arises, you will have enough telemetry data to quickly solve it.
  • Myth #2 is that getting an observability tool is a good strategy - Having an observability platform is not sufficient on its own. Unless observability becomes core to your engineering efforts and your company culture, no tool can help.
  • Myth #3 is that implementing observability is cheap. As observability is a core part of any modern tech infrastructure, you should think of your observability budget as a percentage of your overall infrastructure budget. The value derived from a good observability program in terms of efficiency, speed, and customer satisfaction surpasses the costs it incurs.

Full post here, 4 mins read

Improving incident retrospectives

Often, too much focus is on triggers for the incident. The retrospective should instead review the timeline of incidents, remediation items and find owners for the remediation items.
Read more

Improving incident retrospectives

  • Incident retrospectives are an integral part of any good engineering culture.
  • Often, too much focus is on triggers for the incident. The retrospective should instead review the timeline of incidents, remediation items and find owners for the remediation items.
  • Retrospectives should be used as an opportunity for deeper analysis into systems (both people and technical) and assumptions that underlie these systems.
  • Finding remediation items should be decoupled from the retrospective process. It helps participants to be free in conducting a deeper investigation as they are unburdened from finding any shallow explanations quickly.
  • It’s good practice to lighten up the retrospective template you use, because any template will be ill-equipped to capture the unique characteristics of varied incidents. Also, sticking rigidly to a template limits the open-ended questions that can be quite useful in evolving your systems in the right direction.

Full post here, 6 mins read

Why API responses should be signed

As a recipient of any data, you want to know who originally published it and be sure it was not tampered with to establish authenticity. This can be achieved by adding signatures to validate messages.
Read more

Why API responses should be signed

  • As a recipient of any data, you want to know who originally published it and be sure it was not tampered with to establish authenticity. This can be achieved by adding signatures to validate messages.
  • One option is to keep the signature and the message separate, requested by different API calls, to reduce complexity for the server so that it only serves the signature if the user demands it. Storage can be complicated with this approach (a signing sketch follows this list).
  • The second option is to include the signature with the message, which you must encode first, but that renders the response no longer human-readable and the response must be decoded for interpretation.
  • A third option is to sign only critical parts of the response rather than all the metadata. This is easiest to implement, simple to parse for both humans and computers, but sometimes the metadata itself may be important information to verify.
  • In all the above options, the API provider must securely manage cryptographic keys, which is expensive and complicated, and the API can be compromised if a hacker gets hold of the keys.
  • To solve the problem effectively, you could check out JOSE, a suite of specifications including JSON Web Tokens, which are already used across the internet, mostly to sign OAuth logins.
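
As a rough illustration of the first two options (not the JOSE approach), here is a minimal sketch that signs a JSON response body, either as a detached signature or embedded alongside the base64-encoded message. It uses a symmetric HMAC for brevity; a real API would more likely use an asymmetric signature, or JOSE/JWS as the post suggests, so clients can verify without sharing the secret:

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"server-side-secret"  # placeholder; manage real keys securely


def sign(body: bytes) -> str:
    return hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest()


def verify(body: bytes, signature: str) -> bool:
    return hmac.compare_digest(sign(body), signature)


payload = json.dumps({"order": 123, "total": "9.99"}).encode()

# Option 1: detached signature, e.g. served by a second call or in a header.
detached = sign(payload)

# Option 2: signature embedded with the (now base64-encoded) message.
embedded = json.dumps({
    "message": base64.b64encode(payload).decode(),
    "signature": sign(payload),
})

assert verify(payload, detached)
received = json.loads(embedded)
assert verify(base64.b64decode(received["message"]), received["signature"])
```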

Full post here, 5 mins read

Improving Mongo performance by managing indexes

To define an efficient index, you can build on top of a previously defined index as well. When you are compound indexing in this way, determine which property of your query is the most unique and give it a higher cardinality.
Read more

Improving Mongo performance by managing indexes

  • You can query large collections efficiently by defining an index and ensuring it is built in the background.
  • To define an efficient index, you can build on top of a previously defined index as well. When compound indexing in this way, determine which property of your query is the most unique (has the highest cardinality) and place it first; the higher cardinality helps limit the search area of your query.
  • To ensure your database uses your index efficiently, make sure the index fits in the available RAM on your database server as part of Mongo’s working set. Check this using db.stats().indexSize and determine your default allocation of RAM.
  • To keep index sizes small, examine the index usage of a given collection and remove unused indexes, check compound indexes for redundancy, make indexes sparser by imposing a $partialFilterExpression constraint to tell them which documents to index, and minimize the fields in compound indexes (see the sketch below).
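
A minimal sketch of the points above with PyMongo: a compound index with the most selective field first, built in the background, plus a partial-filter constraint to keep it small. The connection string, collection and field names are hypothetical:

```python
from pymongo import ASCENDING, DESCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
orders = client.shop.orders

# Most selective (highest-cardinality) field first, built in the background.
orders.create_index(
    [("customer_id", ASCENDING), ("created_at", DESCENDING)],
    background=True,
    partialFilterExpression={"status": "active"},  # only index the documents you query
    name="active_orders_by_customer",
)

# Sanity checks: total index size and current index definitions.
print(client.shop.command("dbstats")["indexSize"])
print(list(orders.list_indexes()))
```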

Full post here, 9 mins read

The good and bad of serverless

It’s truly scalable & saves you from the pains of managing servers manually. Serverless applications are a notch above Virtual Private Servers - you only pay for what you need.
Read more

The good and bad of serverless

The good

  • It’s truly scalable & saves you from the pains of managing servers manually.
  • Serverless applications are a notch above Virtual Private Servers - you only pay for what you need.
  • Developers on your team don’t have to deal with the technicalities of setting up scaling policies or configuring load balancers, VPCs, server provisioning, etc.

The bad

  • Cold starts occur when a function has been idle. To mitigate them, ping your functions periodically so they stay warm (see the sketch after this list), or set up a single function to handle all API calls so that cold starts only happen once.
  • The need for applications to be truly stateless. You must design your application to be ready to serve a request from a cold, dead state.
  • Not ideal for long-running jobs. Re-examine whether the time limit hinders your ability to process all the data or try using Lambda recursively.
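
A minimal sketch of a handler that short-circuits scheduled warm-up pings before doing any real work. The event shape used to detect a ping is an assumption; match it to however your scheduler actually invokes the function:

```python
import json


def handler(event, context=None):
    # Scheduled warm-up invocations carry a marker we set in the schedule rule
    # (hypothetical key; adapt to your scheduler's payload).
    if isinstance(event, dict) and event.get("warmup"):
        return {"statusCode": 200, "body": "warmed"}

    # Real request path (stateless: everything needed arrives with the event).
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {"statusCode": 200, "body": json.dumps({"hello": name})}


# Local smoke test:
assert handler({"warmup": True})["body"] == "warmed"
print(handler({"queryStringParameters": {"name": "Ada"}}))
```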

Full post here, 9 mins read

API profiling at Pinterest

Common profiling measurements are CPU, memory, and frequency of function calls. There are two approaches to taking these measurements - event-based profiling & statistical profiling.
Read more

API profiling at Pinterest

  • Common profiling measurements are CPU, memory, and frequency of function calls. There are two approaches to taking these measurements - event-based profiling & statistical profiling.
  • In event-based profiling, you track all occurrences of certain events such as function calls, returns and thrown exceptions. Statistical profiling samples data by probing the call stack periodically; it is less accurate but faster, with lower overhead.
  • Pinterest’s API gateway service is written in Python, so for memory profiling the tracemalloc package was used to track memory blocks (see the sketch after this list).
  • To calculate operational costs, Pinterest needed to combine resource utilization data and request metrics that show the popularity of each endpoint. This helped them identify the most costly endpoints and they also identified the engineers/teams they belonged to. This encouraged ownership and proactive performance monitoring by the respective teams.
  • Dead code - unused, unowned code such as old experiments, tests and files, even lines of code in a file never actually called on in practice - can clutter repositories. Pinterest used a standard Python test coverage tool to identify dead code and then got rid of it.
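
A minimal sketch of using tracemalloc to see which lines allocate the most memory; the workload here is a stand-in, not Pinterest’s code:

```python
import tracemalloc

tracemalloc.start()

# Stand-in workload: whatever request handling you want to profile.
payload = [{"id": i, "tags": ["a", "b", "c"]} for i in range(100_000)]

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:5]:
    print(stat)  # top allocation sites by line number

current, peak = tracemalloc.get_traced_memory()
print(f"current={current / 1e6:.1f}MB peak={peak / 1e6:.1f}MB")
tracemalloc.stop()
```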

Full post here, 7 mins read

What I talk about when I talk about logging

Separate production and logging (collecting, handling and archiving) so that log analysis does not create an additional load on production systems and also, logs are safeguarded from attackers trying to hide their trail.
Read more

What I talk about when I talk about logging

  • Analyzing logs is as, or more, important than logging. Only log what you intend to analyze.
  • Separate production and logging (collecting, handling and archiving) so that log analysis does not create an additional load on production systems and also, logs are safeguarded from attackers trying to hide their trail.
  • Transport logs to a centralized log server with appropriate access rights and archiving policies. Also, preserve the logs as raw as possible for later analysis and do not aggregate them in earlier phases.
  • Before log analysis, ensure you have created a clear understanding of your system’s baseline behavior. You will then know what to log, how long to retain the logs, and can add flexible tools to help you analyze the logs quickly and effectively in any format.
  • Enable automated reporting of event occurrences after setting baselines and thresholds. This way, you will be sure to look at logs whenever something important transpires.

Full post here, 6 mins read

Modernizing your build pipelines

Keep your pipeline highly visual and avoid over-abstraction. Visualization makes builds easy to understand and allows failed builds to be traced back quickly.
Read more

Modernizing your build pipelines

  • A high-quality pipeline must be fast, so that it gives quick feedback. To achieve this, let your CI tool parallelize all tasks that don’t have mutual dependencies and avoid running multiple checks together.
  • Have pipelines reflect in the code and call shell scripts that also work locally for easier testing before pushing to deploy, enabling a faster feedback loop.
  • To ensure your pipeline is reliable and reproducible, use containers to run each task in isolation and build the containers within the pipeline, a fresh container at each step.
  • While a persistent workspace saves time, it can build in flakiness, for which a good tradeoff may be improving speed by caching dependencies instead of downloading them each time.
  • Keep your pipeline highly visual and avoid over-abstraction. Visualization makes builds easy to understand and allows failed builds to be traced back quickly.
  • Your system must be scalable across multiple pipelines. Avoid duplication (slows the pipelines down) and parametrize tasks instead, so that you configure them by passing variables, and build a library of tasks that lets you reuse code across pipelines, while also reducing coupling between tasks and pipelines.

Full post here, 10 mins read

Cold start/warm start with AWS Lambda

Programming language can impact the duration of a cold start in Lambda: Java and C# are typically slower to initialize than Go, Python or Node but they perform better on warm calls.
Read more

Cold start/warm start with AWS Lambda

  • Programming language can impact the duration of a cold start in Lambda: Java and C# are typically slower to initialize than Go, Python or Node but they perform better on warm calls.
  • Adding a framework to structure the code deployed in Lambda increases execution time with cold calls, which can be minimized by using a serverless-oriented framework as opposed to a web framework. Typically, frameworks don’t impact warm calls.
  • In serverless applications, one way to avoid cold starts is to keep Lambda warm beyond its fixed 5-minute life by preventing it from being unloaded. You can do this by setting up a cron to invoke Lambda at regular intervals. However, AWS Lambda will still reset every 4 hours and autoscaling must be taken into account.
  • To avoid cold starts in case of concurrent calls from automatic autoscaling, make pools of Lambda instances kept warm as above; but you will need to determine an optimal number to avoid wasting resources.

Full post here, 11 mins read

Tips for architecting fast data applications

Implement an efficient messaging backbone for reliable, secure data exchange with low latency. Apache Kafka is a good option for this.
Read more

Tips for architecting fast data applications

  • Understand requirements in detail: how large each message is, how many messages are expected per minute, whether there may be large changes in frequency, whether records can be batch-processed, whether time relationships and ordering need to be preserved, how ‘dirty’ the data may be and does the dirt need to be cleaned, reported or ignored, etc.
  • Implement an efficient messaging backbone for reliable, secure data exchange with low latency. Apache Kafka is a good option for this (see the sketch after this list).
  • Leverage your SQL knowledge by applying the same relational algebra to data streams, treated as time-varying relations.
  • Deploy cluster managers or cluster management solutions for greater scalability, agility, and resilience.
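
A minimal sketch of publishing events onto a Kafka backbone with the kafka-python client; the broker address, topic name and message fields are placeholders:

```python
import json

from kafka import KafkaProducer  # kafka-python client

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",           # placeholder broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",                                   # wait for full acknowledgement
)

# Keying by device preserves per-device ordering within a partition.
producer.send(
    "sensor-readings",                            # placeholder topic
    key=b"device-42",
    value={"device": "device-42", "temp_c": 21.7},
)
producer.flush()
```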

Full post here, 7 mins read

How to optimize the API response package

Offer filtering of results according to parameters specified by the requester. This reduces the calls made and results displayed as well as limits the resources fed to the user, resulting in tangible optimization and better user experience.
Read more

How to optimize the API response package

  • Paginate responses into batches of content that are easily browsable, because they are segmented into set numbers (10 per page, 20 per page, etc), limited (say only the first 1,000 entries are paginated), and standardized (using ‘next’, ‘last’ etc for cursor navigation).
  • Offer filtering of results according to parameters specified by the requester. This reduces the calls made and results displayed as well as limits the resources fed to the user, resulting in tangible optimization and better user experience. Do this while keeping in mind that overly complex filtering can work against optimization.
  • Use ranges to restrict results based on a user-specified structure, so that only specific elements within the range are considered applicable for the request to execute. This lets you offload data processing from the client-side to the server.
  • Avoid over-fetching and under-fetching, which can result from poorly formed requests or badly implemented scaling techniques.

Full post here, 12 mins read

Optimizing website performance and critical rendering path

Many things can lead to high rendering times for web pages - the amount of data transferred, the number of resources to download, length of the critical rendering path (CRP), etc.
Read more

Optimizing website performance and critical rendering path

  • Many things can lead to high rendering times for web pages - the amount of data transferred, the number of resources to download, length of the critical rendering path (CRP), etc.
  • To minimize data transferred, remove unused parts (unreachable JavaScript functions, styles with selectors not matching any element, HTML tags always hidden with CSS) and remove all duplicates.
  • Reduce the total count of critical resources to download by setting media attributes for all links referencing stylesheets and making some styles inlined. Also, mark all script tags as async (not parser blocking) or defer (evaluated at end of page load).
  • You can shorten the CRP with the approaches above, and also rearrange the code amongst files so that the styles and scripts of above-the-fold content load before you parse or render anything else.
  • Keep style tags and script tags close to each other in HTML (linewise) to help the browser preloader, and batch HTML updates to avoid multiple layout changes (such as those triggered by window resizing or device orientation).

Full post here, 8 mins read

Tips to speed up serverless web apps in AWS

Keep Lambda functions warm by invoking the Ping function using AWS CloudWatch or Lambda with Scheduled Events and using the Serverless WarmUP plugin.
Read more

Tips to speed up serverless web apps in AWS

  • Keep Lambda functions warm by invoking the Ping function using AWS CloudWatch or Lambda with Scheduled Events and using the Serverless WarmUP plugin.
  • Avoid cross-origin resource sharing (CORS) by accessing your API and frontend using the same origin point. Set origin protocol policy to HTTPS when connecting the API gateway to AWS CloudFront and configure both API Gateway and CloudFront to the same domain, and configure their routing accordingly.
  • Deploy API gateways as REGIONAL endpoints.
  • Optimize the frontend by compressing files such as JavaScript and CSS with GZIP and uploading them to S3. Use the correct Content-Encoding: gzip headers, and enable Compress Objects Automatically in CloudFront.
  • Allocate appropriate memory for Lambda functions, keeping in mind that CPU power scales with the memory you assign.

Full post here, 4 mins read

Patterns for resilient architecture: Embracing failure at scale

Build your application to be redundant, duplicating components to increase overall availability across multiple availability zones or even regions. To support this, ensure you have a stateless application and perhaps an elastic load balancer to distribute requests.
Read more

Patterns for resilient architecture: Embracing failure at scale

  • Build your application to be redundant, duplicating components to increase overall availability across multiple availability zones or even regions. To support this, ensure you have a stateless application and perhaps an elastic load balancer to distribute requests.
  • Enable auto-scaling not just for AWS services but application auto-scaling for any service built on AWS. Determine your auto-scaling technology by the speed you tolerate - preconfigure custom golden AMIs, avoid running or configuring at startup time, replace configuration scripts with Dockerfiles, or use container platforms like ECS or Lambda functions.
  • Use infrastructure as code for repeatability, knowledge sharing and history preservation. Keep the infrastructure immutable: components are replaced on every deployment rather than updated on live systems, always starting from a new instance of every resource (the immutable server pattern).
  • As a stateless service, treat all client requests independently of prior requests and sessions, storing no information in local memory. Share state with any resources within the auto-scaling group using in-memory object caching systems or distributed databases.

Full post here, 10 mins read

Tips for more helpful code reviews

While suggesting changes include code samples. Treat a code review as a public conversation, and remember even experienced developers can gain from ideas on implementation.
Read more

Tips for more helpful code reviews

  • Consider the impact of your words. If necessary, rewrite comments for empathy and clarity both.
  • Elaborate on your concerns to be more specific and to avoid ambiguity, even if it sounds more verbose.
  • While suggesting changes, include code samples. Treat a code review as a public conversation, and remember that even experienced developers can gain from ideas on implementation.
  • Include a link to relevant documentation when referencing a function, commit or pull request. For concepts, link a blogpost. Make it easy for the author to get to things you reference with a single click.
  • Offer to chat in person or on video call for clarifications, and volunteer to pair up and collaborate on any suggested changes.

Full post here, 5 mins read

Hard truths for new software developers

You don’t need to reinvent the wheel to make a difference. In the early years of your career, aim to learn about others’ thought processes and why things are being done the way they are.
Read more

Hard truths for new software developers

  • Accept that you don’t know everything, that tech knowledge is competitive and that you have to keep working and learning to stay relevant throughout your career.
  • Acknowledge that social and soft skills matter as much as tech knowledge. You will need to hold mature human conversations that solve human problems to grow through your career.
  • You don’t need to reinvent the wheel to make a difference. In the early years of your career, aim to learn about others’ thought processes and why things are being done the way they are.
  • You will not always get the help you need, so seek out a mentor you can trust, who is also invested in your success and with a 5-10 years’ experience gap.
  • Your goal in the workforce is not just your success but that of the product. So assume that in interviews you are being evaluated not for accuracy but for brainstorming human factors that can affect the product’s success and learn to test for them.

Full post here, 9 mins read

Truths about code optimization

Don’t assume you know the problem. Run your code with a profiler and see which bits are the slow ones before you start to write new code. Your obvious best guess could well be wrong.
Read more

Truths about code optimization

  • Make sure you start with working code in the first place and have good unit tests so that speeding it up does not break anything.
  • Don’t assume you know the problem. Run your code with a profiler and see which bits are the slow ones before you start to write new code. Your obvious best guess could well be wrong.
  • As you optimize your code, run the profiler after every change to check whether the change actually helped and whether there is a new bottleneck now.
  • It is obvious but if a change did not measurably help, take it out no matter how brilliant you think it was.
  • You can always keep making things faster, lighter and cheaper, but it will also gradually make the code harder to read, longer and more complex. Know when to stop.

Full post here, 5 mins read

Go’s tooling is an undervalued technology

Go uses decentralized package management, or rather, module management. There is no central module manager or module registration, no external repository to trust and no need for an internal one.
Read more

Go’s tooling is an undervalued technology

  • As it has no external dependencies, you can build the latest version of Go using just the compiler in seconds, and cross-compiling only needs a couple of environment variables (GOOS and GOARCH).
  • Go uses decentralized package management, or rather, module management. There is no central module manager or module registration, no external repository to trust and no need for an internal one.
  • Since a module only needs to be hosted on a reachable network with valid HTTPS certification, with the network path becoming the name, there is no worry over duplicating popular module names.
  • Dependencies are cryptographically locked to versions. So, an upstream source cannot change a published module for those depending on it.
  • As each dependency is a single point of failure, Go checks with a module proxy before fetching the dependency. If you prefer, there is a GOINSECURE option for experimentation to avoid HTTPS certification.

Full post here, 6 mins read

Design principles for your HTTP APIs

Impose consistency so that similar endpoints behave in similar ways, even in edge scenarios, with consistent vocabulary, URL structure, request/response formats and error handling.
Read more

Design principles for your HTTP APIs

  • Impose consistency so that similar endpoints behave in similar ways, even in edge scenarios, with consistent vocabulary, URL structure, request/response formats and error handling. This will help ensure that users don’t need to read extensive documentation and handling updates becomes easier for developers.
  • Achieve performance by avoiding early optimization, waiting until you have the right metric in place to optimize based on hard data - and collect that data from day one, with an APM tool.
  • Use metrics to inform evolution, so that you update and add features based on actual user usage of endpoints to avoid or minimize disruptions to existing implementations.
  • For complex APIs, avoid a 1:1 mapping between database tables and API resources. Build in usability by simplifying business transactions to require a single API call rather than multiple. If it isn’t possible, be as flexible as you can.
  • Adopt simplicity by building on top of universally accepted standards and tools. This will mean less overheads and less room for mistakes.

Full post here, 7 mins read

Making Python programs blazingly fast

Find what parts of your code are slowing down the program. A simple & lazy solution is to use a Unix time command. You can also use cProfile for detailed profiling.
Read more

Making Python programs blazingly fast

  • Find which parts of your code are slowing down the program. A simple & lazy solution is to use the Unix time command. You can also use cProfile for detailed profiling.
  • Once bottlenecks are identified, time the slow function without measuring the rest of the code.
  • The most obvious way of making it faster is to use built-in data types.
  • Caching or memoization with lru_cache helps functions that perform expensive I/O operations or fairly slow recursive computations (see the sketch after this list).
  • You can improve performance by binding frequently used values to local variables, even if the assignments seem unnecessary, because local lookups are faster than global or attribute lookups.
  • You can also speed up your code just by wrapping the whole code in the main function and calling it once.
  • Avoid or limit the use of dot operators (.), as each one triggers a dictionary lookup through __getattribute__, creating extra overhead in your code.
  • Operations on strings like the modulus operator (%s) or .format() can get quite slow when run in a loop. Go for f-strings instead: they are the most readable, concise and fastest method.
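
A minimal sketch of the lru_cache point: memoizing a slow recursive function and timing it before and after:

```python
import time
from functools import lru_cache


def fib(n: int) -> int:            # naive, exponential-time recursion
    return n if n < 2 else fib(n - 1) + fib(n - 2)


@lru_cache(maxsize=None)
def fib_cached(n: int) -> int:     # same function, memoized
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)


start = time.perf_counter()
fib(30)
print(f"uncached: {time.perf_counter() - start:.3f}s")

start = time.perf_counter()
fib_cached(30)
print(f"cached:   {time.perf_counter() - start:.6f}s")
```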

Full post here, 5 mins read

Ruby on Rails: Ensuring security is covered in your application

Use strong parameters to accept data being sent to you from a request, supplying whitelisted values to throw an error if incorrect data comes in.
Read more

Ruby on Rails: Ensuring security is covered in your application

  • Set up authentication to verify user access. You can use devise, which uses Bcrypt, to make it difficult for hackers to compute a password. It can also help recover passwords, register and track sign-ins, lock records, etc.
  • Use strong parameters to accept data being sent to you from a request, supplying whitelisted values to throw an error if incorrect data comes in.
  • Add slugs to URLs to identify records in an easy-to-read form without releasing the id of the record.
  • Protect sensitive data, especially logins and payment pages, by enforcing https through the config file and averting cross-site scripting (XSS) attacks.
  • Check for active record exceptions and create an exception concern to sit above the application controller to guard against specific exceptions.

Full post here, 3 mins read

A pragmatic take on REST anti-patterns

Sometimes, you will need to make concessions based on organizational context rather than best practice, and sometimes there are excellent business and security reasons to heed internal resistance over the purity of design, which does not functionally achieve anything better anyway.
Read more

A pragmatic take on REST anti-patterns

  • Implementing HATEOAS (hypermedia as the engine of application state) using the available tooling is not easy for a majority of developers. And, for consumers, this may dilute the simplicity of the API. You might want to choose a pragmatic REST approach - adopting most of REST but not all or you might create a remote procedure call (RPC) API. Just call it like it is, rather than getting stuck on the REST claim.
  • Sometimes, you will need to make concessions based on organizational context rather than best practice, and sometimes there are excellent business and security reasons to heed internal resistance over the purity of design, which does not functionally achieve anything better anyway.
  • Avoid making a REST “entity” of your customers. It adds complexity and yet no extra value to the consumer, mixes up the data and the security models by requiring the user to identify themselves by URI. This, in turn, results in a brittle interface, and in a ‘fallacy of nested interfaces’ that even a developer finds hard to parse. Instead, model only what you need to model in your API, not other associated things.

Full post here, 8 mins read

Software engineering is different from programming

As a software engineer, you must understand the problem fully, the limitations of the solutions offered as well as the privacy and security implications.
Read more

Software engineering is different from programming

“All software engineers can program, but not all programmers can engineer software.”

  • As a software engineer, you must understand the problem fully, the limitations of the solutions offered as well as the privacy and security implications.
  • Sometimes the solution is not writing code but combining existing programs, educating users, or preempting future problems.
  • Your code must be readable, easily extended, work well with other programs and maintainable, with clear error messages and error logging, backed by solid documentation for easy debugging.
  • Well-engineered programs will work in many different environments, differently-resourced devices and across time zones, even on limited memory and processing power, backed by a comprehensive test suite.
  • Find and use good tools to shorten feedback loops - for static code analysis, type safety, deployment, debugging and performance measurement.

Full post here, 11 mins read

Tips for building and managing containers

Curate a set of Docker base images for your container because these base images can be reused as many apps share dependencies, libraries, and configurations. Docker Hub and Google Container Registry have thousands of pre-configured base images for download.
Read more

Tips for building and managing containers

  • Curate a set of Docker base images for your container because these base images can be reused as many apps share dependencies, libraries, and configurations. Docker Hub and Google Container Registry have thousands of pre-configured base images for download.
  • However, don’t trust arbitrary base images. Always use a vulnerability scanner - incorporate static analysis into your pipeline and run it for all your containers. If you do find a vulnerability, rebuild your base image rather than just patching it, then redeploy the image as immutable.
  • Optimize your base image: start with the leanest, most viable one and build your packages on top of it to reduce overhead, build faster, use less storage, pull images faster, and minimize the potential attack surface.
  • Use only one (parent) process per container. As a rule, each container should have the same lifecycle as the app itself.
  • Avoid embedding secrets inside containers, even if you keep the images private. For security, use Kubernetes Secrets objects to store sensitive data outside containers, use the Secrets abstraction to expose them as mounted volumes inside containers or as environmental variables.

Full post here, 7 mins read

Why Kubernetes is the new application server

Figuring out how to connect to a service is easy and available out of the box with Kubernetes. You get configuration information from the runtime environment without it having to be hardcoded in the application.
Read more

Why Kubernetes is the new application server

  • Figuring out how to connect to a service is easy and available out of the box with Kubernetes. You get configuration information from the runtime environment without it having to be hardcoded in the application.
  • Kubernetes ensures reliability and availability for your applications by providing elasticity through ReplicaSets, which control the number of app replicas that should run at any time.
  • As Kubernetes runs many replicas of the containerized application and auto-scales too, logging and monitoring become even more important than in usual scenarios. For this purpose, it has observability built in. An important thing to note is that you must store your logs outside the container to ensure they persist across different runs.
  • Kubernetes is resilient. It ensures that your specified number of pod replicas are consistently deployed across the cluster. This automatically handles any possible node failures.

Full post here, 12 mins read

Designing a microservices architecture for failure

Architectural patterns and techniques like caching, bulkheads, circuit breakers, and rate-limiters can help build reliable microservices.
Read more

Designing a microservices architecture for failure

  • 70% of outages are caused by changes in code, so reverting code is not a bad thing. Implementing change management strategies and automatic rollouts becomes crucial.
  • Architectural patterns and techniques like caching, bulkheads, circuit breakers, and rate-limiters can help build reliable microservices.
  • Self-healing can help recover an application. You should add extra logic to your application to handle edge cases.
  • Failover caching can help during glitches and provide the necessary data to your application.
  • You should use a unique idempotency key for each of your transactions to help with retries.
  • You can protect resources and help them recover with circuit breakers, which usually close again after a certain amount of time, giving underlying services space to recover (see the sketch after this list).
  • Use chaos engineering methods to test.
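
A minimal sketch of a circuit breaker that opens after repeated failures and closes again once a cool-down has passed. The thresholds are placeholders, and a production implementation would typically also add a half-open state:

```python
import time


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # timestamp when the breaker opened

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None      # cool-down elapsed: try the service again
            self.failures = 0
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result


breaker = CircuitBreaker(max_failures=2, reset_after=5.0)


def flaky():
    raise TimeoutError("downstream service timed out")


for _ in range(3):
    try:
        breaker.call(flaky)
    except Exception as exc:
        print(type(exc).__name__, exc)
# After two real failures, the third call fails fast without hitting the service.
```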

Full post here, 11 mins read

How to safely throttle high traffic APIs

Adopting a scalable language and framework can help spread the traffic across multiple endpoints and systems, spreading the load across a wider structure.
Read more

How to safely throttle high traffic APIs

  • Adopting a scalable language and framework can help spread the traffic across multiple endpoints and systems, spreading the load across a wider structure.
  • Rate limiting is among the most common ways of managing API traffic and may be implemented with Dynamic Limits (staying within a range of requests), Server Rate Limits (limits per server) or Regional Data Limits (restrictions based on heuristics and baseline analysis); a token-bucket sketch follows this list.
  • Build a backend for your frontend (BFF), i.e. break each element of your API into additional functions that are then accessed via facade APIs, turning your traffic from single to multiple sources.
  • API gateways are a proven, time-tested methodology for managing API traffic. API gateways can interact with microservices to bring many of the same functions and benefits of the BFF structure.
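
A minimal sketch of one common rate-limiting mechanism, a token bucket applied per client; the capacity, refill rate and status codes are placeholders:

```python
import time


class TokenBucket:
    """Allows short bursts up to `capacity`, then `rate` requests per second."""
    def __init__(self, rate: float = 5.0, capacity: float = 10.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


buckets: dict[str, TokenBucket] = {}


def handle_request(client_id: str) -> int:
    bucket = buckets.setdefault(client_id, TokenBucket())
    return 200 if bucket.allow() else 429  # 429 Too Many Requests


print([handle_request("client-a") for _ in range(12)])  # burst: the tail gets 429s
```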

Full post here, 9 mins read

The two most important challenges with an API gateway when adopting Kubernetes

Encourage a diversity of implementations for consolidated tooling that supports architectural flexibility. However, take advantage of a consolidated underlying platform and offer a ‘buffet’ of implementation options rather than allowing developers to build bespoke ones for better security.
Read more

The two most important challenges with an API gateway when adopting Kubernetes

  • When building an API gateway using a microservices pattern that runs on Kubernetes, you must think about scaling the management of hundreds of services and their associated APIs and ensuring the gateway can support a broad range of microservice architectures, protocols and configurations across all layers of the edge stack.
  • The challenges of managing the edge increase with the number of microservices deployed, which also means an increased number of releases. It is best to avoid a centralized approach to operations and let each team manage their services independently of other teams’ schedules.
  • Encourage a diversity of implementations supported by consolidated tooling for architectural flexibility. However, take advantage of a consolidated underlying platform and offer a ‘buffet’ of implementation options rather than letting developers build bespoke ones, for better security. This is also a more manageable and scalable approach.

Full post here, 5 mins read

Design patterns in API gateways and microservices

Some of the most common cross-cutting concerns in applications include authentication, authorization, sessions, cookies, cache, logging and dependencies on other services.
Read more

Design patterns in API gateways and microservices

  • Some of the most common cross-cutting concerns in applications include authentication, authorization, sessions, cookies, cache, logging and dependencies on other services.
  • Authentication is best handled by a service that produces a JSON web token or some other auth token which can be included in subsequent requests. Authorization should also be possible using a token and should be performed before a request is proxied through to any microservice (a minimal gateway middleware sketch follows this list).
  • Cookies are best avoided by your microservices. If needed, they are easier and cleaner to implement in the gateway.
  • When it comes to caching, start with small expiration times. Maintaining REST-friendly routes will allow for simpler caching at higher levels.
  • Use log aggregation for logging.
  • Each microservice should be as independent as possible, and should not risk cascading failures because one service outage triggers another. Thinking about how the gateway will interface with its microservices is crucial to its robustness.
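
A minimal sketch of checking an auth token at the gateway before proxying to a microservice; validateToken is a hypothetical placeholder for real JWT verification, and the backend address is invented.

```go
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

// validateToken is a hypothetical stand-in for real JWT/auth-token verification.
func validateToken(token string) bool {
	return token != "" // placeholder check only
}

// requireAuth rejects requests before they are proxied to any microservice.
func requireAuth(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		token := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
		if !validateToken(token) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	// The gateway proxies authorized requests to an assumed backend microservice.
	backend, _ := url.Parse("http://orders.internal:9000")
	proxy := httputil.NewSingleHostReverseProxy(backend)
	http.ListenAndServe(":8080", requireAuth(proxy))
}
```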

Full post here, 10 mins read

Microservices architecture as a large-scale refactoring tool

To refactor a monolith into microservices architecture, you need to break it into single responsibilities or services in an incremental fashion
Read more

Microservices architecture as a large-scale refactoring tool

  • If you have a functional monolith, you cannot afford to throw out all its current value and rebuild it from scratch.
  • Instead, you should do a cost/benefit analysis and then rebuild parts of it as microservices accordingly.
  • To refactor a monolith into a microservices architecture, break it into single responsibilities or services incrementally (to limit risk), replacing local function calls with remote calls (such as REST operations) to the new microservices; then remove the legacy code, slimming down the monolith. A sketch of this replacement follows this list.
  • Most of the time, splitting a monolith also means splitting the database it consumes. Ideally, avoid sharing the database among multiple applications to avoid data duplication.
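
One way to picture the incremental replacement of a local call with a remote one (a sketch under assumed names, not the post's code): hide the responsibility behind an interface, keep the legacy in-process implementation, and swap in a REST-backed implementation once the microservice exists.

```go
package billing

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// InvoiceService is the single responsibility being carved out of the monolith.
type InvoiceService interface {
	Total(orderID string) (float64, error)
}

// localInvoices is the legacy, in-process implementation.
type localInvoices struct{}

func (localInvoices) Total(orderID string) (float64, error) {
	// ... existing monolith logic ...
	return 0, nil
}

// remoteInvoices calls the new microservice over REST instead of a local function.
type remoteInvoices struct {
	baseURL string // assumed address of the extracted service
}

func (r remoteInvoices) Total(orderID string) (float64, error) {
	resp, err := http.Get(fmt.Sprintf("%s/invoices/%s/total", r.baseURL, orderID))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	var out struct {
		Total float64 `json:"total"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return 0, err
	}
	return out.Total, nil
}
```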

Full post here, 10 mins read

Implementation of a monitoring strategy for products based on microservices

Proper instrumentation of microservices ensures faster pinpointing and troubleshooting of problems. These include metrics for availability, metrics for capacity planning or to detect resource saturation, and metrics to understand internal states of each instance of a microservice.
Read more

Implementation of a monitoring strategy for products based on microservices

  • Proper instrumentation of microservices ensures faster pinpointing and troubleshooting of problems (a minimal instrumentation sketch follows this list).
  • These include metrics for availability, metrics for capacity planning or to detect resource saturation, and metrics to understand internal states of each instance of a microservice.
  • You need horizontal monitoring to monitor communication between microservices and their availability to each other.
  • Load balancing across instances of microservices depends on several instances of each microservice communicating with several instances of others. It is therefore useful to have each microservice monitor the quality of its own inbound and outbound calls to other services, and to have smart gateways in the service mesh report on traffic entering and leaving it.
  • Logs are the best place to keep metrics for each ETL job; they are cheaper than metrics systems labeled by job ID.
  • While metrics monitor all events crossing a particular checkpoint over time, traces monitor each event as it travels through the entire microservices chain. Traces are really helpful in monitoring flows in the product.
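
As an illustration of such instrumentation (not from the post), a service could record the latency and outcome of its outbound calls with the Prometheus Go client; the metric and label names here are made up.

```go
package main

import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// callLatency records how long this service's outbound calls to its peers take.
var callLatency = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name: "outbound_call_duration_seconds", // illustrative name
		Help: "Latency of calls made to downstream microservices.",
	},
	[]string{"peer", "status"},
)

func init() {
	prometheus.MustRegister(callLatency)
}

// observeCall wraps an outbound call and records its latency and outcome.
func observeCall(peer string, call func() error) error {
	start := time.Now()
	err := call()
	status := "ok"
	if err != nil {
		status = "error"
	}
	callLatency.WithLabelValues(peer, status).Observe(time.Since(start).Seconds())
	return err
}

func main() {
	// Expose the metrics for the monitoring system to scrape.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":2112", nil)
}
```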

Full post here, 8 mins read

Kubernetes deployment strategies

The standard for Kubernetes is rolling deployment, replacing pods of previous versions with the new one without cluster downtime. Kubernetes probes new pods for readiness before scaling down old ones, so you can abort deployment without bringing down the cluster.
Read more

Kubernetes deployment strategies

  • The standard for Kubernetes is rolling deployment, replacing pods of previous versions with the new one without cluster downtime. Kubernetes probes new pods for readiness before scaling down old ones, so you can abort deployment without bringing down the cluster.
  • In a recreate deployment, all old pods are killed at once and replaced with new ones.
  • A blue/green (or red/black) deployment runs the old and new versions side by side: users have access only to the green (old) version while your QA team applies test automation to the blue (new) version. Once blue passes, the service switches over and scales down the green version.
  • Canary deployments are similar to blue/green but use a controlled progressive approach, typically when you want to test new functionality on the backend or with a limited subset of users before a full rollout.
  • Dark deployments or A/B testing are similar to canary deployment but used for front-end rather than backend features.

Full post here, 5 mins read

Common security gotchas in Python and how to avoid them

Prevent input injections (SQL or command injections) by sanitizing input using utilities that come with your web framework, avoid constructing SQL queries manually, and use shlex module to escape input correctly.
Read more

Common security gotchas in Python and how to avoid them

  • Prevent input injections (SQL or command injections) by sanitizing input with utilities that come with your web framework, avoiding manually constructed SQL queries, and using the shlex module to escape input correctly.
  • Avoid relying on assert statements except when communicating with other developers (such as in unit tests or to guard against incorrect API usage) because in the production environment it is common to run with optimisations and Python will skip the assert statements.
  • Python’s import system is very flexible, and installing third-party packages exposes security holes. You also need to consider the dependencies of your dependencies. So vet your packages: look at PyUp.io, check package signatures, use virtual environments for all apps, and ensure your global site package is as clean as possible.
  • Rather than the very powerful yaml.load, use yaml.safe_load.
  • Python can have overrun or overflow vulnerabilities related to memory allocation, so always patch your runtime, even with the latest version.

Full post here, 7 mins read

5 ways to make HTTP requests in Node.js

Request is a simplified, more user-friendly HTTP client that you can install as a dependency from npm. It is easy to use and you can support Promises with the request-promise library.
Read more

5 ways to make HTTP requests in Node.js

  • You can use the default HTTP module in the standard library. It saves you the trouble of installing external dependencies but is not as user-friendly as other solutions.
  • Request is a simplified, more user-friendly HTTP client that you can install as a dependency from npm. It is easy to use and you can support Promises with the request-promise library.
  • Axios is a Promise-based client for both the browser and Node.js, good for asynchronous code and more complex uses. It parses JSON responses by default and can handle multiple concurrent requests with axios.all.
  • SuperAgent, which is primarily used for Ajax requests in the browser, also works in Node.js. It offers functions like query() that you can chain on to requests to add parameters, and as with Axios, you don’t need to parse JSON responses yourself.
  • Got is a more lightweight library than Request and similar clients. It works with Promises as well.

Full post here, 4 mins read

HTTP headers to secure your app for the busy web developer

Set an X-Frame-Options header to prevent someone from creating an iframe wrapper around your site to clickjack your site. Your safety options are DENY, SAMEORIGIN, and ALLOW-FROM.
Read more

HTTP headers to secure your app for the busy web developer

  • Set an X-Frame-Options header to prevent someone from creating an iframe wrapper around your site to clickjack it. Your options are DENY, SAMEORIGIN, and ALLOW-FROM (a minimal middleware sketch follows this list).
  • You can set X-XSS-Protection to block Reflected XSS (cross-site scripting) attacks.
  • Set the X-Content-Type-Options header to force browsers to respect the server-specified file type, preventing a Javascript injection through an HTML file.
  • Apply Strict Transport Security to refuse to connect as HTTP, enforcing HTTPS instead.
  • Prevent attackers from reading cookies: use the HttpOnly attribute so JavaScript cannot access cookies (blocking an XSS attacker), and the Secure attribute so cookies are only transferred over HTTPS, never plain HTTP.
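
A minimal sketch of setting these headers from a Go service; the specific header values are common defaults rather than the post's exact recommendations.

```go
package main

import "net/http"

// secureHeaders adds the security headers discussed above to every response.
func secureHeaders(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		h := w.Header()
		h.Set("X-Frame-Options", "DENY")                       // block clickjacking iframes
		h.Set("X-XSS-Protection", "1; mode=block")             // block reflected XSS in older browsers
		h.Set("X-Content-Type-Options", "nosniff")             // respect the server-specified content type
		h.Set("Strict-Transport-Security", "max-age=63072000") // enforce HTTPS on future visits
		next.ServeHTTP(w, r)
	})
}

func main() {
	app := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// An HttpOnly, Secure cookie cannot be read by JavaScript and only travels over HTTPS.
		http.SetCookie(w, &http.Cookie{Name: "session", Value: "opaque-session-id", HttpOnly: true, Secure: true})
		w.Write([]byte("hello"))
	})
	http.ListenAndServe(":8443", secureHeaders(app))
}
```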

Full post here, 4 mins read

Walking the wire: Mastering the four decisions in microservices architecture

There should be no shared databases. If updates happen only in one microservice, you can use message queues to share data. If updates happen in two services, either merge the two services or use transactions.
Read more

Walking the wire: Mastering the four decisions in microservices architecture

  • There should be no shared databases. If updates happen only in one microservice, you can use message queues to share data. If updates happen in two services, either merge the two services or use transactions.
  • Handle microservice security by using a token-based approach. It pushes authentication to the client and does access control at the microservice level, simplifying dependencies.
  • Handle microservice composition by driving the flow from the client browser, through orchestration, or through a centralised server that runs the workflow.
  • Avoid a spaghetti of dependencies by having a proper, detailed plan of how, when and which microservices should call each other. Understand the impact of each such invocation on the overall performance of the system.

Full post here, 9 mins read

Unique benefits of using GraphQL in microservices

The data structure in GraphQL allows for well-defined & delineated data ownership for each request. You can have great control over the data loading process and therefore control how data is transferred in a very granular way.
Read more

Unique benefits of using GraphQL in microservices

  • The data structure in GraphQL allows for well-defined & delineated data ownership for each request.
  • You can have great control over the data loading process and therefore control how data is transferred in a very granular way.
  • As GraphQL allows for request bundles composed of many requested resources, it can leverage a sort of parallel execution for data requests. This allows a single request to fulfill requirements that would otherwise have required multiple requests to multiple services across multiple servers.
  • GraphQL can budget requests, allows the server to prioritize requests and grant them where appropriate, and reduces timed out requests in the long run.
  • It saves processor time and effort because it utilizes Object Identifiers to cache often-requested data.

Full post here, 8 mins read

Do you have too many microservices? - 5 design attributes that can help

When you are developing microservices, ensure that each service relies on its own underlying data stores. If multiple services reference the same table in a DB, there is a great chance that your DB is a source of coupling. You must avoid such coupling.
Read more

Do you have too many microservices? - 5 design attributes that can help

  • When you are developing microservices, ensure that each service relies on its own underlying data stores. If multiple services reference the same table in a DB, there is a great chance that your DB is a source of coupling. You must avoid such coupling.
  • You should try to minimise the number of database tables a microservice uses.
  • At the onset, be clear about whether a service needs to be stateful or stateless.
  • Understand the system-wide relationships of a microservice with other services, and what impact the non-availability of a particular microservice will have on the system.
  • Design your service to be the single source of truth for something in your system.

Full post here, 9 mins read

How we implemented domain-driven development in Golang

Some learnings from Grab’s implementation of the principles of domain-driven development (DDD) and idiomatic Go.
Read more

How we implemented domain-driven development in Golang

Some learnings from Grab’s implementation of the principles of domain-driven development (DDD) and idiomatic Go.

  • Work closely with business/domain experts to gather knowledge on the required functionality and flow of what you are building.
  • Break down this knowledge into different problems that need to be solved. Categorize these problems into bounded contexts and subcontexts (read here to know more about it).
  • Use these contexts to identify dependencies in the code & in the team. Identify the building blocks (value objects and entities) to further break down the functionality and flow.
  • Create interfaces to abstract the working logic of a given domain (e.g. a repository); a small sketch follows this list.
  • Identify domain events and let them provide the relevant useful information to the user in other domains. This enables the independence of different classes.
  • Use a common language developed with the domain experts and apply it consistently in discussions and coding (for instance, to name classes and methods).
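
A small illustrative sketch (not Grab's actual code) of an entity, a repository interface and a domain event named in the ubiquitous language of an assumed booking context:

```go
package booking

import "errors"

// ErrNotFound is returned when a booking does not exist.
var ErrNotFound = errors.New("booking not found")

// Booking is an entity in the booking bounded context,
// named in the ubiquitous language shared with domain experts.
type Booking struct {
	ID          string
	PassengerID string
	Fare        float64
}

// BookingRepository abstracts how bookings are stored, so the domain
// logic does not depend on a particular database.
type BookingRepository interface {
	Find(id string) (*Booking, error)
	Save(b *Booking) error
}

// BookingConfirmed is a domain event other contexts can subscribe to,
// keeping them independent of this one.
type BookingConfirmed struct {
	BookingID string
}
```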

Full post here, 5 mins read

How we improved the observability of a Go project

Add comprehensive info-level logging coverage to make each step of processing more observable. Structure logging with log levels using the logrus package to eliminate noise and improve searchability within logs, and impose consistency.
Read more

How we improved the observability of a Go project

  • Add comprehensive info-level logging coverage to make each step of processing more observable. Structure logging with log levels using the logrus package to eliminate noise and improve searchability within logs, and impose consistency.
  • Create a structured error type that includes a generic map of data. This will help you diagnose exactly what went wrong when handling errors at the top level, indicating which errors are fatal and which mean you should retry processing (a sketch follows this list).
  • Use Go’s context package, threading it through all request handlers and message processors for greater control over your services, making for a more reliable shutdown in case of new deployment or scaling events.
  • Combine the detailed errors and logging into a common abstraction to reduce code repetition. You can add functions for creating errors and logs with basically the same interface, in turn using the context data.
  • Add service-level indicators (SLIs) for high-level functions in asynchronous processing to reveal end-to-end latency of different kinds of processing to end-users.
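
A sketch of combining a structured error type with logrus fields; the type and field names are illustrative assumptions rather than the project's real abstractions.

```go
package worker

import (
	"github.com/sirupsen/logrus"
)

// OpError carries a generic map of data so the top-level handler
// can log exactly what went wrong and decide whether to retry.
type OpError struct {
	Op        string
	Retryable bool
	Data      map[string]interface{}
	Err       error
}

func (e *OpError) Error() string { return e.Op + ": " + e.Err.Error() }

// handle logs structured errors with their attached data at the top level.
func handle(err error) {
	if opErr, ok := err.(*OpError); ok {
		entry := logrus.WithFields(logrus.Fields(opErr.Data)).WithField("op", opErr.Op)
		if opErr.Retryable {
			entry.Warn("operation failed, will retry")
			return
		}
		entry.Error("operation failed permanently")
		return
	}
	logrus.WithError(err).Error("unexpected error")
}
```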

Full post here, 6 mins read

5 advanced testing techniques in Go

Use test suites - develop tests written against an interface for all implementations of that interface. Carefully consider interfaces before exporting them and avoid creating a hard dependency between a consumer package and your own.
Read more

5 advanced testing techniques in Go

  • Use test suites - develop tests written against an interface for all implementations of that interface.
  • Carefully consider interfaces before exporting them and avoid creating a hard dependency between a consumer package and your own. To avoid exporting an interface, use an internal/ package subtree to keep the interface scoped to the package.
  • Don’t export concurrency primitives, especially channels and the sync package. Also, add documentation on whether a struct or package is safe for concurrent access by multiple goroutines.
  • Use net/http/httptest to speed up tests and run them in parallel more easily, without binding to a port or setting up a server (see the example after this list).
  • Use a separate _test package inside the foo/ directory of the package you want to test, rather than the default package pkg. This is a workaround for cyclic dependencies, prevents brittle tests and lets you see what it is like to consume your own package.
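
For example, a generic handler test with net/http/httptest (not taken from the post) needs no real port or server and parallelizes easily:

```go
package api

import (
	"io"
	"net/http"
	"net/http/httptest"
	"testing"
)

func helloHandler(w http.ResponseWriter, r *http.Request) {
	io.WriteString(w, "hello")
}

func TestHelloHandler(t *testing.T) {
	t.Parallel() // httptest servers don't fight over ports, so tests parallelize easily

	srv := httptest.NewServer(http.HandlerFunc(helloHandler))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	if string(body) != "hello" {
		t.Fatalf("got %q, want %q", body, "hello")
	}
}
```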

Full post here, 8 mins read

Tips to power-up your Java security

Protect against SQL injections by binding variables in prepared statements, using the prepareStatement() function to validate inputs.
Read more

Tips to power-up your Java security

  • Protect against SQL injections by binding variables in prepared statements, using the prepareStatement() function to validate inputs.
  • Returning mutable objects leaves you vulnerable to unexpected changes in your class state. Instead, use an unmodifiable/immutable collection or a copy of a mutable object to return.
  • Avoid including XSS characters in log messages. Manually sanitize each parameter and configure your logger service to replace such characters.
  • Always validate user input, especially when dealing with files whose location might be specified by user input.
  • Replace predictable random values (java.util.Random), which are based on clock ticks or other predictable parameters, with a secure random class and functions.
  • Eliminate dynamic class loading.

Full post here, 4 mins read

Ways to secure your applications

More than 70% of exploited applications are due to outdated dependencies. Ensure dependencies are up to date by using the latest packages and automating dependency management.
Read more

Ways to secure your applications

  • More than 70% of exploited applications are due to outdated dependencies. Ensure dependencies are up to date by using the latest packages and automating dependency management.
  • Explicitly declare acceptable user payloads and use database-level constraints, like maximum column size, refusing null values, etc.
  • Assert safe regular expressions.
  • Limit requests by IP address or user agent.
  • Store credentials outside your codebase, separating application configuration from code.
  • Disable HTTP requests to your server unless very specific use cases demand them. Enable certificate checking for outgoing connections so that communication with third-party APIs or services is also secured by HTTPS.

Full post here, 9 mins read

Top security best practices for Go

You should validate user entries (using native Go packages or 3rd party packages) not only for functionality but also to avoid attackers sending intrusive data.
Read more

Top security best practices for Go

  • You should validate user entries (using native Go packages or 3rd party packages) not only for functionality but also to avoid attackers sending intrusive data.
  • Use HTML templates to cover XSS vulnerabilities. The html/template package escapes content so it is not executed as JavaScript.
  • Ensure each database user has limited permissions, validate user inputs, and use parameterized queries to protect yourself from SQL injections (see the sketch after this list).
  • Make the best use of Go’s crypto package to encrypt sensitive information.
  • Enforce HTTPS communication and implement in-transit encryption even for internal communication.
  • Remember that error messages and error logs can expose sensitive information. Use the native library in Go for logs or third-party options like logrus, glog or logo.
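
For the SQL-injection point, a parameterized query with database/sql might look like the following generic sketch (table, column and placeholder syntax are assumptions that depend on your schema and driver):

```go
package store

import (
	"database/sql"
)

// FindUserEmail looks up an email using a parameterized query, so the
// username is passed as data rather than concatenated into the SQL string.
func FindUserEmail(db *sql.DB, username string) (string, error) {
	var email string
	err := db.QueryRow(
		"SELECT email FROM users WHERE username = $1", // placeholder syntax depends on the driver
		username,
	).Scan(&email)
	if err != nil {
		return "", err
	}
	return email, nil
}
```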

Full post here, 6 mins read

When DRY fails

DRY (Don’t Repeat Yourself) is about avoiding duplication of effort when writing the code. It also makes sure that when a bug is found it’s fixed across the board. Like many other principles, this one doesn’t work all the time.
Read more

When DRY fails

  • DRY (Don’t Repeat Yourself) is about avoiding duplication of effort when writing the code. It also makes sure that when a bug is found it’s fixed across the board. Like many other principles, this one doesn’t work all the time.
  • If DRY fails, it’s a good practice to reconcile. The simplest way to reconcile is to write a test that reads data A and data B and fails with a diff if they are out of sync.
  • Try having a great test suite - ideally with 100% coverage and 100% mutation testing - that you can run on both pieces of code.
  • Reconciliation does not mean synchronizing, overwriting,  comparing or even being disciplined and manually changing both places.
  • Reconciliation means checking all the data regularly and reporting all discrepancies, including fine-grained discrepancies.

Full post here, 3 mins read

Seven deadly sins of a software project

“Maintainability is the most valuable virtue of modern software development.” Do these seven things to make maintainable software.
Read more

Seven deadly sins of a software project

“Maintainability is the most valuable virtue of modern software development.”

Do these seven things to make maintainable software.

  1. Learn about elegant coding to avoid anti-patterns allowed by languages that are too flexible.
  2. Ensure all changes are traceable (what was changed, by whom, and why), by always using a ticket to flag any problem, referencing the ticket in the commit, and preserving its history.
  3. Follow an automated process of testing, packaging and deploying for all releases to execute them from a single command line.
  4. Enforce static analysis rules so no build can pass if any of the rules are violated.
  5. Measure and report test coverage and aim for at least 80% coverage. This coverage metric also lets future developers see if coverage is affected when making changes.
  6. Beware of nonstop development. Always release and version your software so future developers can see your intentions and roadmap from a clear release history (typically in Git tags and release notes), and have each version available for download.
  7. Ensure user interfaces are carefully documented to let the end-user see what the software does. 


Full post here, 7 mins read

Things you should never do

There is always a temptation to code from scratch rather than improve existing code because you may get a lot of excitement in building something grand. It is also harder to read code than to write it.
Read more

Things you should never do

  • There is always a temptation to code from scratch rather than improve existing code because you may get a lot of excitement in building something grand. It is also harder to read code than to write it.
  • You may assume the difficulty of reading and understanding how the old code works implies that you can write it better. This assumption is usually wrong because the old code worked and was tested and repeatedly debugged. New code will not automatically be better than old. What makes old code look messy is the patches.
  • And there are things you can try to improve in messy code. Work on improving architectural problems that require refactoring, moving code around, better defining base classes and sharpening interfaces carefully and work on smaller inefficient parts of a project which you can speed up.
  • Rewriting code from scratch is one of the worst mistakes a company or developer can make, and it should be avoided.

Full post here, 6 mins read

Top 5 cybersecurity predictions for 2020

Credential stuffing, where hackers steal login credentials from one site and use the same credentials to break into a user’s accounts on other sites, will continue to be an easy attack.
Read more

Top 5 cybersecurity predictions for 2020

  • Credential stuffing, where hackers steal login credentials from one site and use the same credentials to break into a user’s accounts on other sites, will continue to be an easy attack.
  • AI-focused detection products will lose the hype because of their inability to meet promises.
  • California Consumer Protection Act (CCPA) will have a big impact on many tech companies with regard to their data privacy practices.
  • Cybersecurity breaches for autonomous vehicles will increase because of systems not keeping pace with advancing threats in this area.
  • You will be required to do the operational work of assigning ownership & accountability in your companies to ensure data laws, regulations, norms and best practices are in place to improve cybersecurity.

Full post here, 4 mins read

Ways to hack an API and how to defend

Use base-level encryption to allow functionality to operate as expected but obscure relationships between data to defend against reverse engineering. To defend against spoofing you can encrypt all traffic in transit.
Read more

Ways to hack an API and how to defend

  • Use base-level encryption to allow functionality to operate as expected but obscure relationships between data to defend against reverse engineering.
  • To defend against spoofing you can encrypt all traffic in transit. This ensures that what is captured is only “noise”. Another option is to set up a pre-configured server certificate trusted by the API, allowing a handshake to go through only when the certificate check passes. You could also try two-factor authentication to prevent attacks from the user perspective.
  • Ensure proper session management. Be sure that sessions are invalidated once users get past an idle timeout period or if the user logs out. You should set the session lifespan to terminate at a certain point.
  • Enforce API level security by using opt-in heuristic systems to know when a user is coming from an unknown machine, unknown location, or if there is any other variation in a known behavior.

Full post here, 11 mins read

Security best practices for MongoDB

Configure Transport Layer Security to encrypt all traffic to and from the database. Use at rest encryption to protect the contents of the DB in the event that someone is able to copy the database files (in a backup, for instance) or the server image.
Read more

Security best practices for MongoDB

  • MongoDB doesn’t have access control enabled by default. You must enable it. Also, configure RBAC (role-based access control).
  • Configure Transport Layer Security to encrypt all traffic to and from the database.
  • Use at rest encryption to protect the contents of the DB in the event that someone is able to copy the database files (in a backup, for instance) or the server image.
  • Restrict network exposure to tighten the security of the network topology that hosts the MongoDB database.
  • Use official MongoDB package repositories. Ensure that the packages are official MongoDB packages and pass the authenticity checks.
  • Disable JavaScript execution where possible. Troublesome operators - $where, mapReduce, and group - can be incredibly dangerous.

Full post here, 7 mins read

4 serverless myths to understand before getting started with AWS

One myth is that serverless implies Functions as a Service (FaaS). Cloud services are serverless if no servers are exposed for you to administer, if they scale automatically and you pay for what you use only.
Read more

4 serverless myths to understand before getting started with AWS

  • One myth is that serverless implies Functions as a Service (FaaS). Cloud services are serverless if no servers are exposed for you to administer, if they scale automatically and you pay for what you use only. In fact, serverless need not mean web-based apps, and can include real-time analytics and processing, so look beyond functions.
  • Don’t think that serverless is a silver bullet. Serverless technology is best suited for event-based architectures, rather than traditional client-server architecture, and you need to beware of recreating monolithic structures.
  • Another common myth is that serverless means an end to operational burdens. Advanced observability is intrinsic, so you need operational effort to monitor, maintain and effectively scale, though you need not administer servers.
  • Don’t believe that serverless is infinitely scalable. Serverless services have high availability but cannot scale infinitely - each service has limits, such as lambda’s memory limits and Kinesis’ throughput limits - so you need to optimize for the limits and plan for failure scenarios to ensure resilience.

Full post here, 6 mins read

Production secret management at Airbnb

Airbnb built an internal tool Bagpiper which is a collection of tools and framework components that it uses for the management of production secret assets. They designed it to decouple secret management from other app configurations as Airbnb scaled, and to ensure a least-privileged access pattern
Read more

Production secret management at Airbnb

  • Airbnb built an internal tool Bagpiper which is a collection of tools and framework components that it uses for the management of production secret assets.
  • They designed it to decouple secret management from other app configurations as Airbnb scaled, and to ensure a least-privileged access pattern, encryption of secrets at rest, support for applications across several languages and environments, and managing secrets for periodic rotation.
  • Bagpiper creates segmented access by asymmetrically encrypting secrets with service-specific keys: a secret is encrypted with each of the public keys on a per-secret keychain, and only services with the corresponding private keys can decrypt the secret. It encrypts information at rest and decrypts it during use.
  • Engineers can add, remove and rotate secrets, and make them available to select production systems. Secrets and changes to code are typically deployed together.
  • Secrets are rotated continuously, using secret annotations that specify when a secret was created/last rotated and when to rotate it again.

Full post here, 6 mins read

Want to debug latency?

Latency is a critical measure to determine whether our systems are running normally or not. There are many collections libraries available that help you collect latency metrics.
Read more

Want to debug latency?

  • Latency is a critical measure to determine whether our systems are running normally or not.
  • There are many collections libraries available that help you collect latency metrics.
  • Heat maps are useful as they help visualize latency distribution over time.
  • After narrowing down the source of the latency to a service or process, look at the host-specific and in-process reasons why latency occurred in the first place.
  • If the host is behaving normally and networking is not impacted, go and further analyze the in-process sources of latency.
  • Some language runtimes, like Go, allow us to internally trace runtime events in the lifetime of a request (see the sketch after this list).
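
As an example of such in-process tracing (a sketch, not from the post), net/http/httptrace can break a single outbound request's latency into DNS, connect and first-byte phases:

```go
package main

import (
	"log"
	"net/http"
	"net/http/httptrace"
	"time"
)

func main() {
	start := time.Now()
	req, _ := http.NewRequest("GET", "https://example.com", nil)

	// Hook into runtime events in the lifetime of this request.
	trace := &httptrace.ClientTrace{
		DNSDone: func(httptrace.DNSDoneInfo) {
			log.Printf("DNS done after %v", time.Since(start))
		},
		ConnectDone: func(network, addr string, err error) {
			log.Printf("connected to %s after %v", addr, time.Since(start))
		},
		GotFirstResponseByte: func() {
			log.Printf("first byte after %v", time.Since(start))
		},
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	resp, err := http.DefaultTransport.RoundTrip(req)
	if err != nil {
		log.Fatal(err)
	}
	resp.Body.Close()
	log.Printf("total %v", time.Since(start))
}
```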

Full post here, 6 mins read

Serverless for startups - it’s the fastest way to build your technology idea

Scaling is handled for you. You don’t have to worry whether a function is run once a day or a million times a day. Startups often need to change system concept and functionality mid-way and the agility serverless structure offers is perfect for this use case.
Read more

Serverless for startups - it’s the fastest way to build your technology idea

  • With serverless, you pay for what you use - the hidden infrastructural support for scaling is also built into the final bills.
  • Scaling is handled for you. You don’t have to worry whether a function is run once a day or a million times a day.
  • Startups often need to change system concept and functionality mid-way and the agility serverless structure offers is perfect for this use case.
  • It allows startups to dedicate all the engineers to solve business problems and not spend time on server management & infrastructure.
  • Serverless gives startups a chance to deliver quickly, and use speed and agility to their advantage.

Full post here, 7 mins read

Serverless deployment best practices

When you create IAM policies for your services, limit the roles to the minimum permissions required to operate.
Read more

Serverless deployment best practices

  • Keep your secrets out of your source control and limit access to them. Use separate secrets for different application stages when appropriate.
  • When you create IAM policies for your services, limit the roles to the minimum permissions required to operate.
  • Restrict deploy times by locking down your deployments during periods you don’t want to be disturbed.
  • Use a single region or a subset of regions that suit your needs to offset inconsistencies with a geographically distributed team.
  • Create consistent service names for your Lambda functions. It will help you to find relevant functions easily and to tie multiple functions with a particular service faster.

Full post here, 6 mins read

Tips & tricks for developing a serverless cloud app

Focus on limiting the scope of your functions. Protect your code from malfunctioning by setting up a queue, or buffer requests.
Read more

Tips & tricks for developing a serverless cloud app

  • When going serverless, focus on limiting the scope of your functions.
  • Communication between your functions is important to exchange data within your app. You can either directly call another Lambda function from within a Lambda function or upload data to a service and let this service trigger another Lambda function.
  • Protect your code from malfunctioning by setting up a queue, or buffer requests if necessary for uniform scalability when working with numerous services.
  • Your functions have 15 minutes to run before they time out. So, the execution time for your app should be under that timeframe.
  • The more memory you allocate, the more CPU power you have. The same is true for network and I/O throughput.

Full post here, 5 mins read

Stretching, executing, coasting - and pacing yourself to avoid burnout

Look at how professional athletes build their careers. They pace themselves to optimize performance and ensure the longevity of their careers.
Read more

Stretching, executing, coasting - and pacing yourself to avoid burnout

  • Look at how professional athletes build their careers. They pace themselves to optimize performance and ensure the longevity of their careers. We software developers can (and should) do the same - use the model of stretching, executing and coasting just as they do.
  • Stretching is the most fun mode, where you learn things quickly, apply them as you go, step up to new challenges, and move out of your comfort zone to accelerate learning. However, if you stretch for too long, you will slow down or burn out.
  • Executing is the normal way of working where you use your existing skills and experience to get things done well, without continuously stretching. To get your manager’s support on this mode, list additional things you do and establish your intention to delegate or say no.
  • Coasting implies doing less or lower-quality work than you are capable of, say, as a short breather after a big project or because of personal circumstances.

Full post here, 5 mins read

7 things you don’t know about agile architecture

Build in feedback loops in every development cycle. Don’t mistake fast initial development for sustainable agility.
Read more

7 things you don’t know about agile architecture

  • Design the project so that introducing changes is not expensive.
  • Don’t spend too much time designing. Start building, learn and progress. Build in feedback loops in every development cycle.
  • Don’t mistake fast initial development for sustainable agility. You want to arrive sooner at the right end product rather than simply going faster.
  • Work in smaller teams, as bigger ones are less flexible and need more communication, making them less agile. Know that adding more people does not guarantee earlier completion.
  • Avoid using speculation on future requirements to add complexity to projects. However, past changes can be clues to future needs, so watch for change hotspots and high defect density.

Full post here, 6 mins read

Programming: Doing it more vs doing it better

Learn to review and test more thoroughly and refactor sooner than later. Quit the struggle of trying to get faster at engineering.
Read more

Programming: Doing it more vs doing it better

  • The best way to get better at writing software is to write more software, not to seek perfection at every shot.
  • Put more thought into your design systems - strive to write readable, maintainable code without bugs.
  • Don’t assume that you will one day churn out beautiful code effortlessly. Instead, learn to review and test more thoroughly and refactor sooner than later.
  • Quit the struggle of trying to get faster at engineering. Taking your time to think and revising as you go helps you write better code.
  • However, realize it is not always possible to do all of this while meeting objectives; instead, apply diligence over the long term.

Full post here, 4 mins read

Ways to stay motivated while learning to code

Aim for small incremental improvements. Look at the bigger picture of what you're enabling.
Read more

Ways to stay motivated while learning to code

  • You are going to spend a lot of time finding & fixing bugs. When you solve a problem completely, treat yourself.
  • Don’t learn to code - code to learn. Aim for small incremental improvements.
  • Amplify the positive, not negative. Don’t give in to the impostor syndrome. Learn to keep moving in spite of it.
  • Always look at the bigger picture of what you're enabling. Find and frame the reasons why you love your job.
  • Develop a hobby that gets your blood flowing, literally. Physical exercise helps a lot in staying motivated and focused!

Full post here, 14 mins read

Developers mentoring other developers: practices I've seen work well

Draw upon your own experiences, contexts and perspectives rather than giving textbook-ish suggestions.
Read more

Developers mentoring other developers: practices I've seen work well

  • When mentoring someone, you should delay sharing solutions to problems your mentee is facing as long as possible. Prompt them to introspect and figure out solutions on their own.
  • As they say, the devil is in the details. Ask specific questions to dive deeper and seek context. Listen intently and understand the underlying subtext.
  • Draw upon your own experiences, contexts and perspectives rather than giving textbook-ish suggestions.
  • Always be supportive and let the mentee know that you are on their side.

Full post here, 18 mins read

Is there a future beyond writing great code?

Work towards going into engineering management or look for pro-bono tech work for a social cause that’s close to your heart.
Read more

Is there a future beyond writing great code?

There are many things we developers can do that will make us more fulfilled in our professional lives. There are immediate options to explore beyond just writing code. You can work towards going into engineering management and be a great engineering leader. To move forward in life, one has to give back. And some ways of giving back could be as simple as code reviews, workshops, and individual assessments with some colleagues. Go do it and be a mentor to fellow developers. You will learn a lot of new things on the way too. If meaning is what you are looking for, why not consider doing pro-bono tech work for a social cause that’s close to your heart. There are always ways to get more fulfillment from our skillsets and our careers than we think. We just need to explore.

Full post here, 9 mins read

The headers we don’t want

Some unnecessary HTTP headers you want to avoid. Vanity headers such as server, x-powered-by and via. Some headers, such as p3p, expires, and x-frame-options represent deprecated standards.
Read more

The headers we don’t want

Some unnecessary HTTP headers you want to avoid:

  • Vanity headers such as server, x-powered-by and via offer little value to end-users or developers but at worst they divulge sensitive information.
  • Some headers, such as p3p, expires, x-frame-options and x-ua-compatible, represent deprecated standards.
  • Headers such as x-cache, x-request-id, x-aspnet-version and x-amzn-requestID are only useful for debugging and are not recognized by any browser. As a developer, you may want to keep them, but know that removing them makes no difference to how your pages are rendered.
  • x-robots-tag is a non-browser header only useful when the requesting agent is a crawler.

Full post here, 7 mins read

Principles of dependency injection

Use abstractions to make your code reusable and flexible. Follow the single responsibility principle and let each class do only one thing.
Read more

Principles of dependency injection

  • Dependency injection can make your code more maintainable, using abstractions and by decoupling things.
  • If you code to an implementation, you will get a tightly coupled, inflexible system. Use abstractions to make your code reusable and flexible.
  • Follow the single responsibility principle and let each class do only one thing.
  • Differentiate between creatables (which should be created within the constructor if the whole class uses them) and injectables (which should be asked for by the constructor and not created directly); see the sketch after this list.
  • Your constructors should only check for null, create creatables and store dependencies for later use. They should be free of any coding logic.
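
A small sketch of the injectable/creatable distinction in Go; the Mailer/Notifier names are invented for illustration.

```go
package notify

import (
	"errors"
	"strings"
)

// Mailer is an injectable dependency: the service asks for the abstraction
// in its constructor instead of creating a concrete client itself.
type Mailer interface {
	Send(to, body string) error
}

// Notifier depends on the Mailer abstraction, so it stays reusable and testable.
type Notifier struct {
	mailer  Mailer
	builder *strings.Builder // a creatable: used only internally, so created here
}

// NewNotifier only checks for nil, creates creatables and stores dependencies.
func NewNotifier(m Mailer) (*Notifier, error) {
	if m == nil {
		return nil, errors.New("mailer must not be nil")
	}
	return &Notifier{mailer: m, builder: &strings.Builder{}}, nil
}

func (n *Notifier) Welcome(to string) error {
	n.builder.Reset()
	n.builder.WriteString("Welcome, ")
	n.builder.WriteString(to)
	return n.mailer.Send(to, n.builder.String())
}
```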

Full post here, 4 mins read

These four ‘clean code’ tips will dramatically improve your engineering team’s productivity

‘If it isn’t tested, it’s broken’, so write lots of tests, especially unit tests. Code not covered by a test gets invisibly broken until customers spot the bug.
Read more

These four ‘clean code’ tips will dramatically improve your engineering team’s productivity

Top strategies based on Robert Martin’s Clean Code:

  • ‘If it isn’t tested, it’s broken’, so write lots of tests, especially unit tests. Code not covered by a test gets invisibly broken until customers spot the bug.
  • Choose meaningful, short, precise names for variables, classes and functions and make it easy to find files by name. In case of conflict, choose precision over brevity.
  • Keep classes and functions small - four lines per function and 100 lines per class at most - and make them obey the single responsibility principle. This will help with documenting code better as you will have lots of well-named sub-functions.
  • Ensure functions have no side effects (such as modifying an input argument), and specify this explicitly in the function contracts if possible (such as passing in native types or objects that have no setters).

Full post here, 7 mins read

I didn't want to be a manager anymore - and the world didn't end

Understand the aspects of the new role that you know little about and figure out how you can get more exposure to them.
Read more

I didn't want to be a manager anymore - and the world didn't end

Here are a few key lessons I learned from this post:

  • If you’re interested in a new role, make sure you tell the right people about it in 1:1 conversations. Ask for advice and mentorship for the skills required for the role you want to be in.
  • Get to know yourself and what you like best. Learn as much as you can about aspects other than engineering, like product, design, compliance, support, sales, technical writing, etc.
  • Think about what you like & dislike in your current role and how those aspects would change if you moved to a different one.
  • Understand the aspects of the new role that you know little about and figure out how you can get more exposure to them.

Full post here, 9 mins read

Programming: Math or Writing?

The short answer is both. Just like when solving a math problem, you need to decompose a programming problem into smaller problems.
Read more

Programming: Math or Writing?

The short answer is both. The longer answer is:

  • Just like when solving a math problem, you need to decompose a programming problem into smaller problems.
  • Functions, binary and hexadecimal numbers, Boolean logic, and big O notation for analyzing algorithm performance are either actual math or very close to it.
  • Abstractions are useful not only when constructing programs but also when reasoning about programs. Reasoning precisely about abstractions takes a nod from maths.
  • Your code needs to communicate the structure and organization of the program to other programmers. You need a good overall structure and you need to divide your programs into smaller snippets, just like paragraphs, for better readability.
  • Programming is a repetitive process of coding with frequent refactoring/revising, much like editing to revise and improve the written text.

Full post here, 4 mins read

What I learned about making software from my 3-year old daughter

If we focus, we might discover patterns and principles that can help us solve problems.
Read more

What I learned about making software from my 3-year old daughter

  • A meta-learning - observing children can teach us a lot.
  • When trying a new framework, tool or language, we need to play around and ask for help when we get stuck. Exploring is important.
  • When her toys break, we use glue to fix them, but once broken they tend to break again after a few days. Even if they stay intact, she knows they were broken and are at risk of breaking again. Some code, like broken toys, cannot be repaired; the part or functionality needs to be completely replaced or rewritten.
  • She likes to play a game of spotting patterns. We tried to spot heart patterns one day. She was focused on finding those patterns everywhere and she won, of course. If we focus, we might discover patterns and principles that can help us solve problems. There are always patterns that we miss because we are not really looking for them.

Full post here, 4 mins read

Continuous testing of APIs

3 steps for having your APIs tested continuously: Write good test collection. Run tests on schedule and on-demand. Look at analytics & set up smart alerts.
Read more

Continuous testing of APIs

  • 3 steps for having your APIs tested continuously: Write good test collection. Run tests on schedule and on-demand. Look at analytics & set up smart alerts.
  • You should be running contract tests, integration tests and end-to-end tests in your build system on demand - when code changes happen or code merges happen.
  • You should have some scheduled tests run regularly. These are the ones for API health checks, DNS checks, security checks, and any infrastructure related checks.
  • For complete test coverage of your APIs, you will need both scheduled and on-demand tests.
  • Analytics from data generated from these tests will give you a view of system health, performance, stability, resiliency, quality and agility over time. Use it to find underlying problems and set up effective alerts.

Full post here, 7 mins read

Common API mistakes and how to avoid them

Be stingy with data you are sending through your APIs. Try to name attributes of objects in your API responses in such a way that they can be forward compatible with any future updates.
Read more

Common API mistakes and how to avoid them

Covering the “how to avoid” part here:

  • Be stingy with data you are sending through your APIs. Figure out the absolute minimum amount of data that satisfies the requirements you are trying to meet.
  • Represent upstream data internally as a Domain Object. You can both circumvent some bugs and provide a more consistent API by doing this.
  • Try to name attributes of objects in your API responses in such a way that they can be forward compatible with any future updates.
  • Apply Robustness Principle: “Be conservative in what you do, be liberal in what you accept from others.” Ensure all the API responses follow conventions and best practices but be accepting of inconsistent forms of requests (whenever you can) and normalize them into a consistent format at your end.

Full post here, 15 mins read

API practices if you hate your customers

Practices that make API experience bad for developers.
Read more

API practices if you hate your customers

Full post here, 16 mins read

Estimation 101 – a short guide

We all commit to deadlines based on our estimations and find ourselves in a tight spot later. For good estimations, break your work down into a set of components and think about their individual complexity to understand the whole project.
Read more

Estimation 101 – a short guide

  • Estimations are hard and turn out inaccurate most of the time. We all commit to deadlines based on our estimations and find ourselves in a tight spot later.
  • For good estimations, break your work down into a set of components and think about their individual complexity to understand the whole project. This can help you give more accurate estimates.
  • A PERT estimate is a good way to arrive at and share estimates. You look at three values: the pessimistic, optimistic, and most likely estimate. PERT estimate = (Optimistic + 4*Most likely + Pessimistic)/6 - a weighted average of the three estimates (a worked example follows this list).
  • When you are estimating your time or effort for projects, include time for writing tests, quality assurance and release scripts. Also include time for technical documentation and provisioning scripts for the cloud infrastructure to support the project, if any are required.
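
As a worked example with invented numbers: a task estimated at 2 days optimistic, 4 days most likely and 10 days pessimistic gives (2 + 4*4 + 10)/6 ≈ 4.7 days. The same calculation as a tiny sketch:

```go
package main

import "fmt"

// pert returns the weighted average of the three classic estimates.
func pert(optimistic, mostLikely, pessimistic float64) float64 {
	return (optimistic + 4*mostLikely + pessimistic) / 6
}

func main() {
	// Invented numbers for a single task, in days.
	fmt.Printf("PERT estimate: %.1f days\n", pert(2, 4, 10))
}
```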

Full post here, 6 mins read

Understanding the hidden powers of curl

You can do many requests to any number of API endpoints on a single CLI line. It offers a great amount of flexibility with methodologies and protocols that you can use.
Read more

Understanding the hidden powers of curl

  • You can do many requests to any number of API endpoints on a single CLI line. You can also set a specific order to requests using --next.
  • You can pass the -v flag to generate a verbose record of the interaction that curl is doing. You can also output everything that occurs in curl using the --trace-ascii option, which gives an ASCII output for parsing and viewing.
  • curl offers a great amount of flexibility with methodologies and protocols that you can use.
  • It has many good options for composing curl requests and mimicking activity in general. You can mimic the activity of a known browser by leveraging the “copy as curl” option available in Chrome, Firefox, Safari, etc.

Full post here, 8 mins read

Center stage: Best practices for staging environments

You should match staging & production as closely as possible. Use the same load balancers, security group settings, and deployment tooling.
Read more

Center stage: Best practices for staging environments

  • Staging environments help you validate the known-unknowns of your systems. Having good tests is not an alternative to staging environments. You must invest time & effort in a proper staging rollout across the company.
  • Your entire engineering team should have a consistent & homogenous deployment pipeline and runtime platform. This consistency will help with disaster recovery if and when it happens.
  • You should match staging & production as closely as possible. Use the same load balancers, security group settings, and deployment tooling.
  • Use a tool that allows teams to roll back the last deployment. No one’s work should be blocked by someone else’s bug.
  • Try to replicate production traffic loads & patterns as much as possible in staging environments. The staging environment should be at proportionally the same scale as production. Avoid under-provisioning.

Full post here, 12 mins read

Stop writing crap code

Stop using else statements as your defaults. Make your code more descriptive. Use built-in functionality.
Read more

Stop writing crap code

  • Stop using else statements as your defaults. As code grows, logic gets complicated and these else statements come to bite you back while debugging code 6-12 months later.
  • Take the time to find & use built-in functionality of the language you are using. It will save your code from bloating.
  • Each function should have just one job to be done. Don’t hide your logic inside functions.
  • Make your code more descriptive by naming things properly, by adding code comments, etc.

Full post here, 4 mins read

Good code reviews, better code reviews

Questioning the necessity & impact of code changes in the context of your overall system. Look at abstractions introduced and aim for doing a contextual pass.
Read more

Good code reviews, better code reviews

  • You can make code reviews better by questioning the maintainability, necessity & impact of code changes in the context of your overall system. Look at abstractions introduced and aim for doing a contextual pass.
  • Any good code review avoids opinionated comments/statements. You can make code reviews better by focusing on being empathetic and all-round positive, kind and unassuming.
  • Good reviewers leave as many comments and questions as are needed, and prefer connecting with the author in-person, while better reviewers will proactively reach out to the person to avoid any misunderstandings in the first place.
  • You can have better code reviews by looking beyond the errors and trying to figure out the underlying issue.
  • Companies with good code reviews ensure that everyone takes part in the code review process. Companies with better code reviews ensure that code just doesn’t make it to production without reviews.

Full post here, 8 mins read

Code less, engineer more

Build only what you absolutely must. Identify vendors for pieces you can more efficiently buy off the shelf than to build in-house. Focus on the impact of code and not the volume of code written.
Read more

Code less, engineer more

“…But just as we wouldn’t insist that every bridge be built with bespoke girders and bolts, or that all electrical plugs and sockets have their own form factors, we shouldn’t insist on custom-building every part of the designs that we craft.”
  • Build only what you absolutely must. Identify vendors for pieces you can more efficiently buy off the shelf than to build in-house. Document the rationales of your decisions for future use.
  • Encourage the mindset of focusing on the impact of code and not the volume of code written in your team.
  • If and when you write a completely new component, share it with everyone in the team and the company. While sharing, write about what you made, why you made it and how you did it. This helps avoid duplication of effort by another teammate or another team in the company.
  • In your team, encourage & reward reusing code.

Full post here, 6 mins read

What’s the difference between versioning and revisioning APIs?

Versioning implies that each group of related changes in an API is presented under a specific number, often denoting the type of release. Revisioning implies incremental changes have been made and it prevents version-to-version code breaks.
Read more

What’s the difference between versioning and revisioning APIs?

  • Look at release management as a communication tool - to and for your API consumers. You can take two approaches to it - versioning and revisioning.
  • Versioning implies that each group of related changes in an API is presented under a specific number, often denoting the type of release.
  • Revisioning implies incremental changes have been made and it prevents version-to-version code breaks, allowing legacy code to continue functioning.
  • You will find versioning useful when your API is flexible enough to support version 1 as legacy while transitioning to version 2; or when your API is only useful as a conduit to another API or system.
  • Revisioning is a more apt approach if your API supports vital business functions, medical purposes, security systems, etc., where it is essential for the API to stay up and running at all times.
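
As a rough illustration (the endpoint paths and the Api-Revision header below are hypothetical, not from the post), here is a Go sketch of how the two approaches differ at the router:

    package main

    import (
        "log"
        "net/http"
    )

    func main() {
        mux := http.NewServeMux()

        // Versioning: v1 stays available as legacy while v2 introduces breaking changes.
        mux.HandleFunc("/v1/orders", listOrdersV1)
        mux.HandleFunc("/v2/orders", listOrdersV2)

        // Revisioning: one stable endpoint; an optional header (hypothetical name) opts
        // into newer behaviour so existing clients keep working untouched.
        mux.HandleFunc("/orders", func(w http.ResponseWriter, r *http.Request) {
            if r.Header.Get("Api-Revision") == "2019-10-01" {
                listOrdersRevised(w, r)
                return
            }
            listOrdersV1(w, r)
        })

        log.Fatal(http.ListenAndServe(":8080", mux))
    }

    func listOrdersV1(w http.ResponseWriter, r *http.Request)      { w.Write([]byte(`{"orders":[]}`)) }
    func listOrdersV2(w http.ResponseWriter, r *http.Request)      { w.Write([]byte(`{"data":{"orders":[]}}`)) }
    func listOrdersRevised(w http.ResponseWriter, r *http.Request) { w.Write([]byte(`{"orders":[],"total":0}`)) }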

Full post here, 7 mins read

4 ways your API specification can fall short and what to do about it

Your spec should clearly state the size constraint for each response developers should keep in mind while coding.
Read more

4 ways your API specification can fall short and what to do about it

  • You should communicate clearly about the number of items and pages developers can expect from each endpoint so that developers don’t make assumptions about it.
  • Your spec should clearly state the size constraint for each response that developers should keep in mind while coding (see the sketch after this list).
  • Your spec must set a performance standard and stick to it, because developers build their applications around how fast or slow your API responds to requests.
  • Cover authorization extensively in your specs. Encourage the use of refresh tokens throughout your documentation.
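
One way to make page counts and size limits explicit is to bake them into the response shape itself; the Go sketch below uses hypothetical field names and is only an illustration:

    package api

    // MaxPageSize is the documented hard cap; requests asking for more are clamped.
    const MaxPageSize = 100

    // Item is a stand-in for whatever resource the endpoint lists.
    type Item struct {
        ID   string `json:"id"`
        Name string `json:"name"`
    }

    // Page is the envelope every list endpoint returns, so consumers never have to guess.
    type Page struct {
        Items      []Item `json:"items"`       // at most MaxPageSize entries
        Page       int    `json:"page"`        // 1-based page index
        PerPage    int    `json:"per_page"`    // echoes the limit that was applied
        TotalItems int    `json:"total_items"` // lets clients compute the total page count
    }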

Full post here, 5 mins read

Best design practices to get the most out of your API

Make your API easy to understand and fast to start up. Aim for intuitive consistency with repeating patterns and conventions.
Read more

Best design practices to get the most out of your API

  • You should design your API in a way that it does one key thing really well. Don’t try to add too many what-if scenarios.
  • Make your API easy to understand and fast to start up.
  • Aim for intuitive consistency with repeating patterns and conventions in endpoint names, input parameters & output responses so that developers can begin without reading the documentation.
  • But still, create great documentation. And let developers try your API without the need to log in or sign up.
  • Make troubleshooting easier by returning meaningful error messages (a sketch of one possible error shape follows this list).
  • Make your API extensible with primitives that enable new workflows. Include an opportunity for feedback from top partners - perhaps allow a few developers to test-drive beta options.
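
A minimal Go sketch of such an error shape, assuming a hypothetical JSON API; the codes and field names are illustrative only:

    package api

    import (
        "encoding/json"
        "net/http"
    )

    // apiError is what every failing request returns, so developers can act on it.
    type apiError struct {
        Code    string `json:"code"`    // stable, machine-readable, e.g. "invalid_currency"
        Message string `json:"message"` // what went wrong, in plain language
        Hint    string `json:"hint"`    // how to fix it, or where to read more
    }

    func writeError(w http.ResponseWriter, status int, e apiError) {
        w.Header().Set("Content-Type", "application/json")
        w.WriteHeader(status)
        _ = json.NewEncoder(w).Encode(e) // encode error ignored for brevity in this sketch
    }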

Full post here, 7 mins read

Go: best practices for production environments

Use a single, global GOPATH for your development environments. Try cloning your repos into their canonical paths within the GOPATH, and work there directly.
Read more

Go: best practices for production environments

  • Use a single, global GOPATH for your development environments. Try cloning your repos into their canonical paths within the GOPATH, and work there directly.
  • For repository structures, a good practice is to limit the number of source files. Your repos (with the exception of a vendor subdirectory) shouldn’t contain a directory named src, or represent their own GOPATH.
  • When it comes to passing configuration, package flag provides the best value and has strict typing and simple semantics.
  • Formatted code can significantly increase clarity. Use gofmt to format your code.
  • For logging and telemetry, try the standard library’s log package, which implements simple logging. It defines a type, Logger, with methods for formatting output (a sketch combining flag and log follows this list).
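
A minimal sketch of those two standard-library packages together; the flag names and prefix are made up for illustration:

    package main

    import (
        "flag"
        "log"
        "os"
    )

    func main() {
        // Configuration via package flag: strict typing, defaults, and -h for free.
        addr := flag.String("addr", ":8080", "address to listen on")
        debug := flag.Bool("debug", false, "enable verbose logging")
        flag.Parse()

        // Logging via package log: a Logger with a prefix and standard timestamp flags.
        logger := log.New(os.Stdout, "myservice: ", log.LstdFlags)
        logger.Printf("starting on %s (debug=%v)", *addr, *debug)
    }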

Full post here, 11 mins read

How not to store passwords

Key derivation functions are one of the better options for storing passwords. They take more compute time to crack, which means an attacker needs to spend more money to crack them.
Read more

How not to store passwords

  • It can’t be said enough - do not save passwords in plain text.
  • Encryption is only slightly better than plain text. It is not THE answer for sure.
  • Plain hashes are pretty weak too. They are vulnerable because users tend to reuse the same passwords across different websites and often pick very simple passwords, which makes them easy to crack.
  • Salted hashes are much better at protecting passwords. But the speed at which attackers can compute hashes still makes brute-force attacks feasible.
  • Key derivation functions are one of the better options for storing passwords. They take more compute time to crack, which means an attacker needs to spend more money to crack them (a bcrypt sketch follows this list).
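
As one concrete option (not prescribed by the post), a Go sketch using bcrypt from golang.org/x/crypto; argon2 and scrypt live in the same repository:

    package auth

    import "golang.org/x/crypto/bcrypt"

    // HashPassword derives a salted, tunable-cost hash; store only this value.
    func HashPassword(plain string) (string, error) {
        // A higher cost makes brute force proportionally more expensive for attackers.
        hash, err := bcrypt.GenerateFromPassword([]byte(plain), bcrypt.DefaultCost)
        return string(hash), err
    }

    // CheckPassword compares a login attempt against the stored hash.
    func CheckPassword(hash, plain string) bool {
        return bcrypt.CompareHashAndPassword([]byte(hash), []byte(plain)) == nil
    }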

Full post here, 7 mins read

Four load testing mistakes developers love to make

Being too focused on what you set out to test and ignoring any other warning signs while testing is a common mistake developers make. Reusing test data is another common mistake.
Read more

Four load testing mistakes developers love to make

  • If you run a short load test and it works fine, that is no guarantee your service can handle that load for a long time. Run your performance tests for longer and understand your system’s performance characteristics.
  • Being too focused on what you set out to test and ignoring any other warning signs while testing is a common mistake developers make. It’s good to pay attention and investigate unusual results or changes in your application’s behaviour as you increase load.
  • Reusing test data is another common mistake. You should either generate new data or spin up a new environment for each test.
  • Don’t assume the production environment is permanently healthy. Deliberately make things go wrong during your load test to find out how your service will perform when such failures happen.

Full post here, 7 mins read

Be as serverless as you can, but not more than that

Once you find what does deliver value, you can consider going partly serverless and investing more in the infrastructure for it.
Read more

Be as serverless as you can, but not more than that

  • As a startup that’s exploring what may work, you want to rapidly provide options to your customers and see how they respond without investing too much. Serverless architecture helps with this.
  • Once you find what does deliver value, you can consider going partly serverless and investing more in the infrastructure for it.
  • Ask how much of your stack you need to own to deliver business value and differentiation.
  • Consider whether it makes sense to outsource SLA, regulatory compliance, pricing and roadmaps to a service provider.
  • Optimize for cost and latency as you start to find predictable patterns for your new product or service.

Full post here, 5 mins read

Security traps to avoid when migrating from a monolith to microservices

Rollback to the last known good state after a failure is more complex with microservices, so program in reverts carefully for data integrity.
Read more

Security traps to avoid when migrating from a monolith to microservices

  • Rollback to the last known good state after a failure is more complex with microservices, so program in reverts carefully for data integrity.
  • Move as many of your microservices as you can off the public networks to protect against DDoS attacks and other malicious actors.
  • Never pass data between services in plain text. Always encrypt (a minimal TLS sketch follows this list).
  • Add monitoring to each service separately.
  • Develop a logging approach to follow for all teams, each service, consistently.
  • Don’t provide individual services too much access. Limit access intelligently based on need only.
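
As a rough illustration of the encrypt-everything point (not the post’s exact recipe), a Go sketch of a service that refuses plain-text traffic; the paths and certificate file names are placeholders:

    package main

    import (
        "crypto/tls"
        "log"
        "net/http"
    )

    func main() {
        mux := http.NewServeMux()
        mux.HandleFunc("/internal/health", func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("ok"))
        })

        server := &http.Server{
            Addr:      ":8443",
            Handler:   mux,
            TLSConfig: &tls.Config{MinVersion: tls.VersionTLS12}, // no legacy protocol versions
        }
        // cert.pem / key.pem are placeholders for however you provision certificates.
        log.Fatal(server.ListenAndServeTLS("cert.pem", "key.pem"))
    }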

Full post here, 7 mins read

Blue-green deployment: a microservices antipattern

Blue-green deployment is a technique that reduces downtime for your application by running two identical production environments called Blue & Green.
Read more

Blue-green deployment: a microservices antipattern

  • Blue-green deployment is a technique that reduces downtime for your application by running two identical production environments called Blue & Green.
  • At any time, one of the environments is live and one is idle. The live environment serves all production traffic. And your team deploys and tests in the idle environment. Once the new build runs fine, you switch the router to swap the live & idle environments.
  • Adopting this approach with microservices throws away one of their key benefits: being independently deployable.
  • All microservices in a release need to be mutually compatible because the entire application is released in one go in the new environment.
  • “..this creates a distributed monolith whose pace of evolution is limited by the slowest-developing microservice.”

Full post here, 4 mins read

All about prefetching

Four common prefetching strategies - interaction-driven, state-driven, user-driven and download everything.
Read more

All about prefetching

  • Network Information APIs allow you to use different strategies for different connection types to improve prefetching performance.
  • Four common prefetching strategies are as follows; many websites use a combination of these:
    1. Interaction-driven - uses mouse and scroll activity as signals.
    2. State-driven - uses the current page or URL to prefetch the next logical step.
    3. User-driven - matches each specific user’s patterns of past usage, account information, etc.
    4. Download everything - prefetch all the links on a page or all the bundles for an app.

  • The prefetch resource hint can be used to prefetch on all major browsers except Safari. You can add prefetch resource hints manually with no dependencies, use build tools driven by developer configuration, or use dedicated prefetch tools such as quicklink, Guess.js or Gatsby.

Full post here, 7 mins read

The value in Go’s simplicity

Super strong forward compatibility with careful attention to versioning and dependency. Great restraint to add ‘good-to-haves’ versus what you really need.
Read more

The value in Go’s simplicity

  • Super strong forward compatibility with careful attention to versioning and dependency. Libraries are super stable, and you can have very few external dependencies.
  • Great restraint in adding ‘good-to-haves’ versus what you really need. So you get only two generic(ish) data structures: arrays (slices) and dictionaries (maps).
  • It comes with almost everything you need: the basic go test framework for testing; a sync package that covers most sync primitives you may need; and encoding packages that work with json, xml, csv and other common formats (a small go test sketch follows this list).
  • The internal formatting tool gofmt provides enough consistency that your code looks idiomatic. This makes open-source code a lot more readable too.
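
A minimal sketch of how far the built-in testing package gets you; in practice Abs would live in maths.go and the test in maths_test.go:

    package maths

    import "testing"

    // Abs is inlined here only to keep the sketch self-contained.
    func Abs(x int) int {
        if x < 0 {
            return -x
        }
        return x
    }

    // TestAbs is a plain table-driven test; `go test` picks it up with no extra setup.
    func TestAbs(t *testing.T) {
        cases := []struct{ in, want int }{
            {-3, 3},
            {0, 0},
            {7, 7},
        }
        for _, c := range cases {
            if got := Abs(c.in); got != c.want {
                t.Errorf("Abs(%d) = %d, want %d", c.in, got, c.want)
            }
        }
    }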

Full post here, 5 mins read

Databases that play nice with your serverless backend

Modern cloud-native DBs that expose stateless REST APIs work best for serverless computing.
Read more

Databases that play nice with your serverless backend

  • Modern cloud-native DBs that expose stateless REST APIs work best for serverless computing.
  • Amazon’s Aurora Serverless variant scales to demand, resizes to meet storage capacity and handles routine maintenance as a managed service. Its Data API feature works with SQL DBs.
  • With Amazon DynamoDB, designing your data to work with it while supporting your access patterns is hard. Complex queries can be tricky too.
  • Microsoft’s Azure Cosmos works with several APIs & supports multiple data models. It lets you choose the exact tradeoffs between performance & consistency.
  • Google’s Cloud Firestore works well for web and mobile apps. It has built-in security without requiring a backend and it syncs across clients in realtime.

Full post here, 7 mins read

How to be a rock star developer

Write utility code that can be used by all. Focus on your work integrity, adaptability and the desire to do excellent work.
Read more

How to be a rock star developer

  • Write utility code that can be used by all and collaborate with other developers on projects beyond your scope of work.
  • Don’t give in to FOMO. Spend time doing the essential work that gets you real results.
  • At crunch times, rise to the occasion with a leadership mindset to rally, organize and collaborate.
  • Try to be essential but not indispensable. Indispensability eventually leads to burnout.
  • Focus on your work integrity, adaptability and the desire to do excellent work.

Full post here, 5 mins read

Become a better developer by mastering the superpower of deep work

Deep work is the ability to focus without distraction on a cognitively demanding task, to produce results in less time. To develop a good deep work habit, add routines to your work life.
Read more

Become a better developer by mastering the superpower of deep work

  • Deep work is the ability to focus without distraction on a cognitively demanding task, to produce results in less time. To develop a good deep work habit, add routines to your work life.
  • Check your agenda for important meetings and plan your day’s deep work sessions, breaks, and shallow work.
  • Arrive at work well before colleagues, when there is no noise, no meetings & no demands that require context switching.
  • Once your teammates arrive, take a social break or do some shallow work - answer emails, check feeds, engage in the daily scrum. Then tackle another deep work session.
  • Create a shutdown ritual to end the day - answer important emails, update task statuses and do small tasks that help you prepare for tomorrow.
  • Once you leave, practice switching off from thinking about work. If you can, turn off work-related notifications.

Full post here, 15 mins read

5 developer environment hacks to increase productivity

Use iTerm2 or Bash to know which Git branch you are working on and its status. Use Tmux to manage multiple window splits with fast shortcuts and hot-keys.
Read more

5 developer environment hacks to increase productivity

  • Use iTerm2 or Bash to know which Git branch you are working on and its status.
  • Use Tmux to manage multiple window splits with fast shortcuts and hot-keys. The screen function also lets you suspend sessions and return to them later.
  • Shell aliases save you from remembering the syntax of tedious commands and long-winded text.
  • Organising code directories helps differentiate between cloned repositories and something you might be tinkering with locally.
  • Ripgrep is way faster than git-grep and can blaze through lines and lines of code to find what you’re looking for quickly.

Full post here, 4 mins read

Autoscaling AWS Step Functions activities

Transactional flows are an ideal use case for auto-scaling because of unused compute capacity during non-peak hours.
Read more

Autoscaling AWS Step Functions activities

  • Transactional flows are an ideal use case for auto-scaling because of unused compute capacity during non-peak hours.
  • When you need to detect scaling-worthy events, AWS components like Step Functions metrics and CloudWatch alarms come in handy.
  • Support a scale-down cool-off time to prevent two consecutive scale-down actions within a certain window (a sketch of such a gate follows this list).
  • Guard your system against any malicious, delayed, or duplicated scaling notifications by validating incoming scaling signals.
  • Review historical statistics for scale-down alarms so that they are less susceptible to spurious triggers and never fire during peak hours.
  • For a safe rollout, increase in small steps until you gradually reach the ideal minimum instance count.
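
A minimal Go sketch of a scale-down cool-off gate; the type and duration are hypothetical, not taken from the post:

    package scaler

    import (
        "sync"
        "time"
    )

    type Scaler struct {
        mu            sync.Mutex
        lastScaleDown time.Time
        cooldown      time.Duration // e.g. 10 * time.Minute
    }

    // AllowScaleDown reports whether a scale-down may proceed and, if so, records it,
    // so a second scale-down within the cool-off window is ignored.
    func (s *Scaler) AllowScaleDown(now time.Time) bool {
        s.mu.Lock()
        defer s.mu.Unlock()
        if now.Sub(s.lastScaleDown) < s.cooldown {
            return false // still cooling off from the previous scale-down
        }
        s.lastScaleDown = now
        return true
    }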

Full post here, 5 mins read

When AWS autoscale doesn't

If the ratio of your actual metric value to the target metric value is low, the maximum magnitude of a scale-out event will be significantly limited.
Read more

When AWS autoscale doesn't

  • Scaling speed is limited. If the ratio of your actual metric value to the target metric value is low, the maximum magnitude of a scale-out event will be significantly limited (see the arithmetic sketch after this list).
  • Short cooldown periods can cause over-scaling or under-scaling because a scaling event may trigger before a previous scaling event has concluded.
  • Target tracking autoscaling works best in situations where at least one ECS service or CloudWatch metric is directly affected by the running task count, and the metrics are bounded or relatively stable and predictable.
  • The best way to find the right autoscaling strategy is to test it in your specific environment and against your specific load patterns.
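
To make the first point concrete, a rough Go arithmetic sketch, assuming target tracking scales capacity roughly in proportion to the actual-to-target ratio:

    package main

    import (
        "fmt"
        "math"
    )

    // desiredCapacity approximates proportional target tracking: capacity grows with
    // the ratio of the observed metric to its target.
    func desiredCapacity(current int, actual, target float64) int {
        return int(math.Ceil(float64(current) * actual / target))
    }

    func main() {
        // With utilisation only slightly above target, 10 tasks grow to just 11...
        fmt.Println(desiredCapacity(10, 55, 50)) // 11
        // ...so only a large spike can double capacity in a single scale-out step.
        fmt.Println(desiredCapacity(10, 100, 50)) // 20
    }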

Full post here, 8 mins read