- Latency (measured in time units) is the time between a stimulus and a response, while throughput (measured in deliveries per time unit) is the rate at which the system meets its goals.
- Sometimes latency and throughput interfere with each other. Do you field a request and respond before taking the next request (low latency for the first customer but overall lower throughput), or do you accept the second request while processing the first for higher throughput?
- You make latency/throughput tradeoffs every day, and most of us are biased towards throughput. For example, you carefully plan all foreseeable architectural improvements instead of starting on the first profitable change you come across.
- Instead, you should optimize for latency. For example, if preferences are likely to change between requests (a high rate of change), responding quickly to each request lets you adapt, which is less wasteful.
- To decide between throughput and latency, consider the cost of delay, whether you might learn something that changes your approach subsequently, and whether external factors might force a new approach.
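
The request-handling tradeoff in the second bullet can be made concrete with a toy model. This is a minimal sketch, not something from the original post: it assumes all requests are already queued at time zero, that each round of work pays a fixed overhead (`setup_cost`) plus a cost per request (`per_item_cost`), and it uses batching as a loose stand-in for "accepting more requests before responding". The function name and all parameters are hypothetical.

```python
def simulate(n_requests, batch_size, setup_cost, per_item_cost):
    """Toy model: every request is queued at t=0 and served in batches.

    Each batch costs setup_cost + batch_len * per_item_cost time units
    (an assumed cost model), and every request in a batch is answered
    when the whole batch finishes.
    Returns (time_until_first_response, requests_per_time_unit).
    """
    clock = 0.0
    first_response = None
    remaining = n_requests
    while remaining > 0:
        batch_len = min(batch_size, remaining)
        clock += setup_cost + batch_len * per_item_cost
        if first_response is None:
            first_response = clock  # the first customer gets an answer here
        remaining -= batch_len
    return first_response, n_requests / clock


# One request at a time: the first customer is answered quickly,
# but the fixed overhead is paid ten times.
serial_latency, serial_throughput = simulate(10, 1, 1.0, 1.0)

# One big batch: the overhead is paid once, but everyone waits for the whole batch.
batch_latency, batch_throughput = simulate(10, 10, 1.0, 1.0)

print(f"serial: first response at {serial_latency}, throughput {serial_throughput:.2f}")
print(f"batch:  first response at {batch_latency}, throughput {batch_throughput:.2f}")
```

Under these assumed costs, serving one request at a time answers the first customer sooner, while the single large batch completes more requests per time unit. If preferences change between requests, the work already committed to a large batch is the part most at risk of being wasted.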