#performance
8 posts

Inefficient efficiency

  • Latency (measured in time units) is the time between a stimulus and a response, while throughput (measured in deliveries per time unit) is the rate at which the system meets its goals.
  • Sometimes latency and throughput interfere with each other. Do you field a request and respond before taking the next request (low latency for the first customer but overall lower throughput), or do you accept the second request while processing the first for higher throughput?
  • You make latency/throughput tradeoffs every day, and most of us are biased towards throughput. For example, you carefully plan all foreseeable architectural improvements instead of initiating the first profitable change you come across.
  • Instead, you should optimize for latency, especially when preferences are likely to change between requests (a high rate of change), so you can adapt as you go, which is less wasteful.
  • To decide between throughput and latency, consider the cost of delay, whether you might learn something that changes your approach subsequently, and whether external factors might force a new approach.
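
As a minimal sketch of the tradeoff in the second bullet (the unit I/O and CPU costs and the two-stage pipeline model are illustrative assumptions, and the per-request latency cost of interleaving is not modeled):

```python
def serial_total(n, io, cpu):
    """Respond to each request before taking the next: low latency
    for the first customer, but no overlap between requests."""
    return n * (io + cpu)

def pipelined_total(n, io, cpu):
    """Accept the next request while the current one waits on I/O:
    a two-stage pipeline whose slower stage bounds throughput."""
    return io + cpu + (n - 1) * max(io, cpu)

# 100 requests, each needing 1 unit of I/O wait and 1 unit of CPU:
# serial finishes at 200 time units, pipelined at 101 - roughly
# double the throughput from overlapping I/O with processing.
```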

Full post here, 4 mins read

Designing resilient systems beyond retries: rate limiting

  • In distributed systems, retries and the circuit-breaker pattern are commonly used to improve resiliency. Retries risk a ‘storm’ if the server cannot handle the increased number of requests, and a circuit-breaker can prevent it.
  • In a large organization with hundreds of microservices, coordinating and maintaining all the circuit-breakers is difficult and rate-limiting or throttling can be a second line of defense.
  • You can limit requests by client or user account (say, 1000 requests per hour each, rejecting further requests until the time window resets) or by endpoints (benchmarked to server capabilities so that the limit applies across all clients). These can be combined to apply different levels of thresholds together, in a specific order, possibly culminating in a server-wide threshold.
  • Consider global versus local rate-limiting. The former is especially useful in microservices architecture because bottlenecks may not be tied to individual servers but to exhausted downstream resources such as a database, third-party service, or another microservice.
  • Take care that the rate-limiting service neither becomes a single point of failure nor adds significant latency. The system must function even if the rate-limiter experiences problems, perhaps by falling back to its local limiting strategy.
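
The per-client and server-wide thresholds above can be combined in a small fixed-window limiter. This is an illustrative sketch, not production code; the class name, thresholds, and injectable clock are assumptions:

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Fixed-window rate limiter: a per-client threshold is checked
    first, then a server-wide threshold, in that specific order."""

    def __init__(self, per_client, per_server, window_secs=3600,
                 clock=time.time):
        self.per_client = per_client    # e.g. 1000 requests per hour each
        self.per_server = per_server    # overall capacity threshold
        self.window_secs = window_secs
        self.clock = clock              # injectable for testing
        self.window_start = clock()
        self.counts = defaultdict(int)
        self.total = 0

    def allow(self, client_id):
        now = self.clock()
        if now - self.window_start >= self.window_secs:
            # Window expired: reset all counters.
            self.window_start = now
            self.counts.clear()
            self.total = 0
        if self.counts[client_id] >= self.per_client:
            return False                # reject until the window resets
        if self.total >= self.per_server:
            return False                # server-wide threshold reached
        self.counts[client_id] += 1
        self.total += 1
        return True
```

A real deployment would also need the global/local split discussed in the next bullet; this sketch is purely local to one server.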

Full post here, 11 mins read

How to continuously profile tens of thousands of production servers

Some lessons & solutions from the Salesforce team that can be useful for other engineers too.

  • Ensure scalability: If writes or data are too voluminous for a single network or storage solution to handle, distribute the load across multiple data centers, and coordinate retrieval through a centralized hub so that investigating engineers can specify which clusters of hosts they want data from.
  • Design for fault-tolerance: In a crisis where memory and CPU are overwhelmed or network connectivity is lost, profiling data can be lost too. Build resilience into your buffering and persist data to permanent storage in batches.
  • Provide language-agnostic runtime support: If users may be working in different languages, capture and represent profiling and observability data in a way that works regardless of the underlying language. Attach the language as metadata to profiling data points so that users can query by language, and ensure data structures for stack traces and metadata are generic enough to support multiple languages and environments.
  • Allow debugging engineers to access domain-specific context to drive their investigations to a speedier resolution: a deep search of stack traces against a regular expression is particularly useful to developers debugging the issue at hand.
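
A minimal sketch of the last two bullets, assuming a hypothetical in-memory sample store; the field names and the `deep_search` helper are illustrative, not Salesforce's actual implementation:

```python
import re

# Each profiling data point carries its runtime language as metadata,
# so queries can filter by language before matching stack frames.
samples = [
    {"language": "java",
     "stack": ["com.app.OrderService.place",
               "java.net.SocketInputStream.read"]},
    {"language": "python",
     "stack": ["app/orders.py:place_order",
               "requests/sessions.py:get"]},
    {"language": "java",
     "stack": ["com.app.Cache.evict", "java.util.HashMap.resize"]},
]

def deep_search(samples, pattern, language=None):
    """Return samples with any stack frame matching the regex,
    optionally restricted to one language."""
    rx = re.compile(pattern)
    return [s for s in samples
            if (language is None or s["language"] == language)
            and any(rx.search(frame) for frame in s["stack"])]
```

For example, `deep_search(samples, r"place", language="java")` finds only the Java order-placement sample, while dropping the language filter also matches the Python one.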

Full post here, 9 mins read

Tips for 10x application performance

  • Accelerate and secure applications with a reverse proxy server, which frees the application server from waiting for users to interact with it. It is also a prerequisite for many other performance-enhancing capabilities: load balancing, caching of static files, and better security and scalability.
  • Apply load balancing to protocols such as HTTP, HTTPS, SPDY, HTTP/2, WebSocket, FastCGI, SCGI, uwsgi, and memcached, as well as to other TCP-based applications and Layer 4 protocols.
  • Cache both static and dynamic content to reduce the load on application servers.
  • Use established compression standards to reduce file sizes for photos, videos, and music. Avoid leaving text data, including HTML, CSS, and JavaScript, uncompressed, as compressing text can have a large effect, especially over slow or otherwise constrained connections. If you use SSL, compression also reduces the amount of data to be SSL-encoded, saving time.
  • Monitor real-world performance closely, in real-time, both within specific devices and across your web infrastructure. You should use global application performance monitoring tools to check page load times remotely and also monitor the delivery side.
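
A quick illustration of the text-compression bullet, using Python's standard gzip module on repetitive markup; the sample HTML is made up, and real servers would typically compress at the web-server or proxy layer rather than in application code:

```python
import gzip

# Repetitive text such as HTML shrinks dramatically under gzip,
# so far less data crosses the wire (and less needs SSL encryption).
html = b"<html><body>" + b"<p>Hello, world!</p>" * 500 + b"</body></html>"
compressed = gzip.compress(html)

print(len(html), len(compressed))           # compressed is far smaller
assert gzip.decompress(compressed) == html  # lossless round trip
```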

Full post here, 20 mins read

Improving Mongo performance by managing indexes

  • You can query large collections efficiently by defining an index and ensuring it is built in the background.
  • To define an efficient index, you can also build on top of a previously defined index. When compound indexing in this way, determine which property of your query is the most unique (has the highest cardinality) and give it precedence, as the higher cardinality helps limit the search area of your query.
  • To ensure your database uses your index efficiently, make sure the index fits in the available RAM on your database server as part of Mongo’s working set. Check this by comparing db.stats().indexSize against your server’s RAM allocation.
  • To keep index sizes small, examine the usage of each collection’s indexes and remove unused ones, check compound indexes for redundancy, make indexes sparser by imposing a partialFilterExpression constraint that tells them which documents to index, and minimize the number of fields in compound indexes.
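
The cardinality heuristic above can be sketched as follows; the helper names and sample documents are hypothetical, and real index design also depends on your query shapes, not cardinality alone:

```python
def field_cardinality(docs, field):
    """Number of distinct values a field takes in a sample of documents."""
    return len({doc.get(field) for doc in docs})

def compound_index_order(docs, fields):
    """Order compound-index keys with the highest-cardinality (most
    unique) field first, so it limits the search area soonest."""
    return sorted(fields, key=lambda f: field_cardinality(docs, f),
                  reverse=True)

docs = ([{"status": "open", "user_id": i} for i in range(100)]
        + [{"status": "closed", "user_id": i} for i in range(100)])

# user_id has 100 distinct values, status only 2, so user_id leads:
print(compound_index_order(docs, ["status", "user_id"]))
# → ['user_id', 'status']
```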

Full post here, 9 mins read