#performance
7 posts

Designing resilient systems beyond retries: rate limiting

  • In distributed systems, retries and the circuit-breaker pattern are commonly used to improve resiliency. A retry ‘storm’ is a common risk when a server cannot handle an increased number of requests; a circuit-breaker can prevent it.
  • In a large organization with hundreds of microservices, coordinating and maintaining all the circuit-breakers is difficult and rate-limiting or throttling can be a second line of defense.
  • You can limit requests by client or user account (say, 1,000 requests per hour each, rejecting further requests until the time window resets) or by endpoint (benchmarked against server capacity so that the limit applies across all clients). These can be combined, applying different levels of thresholds in a specific order, possibly culminating in a server-wide threshold (see the sketch after this list).
  • Consider global versus local rate-limiting. Global rate-limiting is especially useful in a microservices architecture because bottlenecks may not be tied to individual servers but to exhausted downstream resources such as a database, third-party service, or another microservice.
  • Take care that the rate-limiting service neither becomes a single point of failure nor adds significant latency. The system must function even if the rate-limiter experiences problems, perhaps by falling back to a local limiting strategy.
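
To make the layered thresholds concrete, below is a minimal sketch of a fixed-window limiter that checks a per-client limit first and a server-wide limit second. The class, the limits, and the in-memory counters are illustrative assumptions, not a production design; a global limiter in a microservices architecture would typically keep its counters in a shared store such as Redis instead.

```python
import time
from collections import defaultdict

class FixedWindowRateLimiter:
    """Checks a per-client threshold first, then a server-wide one.
    All limits below are illustrative; counters live in process memory,
    so this is a *local* limiter, not a global one."""

    def __init__(self, per_client_limit=1000, server_limit=50_000,
                 window_seconds=3600):
        self.per_client_limit = per_client_limit
        self.server_limit = server_limit
        self.window_seconds = window_seconds
        self.window_start = time.monotonic()
        self.client_counts = defaultdict(int)
        self.total_count = 0

    def allow(self, client_id: str) -> bool:
        now = time.monotonic()
        # Reset every counter when the fixed window rolls over.
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.client_counts.clear()
            self.total_count = 0
        # Thresholds applied in order: per-client, then server-wide.
        if self.client_counts[client_id] >= self.per_client_limit:
            return False
        if self.total_count >= self.server_limit:
            return False
        self.client_counts[client_id] += 1
        self.total_count += 1
        return True

limiter = FixedWindowRateLimiter()
if not limiter.allow("client-42"):
    print("reject with HTTP 429 until the window resets")
```

Because the counters live in process memory, each server enforces only its own limit here; that is also the natural fallback when a shared, global rate-limiter is unavailable.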

Full post here, 11 mins read

How to continuously profile tens of thousands of production servers

Some lessons & solutions from the Salesforce team that can be useful for other engineers too.

  • Ensure scalability: if writes or data volumes are too large for a single network or storage solution to handle, distribute the load across multiple data centers and coordinate retrieval through a centralized hub, from which investigating engineers can specify the clusters of hosts they want data from.
  • Design for fault-tolerance: in a crisis where memory and CPU are overwhelmed or network connectivity is lost, profiling data can be lost too. Build resilience into your buffering and pass the data on to permanent storage, allowing it to persist in batches.
  • Provide language-agnostic runtime support: if users may be working in different languages, capture and represent profiling and observability data in a way that works regardless of the underlying language. Attach the language as metadata to profiling data points so that users can query by language, and ensure data structures for stack traces and metadata are generic enough to support multiple languages and environments.
  • Allow debugging engineers to access domain-specific contexts to drive their investigations to a speedier resolution. For example, support a deep search of traces against a regular expression, which is particularly useful when debugging a specific issue (a sketch of both ideas follows this list).
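
As an illustration of the last two points, here is a hedged sketch of a language-tagged, generic profiling sample plus a regex-based deep search over stack traces. The schema and function names are assumptions for illustration, not the actual Salesforce design.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class ProfileSample:
    """One language-agnostic profiling data point: the stack trace is a
    plain list of frame strings, and the runtime language travels along
    as metadata so it can be queried without language-specific schemas."""
    timestamp: float
    host: str
    language: str                                   # e.g. "java", "python", "go"
    stack: List[str] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)

def deep_search(samples: List[ProfileSample], pattern: str,
                language: Optional[str] = None) -> List[ProfileSample]:
    """Return samples with at least one stack frame matching a regular
    expression, optionally restricted to a single language."""
    rx = re.compile(pattern)
    return [s for s in samples
            if (language is None or s.language == language)
            and any(rx.search(frame) for frame in s.stack)]

samples = [
    ProfileSample(1.0, "host-a", "java",
                  ["com.example.OrderService.checkout", "java.net.SocketRead"]),
    ProfileSample(2.0, "host-b", "python",
                  ["app.views.checkout", "requests.get"]),
]
# Find every sample, in any language, that was blocked on network I/O.
blocked = deep_search(samples, r"SocketRead|requests\.get")
```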

Full post here, 9 mins read

Tips for 10x application performance

  • Accelerate and secure applications with a reverse proxy server, which frees the application server from waiting on client interactions. It is also a prerequisite for many other performance-increasing capabilities, such as load balancing and caching of static files, and it improves security and scalability too.
  • Apply load balancing across protocols such as HTTP, HTTPS, SPDY, HTTP/2, WebSocket, FastCGI, SCGI, uwsgi, and memcached, as well as TCP-based applications and other Layer 4 protocols.
  • Cache both static and dynamic content to reduce the load on application servers.
  • Use established compression standards to reduce file sizes for photos, videos, and music. Avoid leaving text data such as HTML, CSS, and JavaScript uncompressed, as compressing it can have a large effect, especially over slow or otherwise constrained connections (see the sketch after this list). If you use SSL, compression also reduces the amount of data that must be SSL-encoded, saving time.
  • Monitor real-world performance closely and in real time, both within specific devices and across your web infrastructure. Use global application performance monitoring tools to check page load times remotely, and monitor the delivery side as well.
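
To illustrate how much uncompressed text costs, here is a small sketch using Python's standard gzip module. The sample markup and the exact ratio are made up, but repetitive text such as HTML, CSS, and JavaScript routinely compresses to a small fraction of its original size.

```python
import gzip

# Repetitive text such as HTML compresses dramatically; media files that
# already use compressed formats (JPEG, MP4, MP3) would not shrink much more.
html = ("<div class='row'><span>item</span></div>\n" * 500).encode("utf-8")
compressed = gzip.compress(html)

print(f"original:   {len(html):>6} bytes")
print(f"compressed: {len(compressed):>6} bytes "
      f"({100 * len(compressed) / len(html):.1f}% of the original)")
```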

Full post here, 20 mins read

Improving Mongo performance by managing indexes

  • You can query large collections efficiently by defining an index and ensuring it is built in the background.
  • To define an efficient index, you can also build on top of a previously defined index. When you are compound indexing in this way, determine which property of your query is the most unique, i.e. has the highest cardinality, and place it first; the higher cardinality helps limit the search area of your query (see the sketch after this list).
  • To ensure your database uses your index efficiently, make sure the index fits in the available RAM on your database server as part of Mongo’s working set. Check this by reading db.stats().indexSize and determining your default allocation of RAM.
  • To keep index sizes small, examine a collection’s index usage and remove the unused indexes, check compound indexes for redundancy, make indexes sparser by imposing a partialFilterExpression constraint that tells them which documents to include, and minimize the number of fields in compound indexes.
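
The following PyMongo sketch ties these points together; the database, collection, and field names are illustrative assumptions. (On MongoDB 4.2 and later there is no background build option to pass: the server uses an optimized build process by default.)

```python
from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")
orders = client.shop.orders  # hypothetical database and collection

# Compound index: the most selective (highest-cardinality) field goes
# first so it narrows the search area before the other fields apply.
orders.create_index([("customer_id", ASCENDING), ("status", ASCENDING)])

# Partial index: only documents matching the filter are indexed,
# keeping the index sparse and small enough to stay in RAM.
orders.create_index(
    [("created_at", ASCENDING)],
    partialFilterExpression={"status": "open"},
)

# Total index size in bytes, to compare against available RAM.
print(client.shop.command("dbStats")["indexSize"])

# Per-index usage counters, to spot unused indexes worth dropping.
for stat in orders.aggregate([{"$indexStats": {}}]):
    print(stat["name"], stat["accesses"]["ops"])
```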

Full post here, 9 mins read

Tips for architecting fast data applications

  • Understand requirements in detail: how large each message is, how many messages are expected per minute, whether frequency may change significantly, whether records can be batch-processed, whether time relationships and ordering need to be preserved, how ‘dirty’ the data may be, and whether dirty data needs to be cleaned, reported, or ignored.
  • Implement an efficient messaging backbone for reliable, secure data exchange with low latency. Apache Kafka is a good option for this (see the sketch after this list).
  • Leverage your SQL knowledge, applying the same relational algebra to data streams treated as time-varying relations.
  • Deploy cluster managers or cluster management solutions for greater scalability, agility, and resilience.
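
As a concrete starting point, here is a minimal sketch of such a backbone using the kafka-python client; the broker address, topic name, and message shape are illustrative assumptions.

```python
import json
from kafka import KafkaProducer, KafkaConsumer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",  # wait for full replication: reliability over raw latency
)
# Keying by source preserves per-key ordering within a partition.
producer.send("readings", key=b"sensor-17", value={"temp_c": 21.5})
producer.flush()

consumer = KafkaConsumer(
    "readings",
    bootstrap_servers="localhost:9092",
    group_id="analytics",  # consumer groups let reads scale horizontally
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    auto_offset_reset="earliest",
)
for record in consumer:
    print(record.key, record.value)
    break
```

Keying messages preserves per-key ordering within a partition, which matters when time relationships must be maintained, while acks="all" trades a little latency for fully replicated, reliable writes.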

Full post here, 7 mins read