To create an evolvable API, stop thinking about URLs

  • To build an evolvable API, instead of forcing clients to have prior knowledge of URLs, fields and HTTP methods, let the client ask the server what is required to complete an operation, with the server indicating the preferred host and path.
  • Critical aspects of an evolvable API include:

a) The state of the conversation is stored in the network - not sourced from either client or server.

b) No versioning needed - when you add or remove data from a response, clients should know how to react. If they don’t know how to react to a new feature, they should be able to ignore it and work in the old way.

c) The server owns actions, which contain the values for URLs, methods, and fields, so that it controls where clients go to continue the conversation, with only the entry point hardcoded in the client (see the sketch after this list).

  • With control of URLs on the server, it can run A/B tests and direct clients to different servers running instances of the same application. The server can also implement polling functionality so clients can track the status of requests.
  • Model communication on how people actually operate. Think not only about a generic language but about developing a shared domain vocabulary.
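
A minimal sketch of the idea in Python, assuming a hypothetical JSON response shape in which the server returns an "actions" object with href, method, and fields for each next step; only the entry-point URL is hardcoded:

```python
import requests  # third-party HTTP client

ENTRY_POINT = "https://api.example.com/"  # the only URL the client hardcodes

def follow(action_name, payload):
    """Ask the server what is required, then follow the action it describes."""
    root = requests.get(ENTRY_POINT).json()
    # Hypothetical response shape: the server owns URLs, methods, and fields,
    # e.g. {"actions": {"create-order": {"href": "...", "method": "POST",
    #                                    "fields": ["item", "quantity"]}}}
    action = root["actions"][action_name]
    # Send only the fields the server says it needs; ignore anything unknown.
    body = {field: payload[field] for field in action["fields"] if field in payload}
    return requests.request(action["method"], action["href"], json=body)

response = follow("create-order", {"item": "book", "quantity": 1})
print(response.status_code)
```

Since the client never builds URLs itself, the server is free to move href values between hosts and paths without breaking it.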

Full post here, 10 mins read

The differences between gateway, microgateway and service mesh

  • An API gateway is a central interface for all external communications. It typically works by invoking multiple microservices and aggregating the results to determine the best path (a minimal sketch follows this list).
  • It may also handle authentication, input validation and filtering, and metric collection, as well as transforming requests and/or results. For the microservices network, it offers lower latency, better efficiency and higher security, as well as easier isolation of single sources of failure.
  • API microgateways are proxies sitting close to microservices to handle internal communication between them. They give developers better governance, discovery, observability, and stability, while exposing policy enforcement points and security controls to operators.
  • They are a more granular solution than a single API gateway because they control exposure at a finer level. They offer low latency and a small footprint, as requests don't need to wait their turn. This does imply code duplication across multiple microservice instances, which can be inefficient if the code is not intelligently structured.
  • A service mesh is a layer between microservices for all service-to-service communication that replaces direct communication between services. It will often have in-built support for resiliency, error checking, and service discovery. It is similar to a microgateway but with network communications entirely abstracted from business logic, allowing developers to focus entirely on the latter.
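
As a rough illustration of the aggregation role described above, here is a minimal Python sketch of a gateway endpoint that fans out to two hypothetical internal services and merges their results; the service URLs and response shapes are assumptions, not part of the original post:

```python
from concurrent.futures import ThreadPoolExecutor

import requests

# Hypothetical internal service endpoints behind the gateway.
SERVICES = {
    "profile": "http://users.internal/profile/42",
    "orders": "http://orders.internal/orders?user=42",
}

def gateway_view():
    """Fan out to the microservices in parallel and aggregate the results."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(requests.get, url, timeout=2)
                   for name, url in SERVICES.items()}
        # Isolate single sources of failure: a failing service degrades its
        # own field instead of failing the whole aggregated response.
        result = {}
        for name, future in futures.items():
            try:
                result[name] = future.result().json()
            except Exception:
                result[name] = None
        return result
```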

Full post here, 8 mins read

Designing resilient systems beyond retries: rate limiting

  • In distributed systems, a circuit-breaker pattern and retries are commonly used to improve resiliency. A retry ‘storm’ is a common risk if the server cannot handle the increased number of requests, and a circuit-breaker can prevent it.
  • In a large organization with hundreds of microservices, coordinating and maintaining all the circuit-breakers is difficult, and rate limiting or throttling can serve as a second line of defense.
  • You can limit requests by client or user account (say, 1000 requests per hour each, rejecting further requests until the time window resets) or by endpoint (benchmarked to server capabilities so that the limit applies across all clients). These can be combined so that different threshold levels apply together in a specific order, possibly culminating in a server-wide threshold (see the sketch after this list).
  • Consider global versus local rate-limiting. The former is especially useful in a microservices architecture because bottlenecks may not be tied to individual servers but to exhausted downstream resources such as a database, third-party service, or another microservice.
  • Take care that the rate-limiting service does not become a single point of failure, nor add significant latency. The system must function even if the rate-limiter experiences problems, perhaps by falling back to a local rate-limiting strategy.
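
A minimal fixed-window sketch in Python of the layered thresholds described above - per-client first, then server-wide; the limits and window size are illustrative assumptions:

```python
import time
from collections import defaultdict

# Illustrative thresholds, checked in order: per-client, then server-wide.
CLIENT_LIMIT = 1000   # requests per client per window
GLOBAL_LIMIT = 10000  # requests across all clients per window
WINDOW = 3600         # window size in seconds (one hour)

counts = defaultdict(int)   # per-client request counts for the current window
window_start = time.time()

def allow(client_id):
    """Return True if the request is within both thresholds, else reject it."""
    global window_start
    now = time.time()
    if now - window_start >= WINDOW:  # reset all counters when window expires
        counts.clear()
        window_start = now
    if counts[client_id] >= CLIENT_LIMIT:
        return False                  # per-client threshold exceeded
    if sum(counts.values()) >= GLOBAL_LIMIT:
        return False                  # server-wide threshold exceeded
    counts[client_id] += 1
    return True
```

For the global rate-limiting the post describes, these counters would live in a shared store such as Redis rather than in the memory of a single process.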

Full post here, 11 mins read

An overview of caching methods

  • The most common caching methods are browser caching, application caching and key-value caching.
  • Browser caching is a collaboration between the browser and the web server, and you don't have to write any extra code. For example, when you reload a page you have visited before in Chrome, the Expires date in the response headers determines whether the browser loads resources directly from its cache (populated on your first visit) or requests them again from the server. The server uses the headers passed by the browser (such as If-Modified-Since, or If-None-Match carrying a previously sent ETag) to decide whether to send the resources afresh or tell the browser to load them from its cache (see the conditional-request sketch after this list).
  • Application-level caching is also called memoization, and it is useful when parts of your program are slow - say, reading a file and extracting data from it, or requesting data from an API. The result of the slow method is placed in an instance variable and returned on subsequent calls, which speeds the method up (see the memoization sketch after this list). The downsides are that you lose the cache when the application restarts and you cannot share it between multiple servers.
  • Key-value data caching takes memoization a step further with dedicated stores like Memcached or Redis. This allows cached data to persist across user requests (allowing data sharing) and application reboots, but it introduces a dependency into your application and adds another component to monitor.
  • To determine the best method for you, start with browser caching as the baseline, then identify your hotspots with an application profiling tool before choosing which method to add as a second layer of caching.
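
To make the browser-caching exchange concrete, here is a minimal framework-agnostic sketch in Python of the server side of a conditional request; deriving the ETag from a content hash is an illustrative choice, not something prescribed by the post:

```python
import hashlib

def conditional_response(request_headers, body: bytes):
    """Return (status, headers, body), honoring If-None-Match / ETag."""
    etag = '"%s"' % hashlib.sha1(body).hexdigest()  # ETag derived from content
    if request_headers.get("If-None-Match") == etag:
        # The browser already has this version: tell it to use its cache.
        return 304, {"ETag": etag}, b""
    # Otherwise send the resource afresh, tagged for future revalidation.
    return 200, {"ETag": etag, "Cache-Control": "max-age=3600"}, body
```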
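
And a minimal memoization sketch along the lines described above, caching a slow method's result in an instance variable; the file-parsing workload is a stand-in example:

```python
class Report:
    def __init__(self, path):
        self.path = path
        self._stats = None  # instance-variable cache; lost on restart

    def stats(self):
        """Parse the file once, then return the cached result."""
        if self._stats is None:
            with open(self.path) as f:   # the slow step, run only once
                lines = f.readlines()
            self._stats = {"lines": len(lines)}
        return self._stats
```

For free functions, Python's functools.lru_cache gives you the same pattern without writing the cache by hand.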

Full post here, 7 mins read

An introduction to load testing

  • Load testing is done by running software on one machine (or cluster of machines) that generates a large number of requests to the web server on a second machine (or cluster).
  • Common parameters to test should include server resources (CPU, memory, etc) for handling anticipated loads; quickness of response for the user; efficiency of application; the need for scaling up hardware or scaling out to multiple servers; particularly resource-intensive pages or API calls; and maximum requests per second.
  • In general, a higher number of requests implies higher latency, but it is good practice to test multiple times at different request rates. While a full page can take 2-5 seconds to load, web server latency should typically be around 50-200 milliseconds. Remember that even ‘imperceptible’ improvements add up in the aggregate for a better UX.
  • As a first step, monitor resources - mostly CPU load and free memory.
  • Next, find the maximum response rate of your web server by setting the desired concurrency (100 is a safe default, but check settings like MaxClients or MaxThreads for your server) and a test duration in any load-testing tool. If your tool only handles one URL at a time, run the test against a few different URLs with varying resource requirements. This should push CPU idle time to 0% and raise response times beyond real-world expectations.
  • Dial back the load and test again to see how your server performs when not pushed to its absolute limit: specify an exact requests-per-second rate, starting at half the maximum found in the previous step. Halve the step up or down each time until you reach the maximum rate with acceptable latency, measured at the 99th or even 99.999th percentile (see the sketch after this list).
  • Some load-testing tools to explore: ab (ApacheBench), JMeter, Siege, Locust, and wrk2.
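
As a rough illustration of the measurement step, here is a sequential Python sketch that holds a fixed request rate and reports a high-percentile latency; the target URL, rate, and duration are placeholders, and dedicated tools like wrk2 do this far more accurately (with real concurrency and coordinated-omission handling):

```python
import time

import requests

URL = "http://localhost:8080/"  # placeholder target
RATE = 50                       # requests per second to attempt
DURATION = 10                   # test length in seconds

latencies = []
interval = 1.0 / RATE
for _ in range(RATE * DURATION):
    start = time.perf_counter()
    requests.get(URL)
    latencies.append((time.perf_counter() - start) * 1000)  # milliseconds
    # Sleep off the remainder of the interval to hold the request rate.
    time.sleep(max(0.0, interval - (time.perf_counter() - start)))

latencies.sort()
p99 = latencies[int(len(latencies) * 0.99) - 1]  # 99th-percentile latency
print(f"requests: {len(latencies)}, p99 latency: {p99:.1f} ms")
```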

Full post here, 13 mins read