Capacity planning is the work teams do to make sure their services have enough spare capacity to handle any likely increases in workload, and enough buffer capacity to absorb normal workload spikes, between planning iterations.

During the capacity-planning process, teams answer these four questions:

1. How much free capacity currently exists in each of our services?
2. How much capacity buffer do we need for each of our services?
3. How much workload growth do we expect between now and our next capacity-planning iteration, factoring in both natural customer-driven growth and new product features?
4. How much capacity do we need to add to each of our services so that we’ll still have our targeted free capacity buffer after any expected workload growth?

The answers to those four questions—along with the architectures and uses of the services—help determine the methodology our teams use to calculate their capacity needs.
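
Question 4 is ultimately arithmetic once the first three answers are known. As a minimal sketch (Python; the function, its parameter names, and the example numbers are illustrative assumptions, not from the original):

    def capacity_to_add(current_capacity, current_workload,
                        growth_fraction, buffer_fraction):
        """Capacity to add so the targeted free-capacity buffer
        (expressed as a fraction of total capacity, as in the
        service-tier example below) survives the expected growth."""
        projected_workload = current_workload * (1 + growth_fraction)
        # Free capacity is buffer_fraction of the *total* capacity, so the
        # projected workload may only occupy (1 - buffer_fraction) of it.
        target_capacity = projected_workload / (1 - buffer_fraction)
        return max(0.0, target_capacity - current_capacity)

    # Example: capacity for 10 instances, workload currently fills 8,
    # 25% growth expected, 20% free-capacity buffer targeted.
    print(capacity_to_add(10, 8, 0.25, 0.20))  # 2.5 instances' worth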

Calculating capacity

We use three common methodologies to calculate how much free capacity exists for a given service:

1. Service-starvation analysis
2. Load-generation analysis
3. Static-resource analysis

It’s important to note that each component of a service tier (for example, application hosts, load balancers, or database instances) requires separate capacity analysis.

Service-starvation analysis

Service starvation involves reducing the number of service instances available to a service tier until the service begins to falter under a given workload. The amount of resource “starvation” that’s possible without causing the service to fail represents the free capacity in the service tier.

For example, say a team has 10 deployed instances of service x, which together handle 10K requests per minute (RPM) in a production environment. The team finds that it’s able to reduce the number of instances of service x to 8 and still support the same workload.

This tells the team two things:

1. A single service instance is able to handle a maximum of 1.25K RPM (in other words, 10K RPM divided by 8 instances).
2. The service tier normally has 20% free capacity: two “free” instances equals 20% of the service tier.

Of course, this scenario assumes that the service tier supports a steady-state workload of 10K RPM; if the workload is spiky, there may actually be less (or more) than 20% free capacity across the 10 service instances.
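
This arithmetic generalizes directly. Here’s a minimal sketch (Python; the function name is an illustrative assumption, and the numbers are the ones from the example above):

    def starvation_analysis(total_workload_rpm, deployed_instances, min_instances):
        """Derive per-instance capacity and free capacity from a
        service-starvation experiment, where min_instances is the
        smallest instance count that still handled the full workload."""
        per_instance_rpm = total_workload_rpm / min_instances
        free_fraction = (deployed_instances - min_instances) / deployed_instances
        return per_instance_rpm, free_fraction

    # 10 deployed instances handling 10K RPM, starved down to 8 instances.
    per_instance_rpm, free = starvation_analysis(10_000, 10, 8)
    print(per_instance_rpm)  # 1250.0 RPM per instance
    print(free)              # 0.2, i.e., 20% free capacity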

Load-generation analysis

Load generation is effectively the inverse of service starvation. Rather than scaling down a service tier to the point of failure, you generate synthetic load against your services until they reach the point of failure.

The amount of synthetic workload you were able to process successfully on top of the normal workload, expressed as a percentage of that normal workload, represents the free capacity in your service tier.
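
As a minimal sketch of that measurement (Python; send_synthetic_rpm and service_is_healthy are hypothetical hooks into your load-generation and monitoring tooling, and the step size is arbitrary):

    def load_generation_analysis(normal_workload_rpm, send_synthetic_rpm,
                                 service_is_healthy, step_rpm=100):
        """Ramp synthetic load until the service falters, then report the
        extra load it absorbed as a fraction of the normal workload."""
        synthetic_rpm = 0
        while service_is_healthy():
            synthetic_rpm += step_rpm
            send_synthetic_rpm(synthetic_rpm)
        # The last increment caused the failure, so back off one step.
        absorbed_rpm = max(0, synthetic_rpm - step_rpm)
        return absorbed_rpm / normal_workload_rpm

For example, successfully absorbing 2.5K RPM of synthetic load on top of a 10K RPM baseline would indicate 25% free capacity.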

Static-resource analysis

This approach involves identifying the most constrained computational resource for a given service tier (typically CPU, memory, disk space, or network I/O) and determining what percentage of that resource is available to the service as it’s currently deployed.

Although this can be a quick way to estimate free capacity in a service, there are a few important gotchas:

1. Some services have dramatically different resource-consumption profiles at different points in their lifecycle (for example, in startup mode versus normal operation).
2. It may be necessary to look at an application’s internals to determine free memory. For example, an application may allocate its maximum configured memory at startup time even if it’s not using that memory.
3. Resources in a network interface controller (NIC) or switch typically reach saturation at a throughput rate lower than the maximum advertised by manufacturers. Because of this, it’s important to benchmark the actual maximum possible throughput rather than relying on the manufacturer’s specs.
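
Here’s a minimal sketch of the approach (Python; the resource names, utilization figures, and the benchmarked NIC limit are hypothetical):

    # Map each resource to (current usage, benchmarked maximum) in its own
    # units. Per gotcha 3, the NIC maximum is a measured saturation point,
    # not the manufacturer's advertised rate.
    resources = {
        "cpu_cores":  (12, 16),
        "memory_gib": (48, 64),    # per gotcha 2, measure real use, not allocation
        "disk_gib":   (500, 2000),
        "nic_gbit":   (7, 8),
    }

    def static_resource_free_capacity(resources):
        """Free capacity is bounded by the most constrained resource."""
        headroom = {name: 1 - used / maximum
                    for name, (used, maximum) in resources.items()}
        bottleneck = min(headroom, key=headroom.get)
        return bottleneck, headroom[bottleneck]

    print(static_resource_free_capacity(resources))  # ('nic_gbit', 0.125)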