Zoran's Blog: Rough guide - How many servers do I need for required Requests per Second

https://wrongsideofmemphis.wordpress.com/2013/10/21/requests-per-second-a-reference/
https://www.rootusers.com/linux-web-server-performance-benchmark-2016-results/
http://www.tpc.org/tpcc/results/tpcc_last_ten_results.asp
https://www.mysql.com/why-mysql/benchmarks/
https://redis.io/topics/benchmarks

We are going to use RPS (requests per second) as the metric. This measures the throughput, which is typically the most important measure. There are other parameters that can be interesting (latency) depending on the application, but in a typical application, throughput is the main metric.

Those requests can be pure HTTP requests (getting a URL from a web server), or can be other kind of server requests. Database queries, fetch the mail, bank transactions, etc. The principles are the same.

I/O bound or CPU bound

There are two type of requests, I/O bound and CPU bound.

Typically, requests are limited by I/O. That means that it fetches the info from a database, or reads a file, or gets the info from network. CPU is doing nothing most of the time. Due the wonders of the Operative System, you can create multiple workers that will keep doing requests while other workers wait. In this case, the server is limited by the amount or workers it has running. That means RAM memory. More memory, more workers.[1]

In memory bound systems, getting the number of RPS is making the following calculation:

RPS = (memory / worker memory) * (1 / Task time)

For example:

Total RAM	Worker memory	Task time	RPS
16Gb	40Mb	100ms	4,000
16Gb	40Mb	50ms	8,000
16Gb	400Mb	100ms	400
16Gb	400Mb	50ms	800

ome other requests, like image processing or doing calculations, are CPU bound. That means that the limiting factor in the amount of CPU power the machine has. Having a lot of workers does not help, as only one can work at the same time per core. Two cores means two workers can run at the same time. The limit here is CPU power and number of cores. More cores, more workers.

In CPU bound systems, getting the number of RPS is making the following calculation:

RPS = Num. cores * (1 /Task time)

For example:

Num. cores	Task time	RPS
4	10ms	400
4	100ms	40
16	10ms	1,600
16	100ms	160

Of course, those are ideal numbers. Servers need time and memory to run other processes, not only workers. And, of course, they can be errors. But there are good numbers to check and keep in mind.

Calculating the load of a system

If we don’t know the load a system is going to face, we’ll have to make an educated guess. The most important number is the sustained peak. That means the maximum number of requests that are going to arrive at any second during a sustained period of time. That’s the breaking point of the server.

That can depend a lot on the service, but typically services follow a pattern with ups and downs. During the night the load decreases, and during day it increases up to certain point, stays there, and then goes down again. Assuming that we don’t have any idea how the load is going to be, just assume that all the expected requests in a day are going to be done in 4 hours. Unless load is very very spiky, it’ll probably be a safe bet.

For example,1 million requests means 70 RPS. 100 million requests mean 7,000 RPS. A regular server can process a lot of requests during a whole day.

That’s assuming that the load can be calculated in number of requests. Other times is better to try to estimate the number of requests a user will generate, and then move from the number of users. E.g. A user will make 5 requests in a session. With 1 Million users in 4 hours, that means around 350 RPS at peak. If the same users make 50 requests per sessions, that’s 3,500 RPS at peak.

A typical load for a server

This two numbers should only be user per reference, but, in my experience, I found out that are numbers good to have on my head. This is just to get an idea, and everything should be measured. But just as rule of thumb.

1,000 RPS is not difficult to achieve on a normal server for a regular service.

2,000 RPS is a decent amount of load for a normal server for a regular service.

More than 2K either need big servers, lightweight services, not-obvious optimisations, etc (or it means you’re awesome!). Less than 1K seems low for a server doing typical work (this means a request that is simple and not doing a lot of work) these days.

Zoran's Blog

Saturday, 17 September 2016

Rough guide - How many servers do I need for required Requests per Second

I/O bound or CPU bound

Calculating the load of a system

A typical load for a server

No comments:

Post a Comment