
(Commonly called the C10K problem)

Is there a more contemporary review of solutions to the c10k problem (Last updated: 2 Sept 2006), specifically focused on Linux (epoll, signalfd, eventfd, timerfd..) and libraries like libev or libevent?

Something that discusses all the solved and still unsolved issues on a modern Linux server?

jweyrich
gdamjan
  • To those that voted to close: did you read the link and the actual question? Is there anything more recent than 2006 that talks to some/all of these points? – Joe Jun 28 '10 at 01:52
  • 3
    This just doesn't seem like a programming question to me. – Gabe Jun 28 '10 at 01:56
  • 6
    @Gabe: Do you think some IT guy is going to understand the difference between [a]synchronous I/O and the flavors of API? – L̲̳o̲̳̳n̲̳̳g̲̳̳p̲̳o̲̳̳k̲̳̳e̲̳̳ Jun 28 '10 at 01:57
  • 4
    Longpoke: He's asking for a document search, not programming advice. – Gabe Jun 28 '10 at 02:15
  • 3
    While I agree with Gabe about the nature of this query, the underlying issue *is* programming related and *is* interesting. However, just judging by the length of the linked document it may "require extended discussion" to answer. Finally, @damjan.mk, it would have been better to tell us what "c10k" means (i.e. being able to serve at least 10,000 clients at once from a single machine in a web context). I'm holding my close vote for now, but ... – dmckee --- ex-moderator kitten Jun 29 '10 at 23:54
  • Sorry, I kind of thought c10k is a well-known abbreviation in these circles. I never remember to be as explicit as possible. – gdamjan Jun 30 '10 at 00:47
  • 14
    Please, don't close this question. IMO it's totally relevant, and I've searched for its answer in the past. No luck though :-) – jweyrich Jul 29 '10 at 01:46
  • @damjan.mk - StackOverflow serves many different populations of programmers. I would hazard a guess that most (and certainly many) are not even familiar with network programming at all. Your question is not to the audience of network programmers, it's to the audience of programmers on StackOverflow. – Omnifarious Aug 02 '10 at 21:01
  • 1
    I don't think there's been massive improvements in kernel-side select/poll/epoll/kqueue etc technologies for c10k in the last five years. And, if you like the libevent interface, it'll probably use the finest tools for the platform under consideration. – sarnold Aug 05 '10 at 02:08
  • 1
    This comment just adds an article I consider important: **Comparing and Evaluating epoll, select, and poll Event Mechanisms** - http://www.cs.uwaterloo.ca/~brecht/papers/getpaper.php?file=ols-2004.pdf – jweyrich Aug 07 '10 at 05:38

7 Answers

11

The C10K problem generally assumes you're trying to optimize a single server, but as your referenced article points out, "hardware is no longer the bottleneck". Therefore, the first step is to make sure it isn't easier and cheaper to just throw more hardware into the mix.

If we've got a $500 box serving X clients per second, it's a lot more efficient to just buy another $500 box to double our throughput than to let an employee gobble up who knows how many hours and dollars trying to figure out how to squeeze more out of the original box. Of course, that's assuming our app is multi-server friendly, that we know how to load balance, etc, etc...

joe snyder
  • 13
    What if someone wants to write a high-performance library to save your money and possibly of thousands others? – jweyrich Aug 05 '10 at 00:28
  • 1
    @jweyrich: I wouldn't envy that writing task, because you'd probably be constantly playing catch-up with new OS/server/hardware versions for the configurations you do support, while competing with OS/server/hardware versions you don't support. The spectrum of clients/servers/transactions/bandwidth benchmarks to determine "best" or "fastest" would be huge and vary widely by customer. An employee can burn $500 in salary and overhead on a "high-performance lib" search/evaluation in just a few hours and still not have a proven solution. It's a tough sell to convince people to not just buy another cheap box... – joe snyder Aug 05 '10 at 19:45
  • 2
    This is a bad, energy-wasting way to do things, and we shouldn't keep doing it now that the environmental cost has gone up a lot (together with power prices). That Twitter processes about 800KB/s of incoming tweets yet needs 100,000 servers to handle it is just the insane result of this thinking. – Lothar Mar 28 '14 at 17:34
10

Coincidentally, just a few days ago, Programming Reddit or maybe Hacker News mentioned this piece:

Thousands of Threads and Blocking IO

In the early days of Java, my C programming friends laughed at me for doing socket IO with blocking threads; at the time, there was no alternative. These days, with plentiful memory and processors it appears to be a viable strategy.

The article is dated 2008, so it brings your horizon forward by a couple of years.

Carl Smotricz
  • 2
    I'm sure your hardware vendor is happy. – ninjalj Jul 31 '10 at 17:19
  • 3
    I'm more concerned with making damjan.mk happy. But please don't misconstrue my comment: This approach runs fine on a run-of-the-mill store bought PC, which is hard these days to find with less than a dual core CPU and 2G of RAM. – Carl Smotricz Jul 31 '10 at 20:00
  • I think the point of the presentation there is that you can use thousands of threads and it doesn't require more powerful hardware any more. Your application will become I/O or CPU bound independent of the choice between async or threaded I/O. – Dobes Vandermeer Mar 24 '12 at 13:39
5

To answer the OP's question: today the equivalent document is not about optimizing a single server for load, but about optimizing your entire online service for load. From that perspective, the number of combinations is so large that what you are looking for is not a document but a live website that collects such architectures and frameworks. Such a website exists, and it's called www.highscalability.com.

Side Note 1:

I'd argue against the belief that throwing more hardware at it is a long term solution:

  • Perhaps the cost of an engineer who "gets" performance is high compared to the cost of a single server. But what happens when you scale out? Let's say you have 100 servers: a 10 percent improvement in server capacity saves you the cost of 10 servers a month.

  • Even if you have just two machines, you still need to handle performance spikes. The difference between a service that degrades gracefully under load and one that breaks down is that someone spent time optimizing for the load scenario.

Side note 2:

The subject of this post is slightly misleading. The C10K document does not try to solve the problem of 10k clients per second. (The number of clients per second is irrelevant unless you also define a workload along with sustained throughput under bounded latency. I think Dan Kegel was aware of this when he wrote that doc.) Look at it instead as a compendium of approaches to building concurrent servers, and of micro-benchmarks for the same. Perhaps what has changed between then and now is that you could once assume the service was a website serving static pages; today the service might be a NoSQL datastore, a cache, a proxy, or one of hundreds of network infrastructure software pieces.

carlsborg
  • 2
    I agree with your points, but you need to update your break even point for a performance engineer. A reasonable server (8 core, 7 GB RAM) costs US$0.68 per hour on Amazon's EC2. Cutting 10 servers only saves you $60K per year. That won't get you much of a performance engineer. – Ken Fox Aug 13 '10 at 19:53
2

You can also take a look at this series of articles:

http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-3

He shows a fair amount of performance data and the OS configuration work he had to do in order to support 10K and then 1M connections.

It seems like a system with 30GB of RAM could handle 1 million connected clients in a sort of social-network simulation, using a libevent frontend to an Erlang-based app server.
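Much of the OS configuration work he describes consists of raising kernel limits whose defaults sit far below a million connections. As a hedged illustration, these are the kind of commonly tuned knobs involved—not his exact values, which are workload-dependent:

```shell
# Run as root. Allow enough file descriptors: one per connection, plus overhead.
ulimit -n 1048576                                 # per-process fd limit
sysctl -w fs.file-max=1100000                     # system-wide fd limit

# Widen the ephemeral port range and deepen the accept backlog.
sysctl -w net.ipv4.ip_local_port_range="1024 65535"
sysctl -w net.core.somaxconn=4096
```

None of these settings persist across reboots unless also written to /etc/sysctl.conf and the limits configuration.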

Dobes Vandermeer
1

The libev project publishes some benchmarks comparing itself against libevent...

rogerdpack
1

I'd recommend reading Zed Shaw's poll, epoll, science, and superpoll[1]. It explains why epoll isn't always the answer, why it's sometimes even better to go with poll, and how to combine the best of both worlds.

[1] http://sheddingbikes.com/posts/1280829388.html

racetrack
  • I can't confirm his results right now, but I believe they're plausible, since each call to poll requires much more data (all the events you're interested in) to be transferred between user and kernel space. Poll should start to pay off when more of these events are active and more new events are occurring (read: higher load). I should say I don't like the "superpoll" approach, as it adds lots of unnecessary syscalls by not being a kernel implementation. Anyway, the article gave me good insights. +1. – jweyrich Aug 07 '10 at 05:26
  • @jweyrich: Did you see this? http://sheddingbikes.com/posts/1280882826.html He has provided the full C code, as well as the R environment of his tests, might help experimenting this on your own – racetrack Aug 07 '10 at 08:01
  • yes, thanks. I'll be testing it when I get some free time. More here: http://sheddingbikes.com/posts/1281174543.html – jweyrich Aug 07 '10 at 16:32
1

Have a look at the RamCloud project at Stanford: https://ramcloud.atlassian.net/wiki/display/RAM/RAMCloud

Their goal is 1,000,000 RPC operations/sec/server. They have numerous benchmarks and commentary on the bottlenecks that are present in a system which would prevent them from reaching their throughput goals.

Ron Klein
Noah Watkins