
I have a very specific question about server programming on UNIX (Debian, kernel 2.6.32). My goal is to learn how to write a server that can handle a huge number of clients. My target is more than 30 000 concurrent clients (even though my colleague mentions that 500 000 are possible, which seems like quite a huge amount :-)), but I really don't know what is even possible, and that is why I am asking here. So my first question: how many simultaneous clients are possible?

Clients can connect whenever they want, get in contact with other clients, and form a group (one group contains a maximum of 12 clients). They can chat with each other, so the TCP/IP packet size varies depending on the message sent. Clients can also send mathematical formulas to the server. The server will solve them and broadcast the answer back to the group. This is a quite heavy operation.

My current approach is to start up the server and then use fork() to create a daemon process. The daemon process binds the socket fd_listen and starts listening. It runs a while(1) loop and uses accept() to get incoming connections.

Once a client connects, I create a pthread for that client which handles the communication. Clients get added to a group and share some memory (needed to keep the group running), but every client still runs on its own thread. Getting the access to that memory right was quite a hassle, but it works fine now.
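
In condensed form, my approach looks roughly like this (just a sketch: error handling, the group logic and the shared memory are left out; handle_client() and the port number are placeholders):

```c
/* Sketch of the current approach: listen, accept(), one detached pthread per client. */
#include <netinet/in.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static void *handle_client(void *arg)
{
    int client_fd = *(int *)arg;
    free(arg);
    /* ... per-client chat / formula handling ... */
    close(client_fd);
    return NULL;
}

int main(void)
{
    int fd_listen = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(5000);              /* example port */

    bind(fd_listen, (struct sockaddr *)&addr, sizeof addr);
    listen(fd_listen, SOMAXCONN);

    while (1) {
        int *client_fd = malloc(sizeof *client_fd);
        *client_fd = accept(fd_listen, NULL, NULL);

        pthread_t tid;
        pthread_create(&tid, NULL, handle_client, client_fd);
        pthread_detach(tid);                  /* no join; one thread per client */
    }
}
```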

At the beginning of the program I read the /proc/sys/kernel/threads-max file and create my threads according to that. The number of possible threads according to that file is around 5000, which is far from the number of clients I want to be able to serve. Another approach I am considering is to use select() and create fd sets, but the access time to find a socket within a set is O(N), which can be quite long if I have more than a couple of thousand clients connected. Please correct me if I am wrong.
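
For reference, this is roughly the select() loop I have in mind (just a sketch; poll_clients() and the client_fds array are made-up names), and as far as I know no descriptor in the set may be >= FD_SETSIZE (commonly 1024):

```c
/* select()-based polling: O(N) to rebuild the set, O(N) again to scan it. */
#include <sys/select.h>

void poll_clients(int client_fds[], int nclients)
{
    fd_set readfds;
    FD_ZERO(&readfds);

    int maxfd = -1;
    for (int i = 0; i < nclients; i++) {      /* O(N): rebuild the set */
        FD_SET(client_fds[i], &readfds);
        if (client_fds[i] > maxfd)
            maxfd = client_fds[i];
    }

    if (select(maxfd + 1, &readfds, NULL, NULL, NULL) > 0) {
        for (int i = 0; i < nclients; i++)    /* O(N): find the ready sockets */
            if (FD_ISSET(client_fds[i], &readfds)) {
                /* read from client_fds[i] */
            }
    }
}
```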

Well, I guess I need some ideas :-)

Greetings, Markus

P.S. I tagged it C++ and C because it applies to both languages.

markus_p

3 Answers


The best approach as of today is an event loop like libev or libevent.

In most cases you will find that one thread is more than enough, but even if it isn't, you can always have multiple threads with separate loops (at least with libev).

Libev[ent] uses the most efficient polling solution for each OS (and anything is more efficient than select or a thread per socket).
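
To give an idea of the shape, here is a minimal libev skeleton (a sketch only, with error handling omitted and an example port; compile with -lev):

```c
/* Minimal libev skeleton: one event loop, one ev_io watcher per socket. */
#include <ev.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static void read_cb(struct ev_loop *loop, ev_io *w, int revents)
{
    char buf[4096];
    ssize_t n = recv(w->fd, buf, sizeof buf, 0);
    if (n <= 0) {                             /* client closed or error */
        ev_io_stop(loop, w);
        close(w->fd);
        free(w);
        return;
    }
    /* ... parse chat message / formula, reply, broadcast to the group ... */
}

static void accept_cb(struct ev_loop *loop, ev_io *w, int revents)
{
    int client_fd = accept(w->fd, NULL, NULL);
    if (client_fd < 0)
        return;

    ev_io *client = malloc(sizeof *client);
    ev_io_init(client, read_cb, client_fd, EV_READ);
    ev_io_start(loop, client);                /* one watcher per client, same loop */
}

int main(void)
{
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(5000);              /* example port */
    bind(listen_fd, (struct sockaddr *)&addr, sizeof addr);
    listen(listen_fd, SOMAXCONN);

    struct ev_loop *loop = ev_default_loop(0);
    ev_io accept_watcher;
    ev_io_init(&accept_watcher, accept_cb, listen_fd, EV_READ);
    ev_io_start(loop, &accept_watcher);

    ev_run(loop, 0);                          /* a single thread drives all sockets */
    return 0;
}
```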

a sad dude

You'll run into a couple of limits:

  1. fd_set size: This is changeable at compile time, but it has quite a low limit by default; this affects select()-based solutions.
  2. Thread-per-socket will run out of steam far earlier - I suggest putting the long calculations in separate threads (with pooling if required), but otherwise a single-threaded approach will probably scale.

To reach 500,000 you'll need a set of machines and, I suspect, round-robin DNS.

TCP ports shouldn't be a problem, as long as the server doesn't connect back to the clients. I always seem to forget this, and have to be reminded.

File descriptors themselves shouldn't be too much of a problem, I think, but getting them into your polling solution may be more difficult - certainly you don't want to be passing them in each time.
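
If the per-process descriptor limit does turn out to be the bottleneck, it can usually be raised up to the hard limit at startup (raising the hard limit itself needs root or a limits.conf change); a small sketch, where raise_fd_limit() is just an illustrative helper:

```c
/* Raise the soft file-descriptor limit to the hard limit for this process. */
#include <stdio.h>
#include <sys/resource.h>

int raise_fd_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    printf("fd limits: soft %lu, hard %lu\n",
           (unsigned long)rl.rlim_cur, (unsigned long)rl.rlim_max);

    rl.rlim_cur = rl.rlim_max;                /* ask for as much as we are allowed */
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```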

Douglas Leeder
  • 1. no, there's only one port on the server. – Karoly Horvath Apr 02 '12 at 15:35
  • @Karoly Every connection requires a port. A port is used to make connections, after which the OS assigns a random port in the upper range for further communication. – Ioan Apr 02 '12 at 15:40
  • I didn't think about the TCP port problem. Upvote for mentioning this. My goal is 30 000, so it's halfway through. – markus_p Apr 02 '12 at 15:46
  • @Ioan: that's client side: en.wikipedia.org/wiki/Ephemeral_port – Karoly Horvath Apr 02 '12 at 15:48
  • @Karoly Actually, your link seems to confirm what I said, read the sentence pertaining to server-side. This one explains things better: [TCP](http://en.wikipedia.org/wiki/Transmission_Control_Protocol#Resource_usage) – Ioan Apr 02 '12 at 17:30
  • @Ioan: better? Which sentence do you think explains it there? Note: if the server changed the port, how would the client know where to send data? – Karoly Horvath Apr 02 '12 at 17:41
  • @Karoly You are correct, but the whole "Resources" section in my link seems to better explain what's going on. Anyway, just a little confused given this answer's note on TCP ports... – Ioan Apr 02 '12 at 20:12
  • I saw, but was referring to the part about not being a problem until connecting back to the clients... I guess you mean creating a new connection, since the current connection is already two-way. – Ioan Apr 03 '12 at 19:35
  • @Ioan, yes I mean if the server wants to initiate a connection, for example if the clients only connect transiently. But I agree - unlikely. – Douglas Leeder Apr 03 '12 at 21:16

I think you can use the event model (epoll + a worker thread pool) to solve this problem. First listen and accept in the main thread; when a client connects to the server, the main thread distributes the client_fd to one worker thread and adds it to that worker's epoll list; that worker thread then handles the requests from the client.

The number of worker threads can be configured to suit the problem, and it must be no more than the 5000 limit.
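
Something along these lines, assuming the main thread accept()s and registers each new client_fd with a worker's epoll instance (a rough, level-triggered sketch without error handling; worker_loop() and add_client() are illustrative names):

```c
/* One worker thread's epoll loop; the main thread calls add_client() after accept(). */
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 64

void *worker_loop(void *arg)              /* thread entry for pthread_create() */
{
    int epfd = *(int *)arg;               /* epoll instance owned by this worker */
    struct epoll_event events[MAX_EVENTS];

    for (;;) {
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            char buf[4096];
            ssize_t len = read(fd, buf, sizeof buf);
            if (len <= 0) {               /* disconnect or error */
                epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
                close(fd);
            } else {
                /* ... handle chat message / formula request ... */
            }
        }
    }
    return NULL;
}

/* Main thread, after accept(): register the new client with a worker's epoll. */
void add_client(int epfd, int client_fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = client_fd };
    epoll_ctl(epfd, EPOLL_CTL_ADD, client_fd, &ev);
}
```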

yaronli
  • Without the epoll, that's what I am doing at the moment. But I want to handle more than 5000 clients, so one thread per client is not an option. – markus_p Apr 02 '12 at 15:49
  • @MarkusPfundstein Each thread would handle many clients in this solution. – Douglas Leeder Apr 02 '12 at 16:01
  • @MarkusPfundstein One thread can handle many clients; you do not have to handle one request per thread, and creating or destroying a thread is time-consuming. You can look at the nginx implementation, a web server that uses worker processes + epoll and can handle 10K concurrent requests. – yaronli Apr 03 '12 at 08:03