
I have a very specific question about server programming on UNIX (Debian, kernel 2.6.32). My goal is to learn how to write a server that can handle a huge number of clients. My target is more than 30 000 concurrent clients (even though my colleague mentions that 500 000 are possible, which seems like quite a huge amount :-)), but I really don't know what is even possible, and that is why I am asking here. So my first question: how many simultaneous clients are possible?

Clients can connect whenever they want, get in contact with other clients, and form a group (one group contains a maximum of 12 clients). They can chat with each other, so the TCP/IP packet size varies depending on the message sent. Clients can also send mathematical formulas to the server. The server will solve them and broadcast the answer back to the group. This is a quite heavy operation.

My current approach is to start up the server and then use fork() to create a daemon process. The daemon process binds the socket fd_listen and starts listening. It runs a while(1) loop and uses accept() to get incoming connections.

Once a client connects, I create a pthread for that client which handles the communication. Clients get added to a group and share some memory (needed to keep the group running), but every client still runs on its own thread. Getting the access to that memory right was quite a hassle, but it works fine now.
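
In condensed form, my approach looks roughly like this (just a sketch: error handling, the group logic and the shared memory are left out; handle_client() and the port number are placeholders):

```c
/* Sketch of the current approach: listen, accept(), one detached pthread per client. */
#include <netinet/in.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static void *handle_client(void *arg)
{
    int client_fd = *(int *)arg;
    free(arg);
    /* ... per-client chat / formula handling ... */
    close(client_fd);
    return NULL;
}

int main(void)
{
    int fd_listen = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(5000);              /* example port */

    bind(fd_listen, (struct sockaddr *)&addr, sizeof addr);
    listen(fd_listen, SOMAXCONN);

    while (1) {
        int *client_fd = malloc(sizeof *client_fd);
        *client_fd = accept(fd_listen, NULL, NULL);

        pthread_t tid;
        pthread_create(&tid, NULL, handle_client, client_fd);
        pthread_detach(tid);                  /* no join; one thread per client */
    }
}
```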

At the beginning of the program I read the /proc/sys/kernel/threads-max file and create my threads according to that. The number of possible threads according to that file is around 5000, which is far from the number of clients I want to be able to serve. Another approach I am considering is to use select() and create fd sets, but the access time to find a socket within a set is O(N), which can be quite long if I have more than a couple of thousand clients connected. Please correct me if I am wrong.
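
For reference, this is roughly the select() loop I have in mind (just a sketch; poll_clients() and the client_fds array are made-up names), and as far as I know no descriptor in the set may be >= FD_SETSIZE (commonly 1024):

```c
/* select()-based polling: O(N) to rebuild the set, O(N) again to scan it. */
#include <sys/select.h>

void poll_clients(int client_fds[], int nclients)
{
    fd_set readfds;
    FD_ZERO(&readfds);

    int maxfd = -1;
    for (int i = 0; i < nclients; i++) {      /* O(N): rebuild the set */
        FD_SET(client_fds[i], &readfds);
        if (client_fds[i] > maxfd)
            maxfd = client_fds[i];
    }

    if (select(maxfd + 1, &readfds, NULL, NULL, NULL) > 0) {
        for (int i = 0; i < nclients; i++)    /* O(N): find the ready sockets */
            if (FD_ISSET(client_fds[i], &readfds)) {
                /* read from client_fds[i] */
            }
    }
}
```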

Well, I guess I need some ideas :-)

Greetings, Markus

P.S. I tagged it C++ and C because it applies to both languages.

markus_p

3 Answers


The best approach as of today is an event loop like libev or libevent.

In most cases you will find that one thread is more than enough, but even if it isn't, you can always have multiple threads with separate loops (at least with libev).

Libev[ent] uses the most efficient polling solution for each OS (and anything is more efficient than select or a thread per socket).
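
To give an idea of the shape, here is a minimal libev skeleton (a sketch only, with error handling omitted and an example port; compile with -lev):

```c
/* Minimal libev skeleton: one event loop, one ev_io watcher per socket. */
#include <ev.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static void read_cb(struct ev_loop *loop, ev_io *w, int revents)
{
    char buf[4096];
    ssize_t n = recv(w->fd, buf, sizeof buf, 0);
    if (n <= 0) {                             /* client closed or error */
        ev_io_stop(loop, w);
        close(w->fd);
        free(w);
        return;
    }
    /* ... parse chat message / formula, reply, broadcast to the group ... */
}

static void accept_cb(struct ev_loop *loop, ev_io *w, int revents)
{
    int client_fd = accept(w->fd, NULL, NULL);
    if (client_fd < 0)
        return;

    ev_io *client = malloc(sizeof *client);
    ev_io_init(client, read_cb, client_fd, EV_READ);
    ev_io_start(loop, client);                /* one watcher per client, same loop */
}

int main(void)
{
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(5000);              /* example port */
    bind(listen_fd, (struct sockaddr *)&addr, sizeof addr);
    listen(listen_fd, SOMAXCONN);

    struct ev_loop *loop = ev_default_loop(0);
    ev_io accept_watcher;
    ev_io_init(&accept_watcher, accept_cb, listen_fd, EV_READ);
    ev_io_start(loop, &accept_watcher);

    ev_run(loop, 0);                          /* a single thread drives all sockets */
    return 0;
}
```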

a sad dude

You'll run into a couple of limits:

  1. fd_set size: This is changeable at compile time, but it has quite a low limit by default; this affects select()-based solutions.
  2. Thread-per-socket will run out of steam far earlier - I suggest putting the long calculations in separate threads (with pooling if required), but otherwise a single-threaded approach will probably scale.

To reach 500,000 you'll need a set of machines and, I suspect, round-robin DNS.

TCP ports shouldn't be a problem, as long as the server doesn't connect back to the clients. I always seem to forget this, and have to be reminded.

File descriptors themselves shouldn't be too much of a problem, I think, but getting them into your polling solution may be more difficult - certainly you don't want to be passing them in each time.
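
If the per-process descriptor limit does turn out to be the bottleneck, it can usually be raised up to the hard limit at startup (raising the hard limit itself needs root or a limits.conf change); a small sketch, where raise_fd_limit() is just an illustrative helper:

```c
/* Raise the soft file-descriptor limit to the hard limit for this process. */
#include <stdio.h>
#include <sys/resource.h>

int raise_fd_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    printf("fd limits: soft %lu, hard %lu\n",
           (unsigned long)rl.rlim_cur, (unsigned long)rl.rlim_max);

    rl.rlim_cur = rl.rlim_max;                /* ask for as much as we are allowed */
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```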

Douglas Leeder
  • 1. no, there's only one port on the server. – Karoly Horvath Apr 02 '12 at 15:35
  • @Karoly Every connection requires a port. A port is used to make connections, after which the OS assigns a random port in the upper range for further communication. – Ioan Apr 02 '12 at 15:40
  • I didn't think about the TCP port problem. Upvote for mentioning this. My goal is 30 000, so it's halfway through. – markus_p Apr 02 '12 at 15:46
  • @Ioan: that's client side: en.wikipedia.org/wiki/Ephemeral_port – Karoly Horvath Apr 02 '12 at 15:48
  • @Karoly Actually, your link seems to confirm what I said, read the sentence pertaining to server-side. This one explains things better: [TCP](http://en.wikipedia.org/wiki/Transmission_Control_Protocol#Resource_usage) – Ioan Apr 02 '12 at 17:30
  • @Ioan: better? Which sentence do you think explains it there? Note: if the server changed the port, how would the client know where to send data? – Karoly Horvath Apr 02 '12 at 17:41
  • @Karoly You are correct, but the whole "Resources" section in my link seems to better explain what's going on. Anyway, just a little confused given this answer's note on TCP ports... – Ioan Apr 02 '12 at 20:12
  • I saw, but was referring to the part about not being a problem until connecting back to the clients... I guess you mean creating a new connection, since the current connection is already two-way. – Ioan Apr 03 '12 at 19:35
  • @Ioan, yes I mean if the server wants to initiate a connection, for example if the clients only connect transiently. But I agree - unlikely. – Douglas Leeder Apr 03 '12 at 21:16

I think you can use the event model (epoll + a worker thread pool) to solve this problem. First listen and accept in the main thread; when a client connects to the server, the main thread distributes the client_fd to one worker thread and adds it to that worker's epoll list; that worker thread then handles the requests from the client.

The number of worker threads can be configured to suit the problem, and it must be no more than the 5000 limit.
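
Something along these lines, assuming the main thread accept()s and registers each new client_fd with a worker's epoll instance (a rough, level-triggered sketch without error handling; worker_loop() and add_client() are illustrative names):

```c
/* One worker thread's epoll loop; the main thread calls add_client() after accept(). */
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 64

void *worker_loop(void *arg)              /* thread entry for pthread_create() */
{
    int epfd = *(int *)arg;               /* epoll instance owned by this worker */
    struct epoll_event events[MAX_EVENTS];

    for (;;) {
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            char buf[4096];
            ssize_t len = read(fd, buf, sizeof buf);
            if (len <= 0) {               /* disconnect or error */
                epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
                close(fd);
            } else {
                /* ... handle chat message / formula request ... */
            }
        }
    }
    return NULL;
}

/* Main thread, after accept(): register the new client with a worker's epoll. */
void add_client(int epfd, int client_fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = client_fd };
    epoll_ctl(epfd, EPOLL_CTL_ADD, client_fd, &ev);
}
```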

yaronli
  • Without the epoll, that's what I am doing at the moment. But I want to handle more than 5000 clients, so one thread per client is not an option. – markus_p Apr 02 '12 at 15:49
  • @MarkusPfundstein Each thread would handle many clients in this solution. – Douglas Leeder Apr 02 '12 at 16:01
  • @MarkusPfundstein One thread can handle many clients; you do not have to handle one request per thread, and creating or destroying a thread is time-consuming. You can look at the nginx implementation, a web server that uses worker processes + epoll and can handle 10K concurrent requests. – yaronli Apr 03 '12 at 08:03