In a nutshell, the SO_REUSEPORT
socket option allows multiple sockets to be bound to the same ip:port pair. For example, program1
and program2
can both call the chain socket()->bind()->listen()->accept()
for the same IP and port, and the kernel will distribute incoming connections evenly between the two programs.
I assumed that with this option you could get rid of fork()
for spawning additional workers and simply run new program instances instead.
I wrote a simple epoll-based socket server using this approach and tested it with weighttp:
weighttp -n 1000000 -c 1000 -t 4 http://127.0.0.1:8080/
With two running instances the result is ~44000 RPS; with one running instance, about ~51000 RPS. The 7000 RPS difference surprised me.
After this test I added a fork()
before listen()
and ran a single server instance, so it now has the same structure as the previous setup: two processes, each with an epoll loop on a listening socket. The difference is that socket()->bind()
is called only once, before fork()
, and the second process receives a copy of the FD for its listen()
call.
I ran the tests again and got ~50000 RPS!
So, my question is simple: what magic does fork()
do in this case, and why is it faster than two independent processes, each with its own socket()
? The kernel does the same scheduling job either way; I don't see any important difference.