kernel-based (Linux) data relay between two TCP sockets

Question

I wrote a TCP relay server which works like a peer-to-peer router (supernode).

The simplest case is two open sockets with data relayed between them:

clientA <---> server <---> clientB

However, the server has to serve about 2000 such A-B pairs, i.e. 4000 sockets.

There are two well-known ways to implement a data stream relay in userland (based on socketA.recv() --> socketB.send() and socketB.recv() --> socketA.send()):

using select / poll (non-blocking method)
using threads / forks (blocking method)
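
For reference, the first (select / poll) variant for a single A-B pair could be sketched roughly as below. This is only an illustration, not the code of the server in question: the names `copy_chunk` and `relay_step` are hypothetical, the sockets are assumed to be blocking stream sockets, and a slow receiver will stall the loop because `send()` is simply retried until the chunk is written.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: copy one chunk from `from` to `to` in userland.
 * Returns bytes copied, 0 on EOF, -1 on error. */
ssize_t copy_chunk(int from, int to)
{
    char buf[4096];
    ssize_t n = recv(from, buf, sizeof buf, 0);
    if (n <= 0)
        return n;

    ssize_t off = 0;
    while (off < n) {
        ssize_t w = send(to, buf + off, (size_t)(n - off), MSG_NOSIGNAL);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        off += w;
    }
    return n;
}

/* Hypothetical helper: one poll() round for the pair (a, b).
 * Returns bytes relayed, 0 on timeout, -1 if poll() failed or
 * either side closed or failed. */
ssize_t relay_step(int a, int b, int timeout_ms)
{
    assert(a != b);

    struct pollfd fds[2] = { { a, POLLIN, 0 }, { b, POLLIN, 0 } };
    int ready = poll(fds, 2, timeout_ms);
    if (ready <= 0)
        return ready;

    ssize_t total = 0;
    if (fds[0].revents & (POLLIN | POLLHUP | POLLERR)) {
        ssize_t n = copy_chunk(a, b);   /* A -> B */
        if (n <= 0)
            return -1;
        total += n;
    }
    if (fds[1].revents & (POLLIN | POLLHUP | POLLERR)) {
        ssize_t n = copy_chunk(b, a);   /* B -> A */
        if (n <= 0)
            return -1;
        total += n;
    }
    return total;
}
```

A server would call `relay_step()` in a loop per pair, or more realistically put all 4000 descriptors into one poll set; every byte still passes through the userland buffer `buf`.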

I used threads, so in the worst case the server creates 2*2000 threads! I had to limit the stack size and it works, but is it the right solution?

Core of my question:

Is there a way to avoid active data relaying between two sockets in userland?

It seems there is a passive way. For example, I could create a file descriptor from each socket, create two pipes and use dup2(), the same method as stdin/stdout redirection. Then the two relay threads would be unnecessary and could be terminated. The question is whether the server should ever close the sockets and pipes, and how to detect a broken pipe so that it can be logged.

I've also found "socket pairs", but I am not sure whether they fit my purpose.

What solution would you advise to offload the userland and limit the number of threads?

Some extra explanations:

The server has a static routing table (e.g. ID_A paired with ID_B). Client A connects to the server and sends ID_A. Then the server waits for client B. Once A and B are paired (both sockets open), the server starts relaying data.
The clients are simple devices behind symmetric NAT, so the N2N protocol or NAT traversal techniques are too complex for them.

Thanks to Gerhard Rieger I have a hint:

I am aware of two kernel space ways to avoid read/write, recv/send in user space:

sendfile

splice

Both have restrictions regarding type of file descriptor.

dup2 will not help to do something in kernel, AFAIK.

Man pages: splice(2) vmsplice(2) sendfile(2) tee(2)
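
As an illustration of the splice(2) route on Linux: splice() needs a pipe on one side of each call, so a socket-to-socket relay goes socket -> pipe -> socket without the payload being copied into a userland buffer. The sketch below is a minimal one-direction step under stated assumptions: `relay_once` is a hypothetical name, both sockets are blocking stream sockets, and the caller has created the pipe with pipe(2).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: move up to `max` bytes from socket `from` to
 * socket `to` through the caller-supplied pipe `pfd`.
 * Returns bytes moved, 0 on EOF, -1 on error (errno set by splice). */
ssize_t relay_once(int from, int to, int pfd[2], size_t max)
{
    assert(max > 0);

    /* Step 1: socket -> pipe (write end). */
    ssize_t in = splice(from, NULL, pfd[1], NULL, max, SPLICE_F_MOVE);
    if (in <= 0)
        return in;

    /* Step 2: pipe (read end) -> socket, until the pipe is drained. */
    ssize_t left = in;
    while (left > 0) {
        ssize_t out = splice(pfd[0], NULL, to, NULL, (size_t)left,
                             SPLICE_F_MOVE);
        if (out <= 0) {
            if (out < 0 && errno == EINTR)
                continue;
            return -1;
        }
        left -= out;
    }
    return in;
}
```

A full relay needs one such pipe per direction (A -> B and B -> A). A return value of 0 or -1 is the point at which the server would log the broken connection and close both sockets and the pipe.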

Related links:

For that many connections, a combination of a few threads and [`epoll(4)`](http://linux.die.net/man/4/epoll) is probably something you should look into. — Some programmer dude, Jul 11 '13 at 10:30
That, and you could use something like [libev](http://software.schmorp.de/pkg/libev.html) — Hasturkun, Jul 11 '13 at 11:24
Thanks. However it is still active relay in userland. I believe a passive method exists. The server waits for client's ID with 5s timeout so threads seemed to be natural choice for pairing stage. — nopsoft, Jul 11 '13 at 11:48
Good short comparison: http://www.win.tue.nl/~aeb/linux/lk/lk-12.html . I just used blocking recv() in each thread. However to limit threads I can use epoll() in one thread or FASYNC. — nopsoft, Jul 15 '13 at 09:29

score 6 · Answer 1 · edited Mar 25 '19 at 10:23

OpenBSD implements SO_SPLICE:

relayd asiabsdcon2013 slides / paper
http://www.manualpages.de/OpenBSD/OpenBSD-5.0/man2/setsockopt.2.html
http://metacpan.org/pod/BSD::Socket::Splice

Does Linux support something similar, or is writing your own kernel module the only solution?

TCPSP
SP-MOD described here
TCP-Splicer described here
L4/L7 switch
HAProxy

answered Jul 15 '13 at 12:07 by nopsoft · edited Mar 25 '19 at 10:23 by Mateusz Piotrowski
score 3 · Answer 2 · answered Jul 15 '13 at 13:47

Even for loads as tiny as 2000 concurrent connections, I'd never go with threads. They have the highest stack and switching overhead, simply because it's always more expensive to ensure that you can be interrupted anywhere than when you can only be interrupted at specific places. Just use epoll and splice() (if your sockets are TCP, which seems to be the case) and you'll be fine. You can even make epoll work in edge-triggered mode (EPOLLET), where you only register your fds once.

If you absolutely want to use threads, use one thread per CPU core to spread the load. But if you need to do this, it means you're playing at speeds where affinity, RAM location on each CPU socket, etc. play a significant role, which doesn't seem to be the case in your question. So I'm assuming that a single thread is more than enough in your case.
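
A rough single-threaded sketch of the epoll plus splice() combination might look like the following. It is an illustration under assumptions rather than a reference implementation: the names `struct half`, `half_register`, `half_forward` and `relay_poll` are hypothetical, epoll is used in plain level-triggered mode, and the destination sockets are assumed to be blocking, so a slow receiver stalls the loop.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* One direction of a pair: data read from `src` is spliced to `dst`. */
struct half {
    int src, dst;
    int pipe[2];    /* kernel buffer between the two splice() calls */
};

/* Create the pipe and watch `src` for input. Returns 0 or -1. */
int half_register(int ep, struct half *h, int src, int dst)
{
    assert(src != dst);
    h->src = src;
    h->dst = dst;
    if (pipe(h->pipe) < 0)
        return -1;

    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.ptr = h;
    return epoll_ctl(ep, EPOLL_CTL_ADD, src, &ev);
}

/* socket -> pipe -> socket. Returns bytes moved, 0 on EOF, -1 on error. */
ssize_t half_forward(struct half *h)
{
    ssize_t in = splice(h->src, NULL, h->pipe[1], NULL, 65536,
                        SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
    if (in <= 0)
        return in;

    ssize_t left = in;
    while (left > 0) {
        ssize_t out = splice(h->pipe[0], NULL, h->dst, NULL, (size_t)left,
                             SPLICE_F_MOVE);
        if (out <= 0) {
            if (out < 0 && errno == EINTR)
                continue;
            return -1;
        }
        left -= out;
    }
    return in;
}

/* One epoll_wait() round. Returns total bytes relayed, -1 on failure. */
ssize_t relay_poll(int ep, int timeout_ms)
{
    struct epoll_event evs[64];
    int n = epoll_wait(ep, evs, 64, timeout_ms);
    if (n < 0)
        return errno == EINTR ? 0 : -1;

    ssize_t total = 0;
    for (int i = 0; i < n; i++) {
        struct half *h = (struct half *)evs[i].data.ptr;
        ssize_t r = half_forward(h);
        if (r > 0)
            total += r;
        else if (r < 0 && (errno == EAGAIN || errno == EINTR))
            continue;   /* nothing to move right now */
        else            /* EOF or error: stop watching this direction */
            epoll_ctl(ep, EPOLL_CTL_DEL, h->src, NULL);
    }
    return total;
}
```

Each A-B pair registers two `struct half` objects (A -> B and B -> A) on the same epoll instance, and the server calls `relay_poll()` in a loop. When a direction is dropped, the caller is still responsible for closing the sockets and the pipe.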

Thanks. Better to use EPOLLONESHOT or ADD/DEL ? Like here http://stackoverflow.com/questions/4173024/question-about-epoll-and-splice or http://rg4.net/archives/375.html ? — nopsoft, Jul 15 '13 at 14:22
Never tried EPOLLONESHOT, though it can be useful and maybe an elegant alternative to EPOLLET. Start with the standard ADD/DEL to limit the complexity I think and try to optimize later if needed. — Willy Tarreau, Aug 19 '13 at 21:55
