11

I'd like to know why the Linux kernel (or any other mainstream OS) does not have a feature for zero-copy networking. By zero copy I mean that a packet or data stream is not copied when it is passed to an application in userspace, but instead uses, e.g., a memory-pool type of allocator to share the memory between kernel and userspace. I've come up with three theories of my own:

a) I guess there are security concerns. But is there really no way of sharing memory securely between userspace and the kernel when it is only used as a buffer?

b) I guess there are stability concerns. But can't we assume that whoever uses zero-copy networking and, e.g., needs to instantiate a memory pool and pass it to the kernel call is aware of memory management? Aware enough to avoid leaks?

c) It just hasn't been done or needed so far. I can't really imagine that nobody has requested this feature, as everybody who uses small packet sizes is typically bottlenecked by the "slow" TCP stack implementation, and there are third-party tools on offer for zero-copy networking with special network cards.

Feel free to post any guesses, but please mark whether you are assuming or have a deeper knowledge of the reasons to keep StackOverflow-quality :-)

user1610743
  • 3
    Well, Linux does have that functionality; the problem is just that a) some things are not trivial to get right, but they implemented _something_ before thinking of these details, and b) the documentation is piss poor, to say the least. If you're interested, look into `splice`, `tee`, and `vmsplice`. These let you copy between sockets, files, and pipes (kernel buffers), duplicate pipes (refcount on pages), and copy between user and kernel space by remapping pages. So much for the nice theory; in practice you end up with pages which you may not touch and don't know when to free... and bleh. – Damon Mar 03 '14 at 17:08
  • 2
    Check out the source for haproxy. It uses `splice` and pipes to prevent copying through userspace (the source is a little rough to go through, due to the event loop and the use of a pool of pipes). I heard the author of haproxy is working on a kernel patch for directly splicing sockets, but isn't sure it's going to actually be faster. – JimB Mar 03 '14 at 21:47
  • See also [RoCE](https://en.wikipedia.org/wiki/RDMA_over_Converged_Ethernet) – Nemo Oct 24 '18 at 20:22
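
To make the `splice`-through-a-pipe pattern from the comments above concrete, here is a minimal sketch in Python. It assumes Linux and Python 3.10 or later, where `os.splice` wraps the `splice(2)` system call. The helper name `splice_copy` and the 64 KiB chunk size are made up for illustration; this is not how haproxy implements it.

```python
import os

def splice_copy(src_fd, dst_fd, count, chunk=65536):
    """Move up to `count` bytes from src_fd to dst_fd via a kernel pipe.

    Hypothetical helper: the data passes through a pipe buffer inside the
    kernel and is never read into a userspace buffer.
    """
    pipe_r, pipe_w = os.pipe()
    moved = 0
    try:
        while moved < count:
            # Source -> pipe. splice(2) needs a pipe on at least one side.
            n = os.splice(src_fd, pipe_w, min(chunk, count - moved))
            if n == 0:  # end of input
                break
            # Pipe -> destination. Drain everything that was just queued.
            remaining = n
            while remaining:
                remaining -= os.splice(pipe_r, dst_fd, remaining)
            moved += n
    finally:
        os.close(pipe_r)
        os.close(pipe_w)
    return moved
```

The file descriptors can be regular files or sockets. Whether the kernel really avoids every copy depends on the descriptor types and the kernel version, which is part of the "not trivial to get right" point made above.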

1 Answer

9

There are a few options nowadays for zero-copy networking:

Note that zero-copy with TCP may be inconvenient: TCP segments carry both headers and payload, but applications are only concerned with the payload, so the payload (but not the headers) must be copied into a contiguous buffer for your application.
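
As a rough illustration of that trade-off, here is a minimal Python sketch. It assumes a hypothetical fixed header length of 4 bytes per frame (real TCP headers are longer and variable in size), and the names `HEADER_LEN`, `payload_views` and `contiguous_payload` are invented for this example. Iterating over payload slices needs no copy, whereas building one contiguous buffer does.

```python
HEADER_LEN = 4  # hypothetical fixed header size, for illustration only

def payload_views(frames, header_len=HEADER_LEN):
    """Yield a zero-copy memoryview of each frame's payload.

    Slicing a memoryview does not copy; each view still refers to the
    original frame buffer.
    """
    for frame in frames:
        yield memoryview(frame)[header_len:]

def contiguous_payload(frames, header_len=HEADER_LEN):
    """Copy all payloads into one contiguous bytes object.

    This is the copy that a header-skipping iterator would avoid.
    """
    return b"".join(payload_views(frames, header_len))
```

An application that can consume the payload piece by piece can work from `payload_views` alone; one that insists on a single flat buffer pays for the copy in `contiguous_payload`.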

Maxim Egorushkin
  • Segment headers may be a valid reason why it's not standard yet. However, I wouldn't mind whether my application can see them or not, and I can't see a reason for the payload copy, as it takes performance away from my application. +1 – user1610743 Mar 03 '14 at 15:28
  • @user1610743 In C++ there could be an out-of-the-box iterator that skips over the frame headers and iterates over payload only. – Maxim Egorushkin Nov 24 '14 at 18:54
  • @PeterCordes Updated the links. – Maxim Egorushkin Jan 09 '20 at 16:44