Kernel bypass for UDP and TCP on Linux- what does it involve?

Question

Per http://www.solacesystems.com/blog/kernel-bypass-revving-up-linux-networking:

[...]a network driver called OpenOnload that use “kernel bypass” techniques to run the application and network driver together in user space and, well, bypass the kernel. This allows the application side of the connection to process many more messages per second with lower and more consistent latency.

[...]

If you’re a developer or architect who has fought with context switching for years kernel bypass may feel like cheating, but fortunately it’s completely within the rules.

What are the functions needed to do such kernel bypassing?

There's no way other than to write your own network driver for the NIC attached to your computer. — JosephH, Mar 29 '13 at 11:56
@JosephH are you saying kernel bypass is "driver bypass"? You will need a "driver" at some point? Or do we have to re-write the driver outside of the kernel? — user997112, Mar 29 '13 at 11:58
How do you think you can manipulate the NIC directly without ever going through the kernel? — JosephH, Mar 29 '13 at 12:02
I can't see why this question was closed. The question is clear and of value and the mentioned situation isn't "extraordinarily narrow". — oberstet, Dec 04 '13 at 18:15
Linked from: http://networkengineering.stackexchange.com/a/16212/5654 — Pacerier, May 13 '16 at 05:37
@user997112, These threads might have some leads: http://www.theregister.co.uk/2015/10/12/linux_networking_api_showing_its_age/ , http://highscalability.com/blog/2014/2/13/snabb-switch-skip-the-os-and-get-40-million-requests-per-sec.html , http://www.whatisnetworking.net/tag/linux-kernel-bypass-for-network/ , http://superuser.com/q/635199/78897 , http://rhelblog.redhat.com/2015/10/02/getting-the-best-of-both-worlds-with-queue-splitting-bifurcated-driver/ , and [CloudFlare-KernelBypass](https://blog.cloudflare.com/kernel-bypass/#kernelbypasstotherescue). — Pacerier, May 13 '16 at 06:13
Also related: http://ttthebear.blogspot.sg/2008/07/linux-kernel-bypass-and-performance.html , https://compsmusicandstuffs.wordpress.com/2011/06/22/bypassing-the-linux-kernel-coding-your-very-own-root-kit/#post-100 , http://www.whatisnetworking.net/tag/linux-kernel-bypass-for-network/ , http://www.bbc.co.uk/rd/blog/2015/10/streaming-video-on-10-gigabit-ethernet-and-beyond , http://rhelblog.redhat.com/2015/10/02/getting-the-best-of-both-worlds-with-queue-splitting-bifurcated-driver/ , http://www.eecs.berkeley.edu/~sangjin/2013/01/14/NUSE.html#longing-for-a-network-equivalent-to-fuse , — Pacerier, May 13 '16 at 07:09
https://github.com/lukego/blog/issues/13, https://www.infoq.com/interviews/riddoch-kernel-bypass-solarflare and the GoogleTechTalks linked from http://www.openonload.org/ — Pacerier, May 13 '16 at 07:09

score 2 · Answer 1 · answered Mar 29 '13 at 12:17

A TCP offload engine will "just work", no special application programming needed. It doesn't bypass the whole kernel, it just moves some of the TCP/IP stack from the kernel to the network card, so the driver is slightly higher level. The kernel API is the same.
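To illustrate "the kernel API is the same", here is a minimal sketch using only Python's standard `socket` module. It is plain sockets code, and nothing in it would change if the NIC offloaded part of the TCP/IP work. It runs over loopback purely so it works on any machine (loopback traffic never reaches a NIC), and `tcp_roundtrip` is a name made up for this illustration.

```python
import socket


def tcp_roundtrip(payload: bytes) -> bytes:
    """Send a small payload over a loopback TCP connection and return what arrives.

    Intended for small payloads only: everything is sent before anything is read.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.settimeout(2.0)
        listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
        listener.listen(1)

        with socket.create_connection(listener.getsockname(), timeout=2.0) as client:
            server, _addr = listener.accept()
            with server:
                server.settimeout(2.0)
                client.sendall(payload)
                client.shutdown(socket.SHUT_WR)  # signal end of data to the reader

                chunks = []
                while True:
                    chunk = server.recv(4096)
                    if not chunk:  # empty read means the peer finished sending
                        break
                    chunks.append(chunk)
                return b"".join(chunks)


if __name__ == "__main__":
    print(tcp_roundtrip(b"ping"))
```

Every call here (`bind`, `listen`, `accept`, `sendall`, `recv`) still goes through the kernel. That is the difference from kernel bypass, where the application talks to the NIC from user space.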

TCP offload engine is supported by most modern gigabit interfaces.

Alternatively, if you mean "running code on a SolarFlare network adapter's embedded processor/FPGA 'Application Onload Engine'", then... that's card-specific. You're basically writing code for an embedded system, so you need to say which kind of card you're using.

user9876

It's good you mention this, because I was wondering what the difference is between TOE and kernel bypass. I thought TOE was about on-card processing, whereas kernel bypass was implementing kernel features in user space and customising them? – user997112 Mar 29 '13 at 13:09
So... what IS "kernel bypass"? Absent any references, I'm going to vote to close the question as unclear. – user9876 Mar 29 '13 at 18:51
@user9876, "kernel bypass" is, well, techniques to bypass the kernel. Someone wrote an "exact" definition here if you wish to drill: "[*Kernel Bypass, also called OS bypass, is a concept to improve the network performance, by going "around" the kernel or OS. Hence the term, "bypass." In a typical system, the kernel decodes the network packet, most likely TCP, and passes the data from the kernel space to user space by copying it.*](http://ttthebear.blogspot.sg/2008/07/linux-kernel-bypass-and-performance.html#post-body-136740617070080811) – Pacerier May 13 '16 at 06:44
*[\[cont\] This process means the user space process context data must be saved and the kernel context data must be loaded. This step of saving the user process information and then loading the kernel process information is known as a context switch.](http://ttthebear.blogspot.sg/2008/07/linux-kernel-bypass-and-performance.html#post-body-136740617070080811)*" – Pacerier May 13 '16 at 06:44
("[TCP Offload Engines (TOE) are a somewhat controversial concept in the Linux world.](http://ttthebear.blogspot.sg/2008/07/linux-kernel-bypass-and-performance.html#post-body-136740617070080811)") **Citation needed for "*TCP offload engine is supported by most modern gigabit interfaces*"**. – Pacerier May 13 '16 at 06:47

Skeen · Accepted Answer · 2013-03-29T12:14:58.133

Okay, so the question is not straightforward to answer without knowing how the kernel handles the network stack.

In general, the network stack is made up of many layers. The lowest is the actual hardware, which is typically supported by drivers (one for each network interface). NICs typically provide very simple interfaces: think receiving and sending raw data.

On top of this physical connection, with its ability to receive and send data, sit many protocols, which are layered as well. Near the bottom is IP, which basically lets you specify the receiver of your information, while at the top you'll find TCP, which provides reliable connections.
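To make the layering concrete, here is a small sketch that peels a UDP datagram out of an IPv4 packet, which is roughly the work a user-space stack takes over from the kernel. The sample bytes are hand-made for illustration (addresses, ports and payload are invented, and the checksum fields are left at zero and not verified), and `parse_ipv4_udp` is a hypothetical helper name. UDP is used rather than TCP only because its header is shorter.

```python
import socket
import struct

# Hand-made sample: 20-byte IPv4 header, 8-byte UDP header, 5-byte payload.
SAMPLE = bytes.fromhex(
    "45 00 00 21 00 00 00 00 40 11 00 00 0a 00 00 01 0a 00 00 02"  # IPv4
    "9c 40 23 28 00 0d 00 00"                                      # UDP
    "68 65 6c 6c 6f"                                               # b"hello"
)


def parse_ipv4_udp(packet: bytes) -> dict:
    """Split an IPv4/UDP packet into its header fields and payload."""
    version = packet[0] >> 4
    header_len = (packet[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    protocol = packet[9]
    if version != 4 or protocol != socket.IPPROTO_UDP:
        raise ValueError("not an IPv4/UDP packet")

    # The transport layer starts where the IP header ends.
    src_port, dst_port, udp_len, _checksum = struct.unpack(
        "!HHHH", packet[header_len:header_len + 8]
    )
    return {
        "src": socket.inet_ntoa(packet[12:16]),
        "dst": socket.inet_ntoa(packet[16:20]),
        "src_port": src_port,
        "dst_port": dst_port,
        # udp_len covers the 8-byte UDP header plus the payload.
        "payload": packet[header_len + 8:header_len + udp_len],
    }


if __name__ == "__main__":
    print(parse_ipv4_udp(SAMPLE))
```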

So in order to answer your question, you must first figure out which part of the network stack you need to replace, and what you need it to do. From my understanding of your question, it seems you want to keep the original network stack and only sometimes use your own. In that case you should really just implement the strategy pattern, making it possible to state which packets should be handled by which top level of the network stack.
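A rough sketch of that strategy-pattern idea, assuming packets have already been classified by protocol and destination port. All names here (`StackSelector`, `kernel_stack`, `user_space_stack`) are hypothetical, and the handlers only return a label instead of doing real protocol work.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[bytes], str]


def kernel_stack(payload: bytes) -> str:
    """Stand-in for handing the packet to the normal kernel stack."""
    return "kernel"


def user_space_stack(payload: bytes) -> str:
    """Stand-in for a custom stack running inside the application."""
    return "user-space"


class StackSelector:
    """Chooses a handler (the 'strategy') per (protocol, destination port)."""

    def __init__(self, default: Handler) -> None:
        self._default = default
        self._rules: Dict[Tuple[str, int], Handler] = {}

    def route(self, protocol: str, port: int, handler: Handler) -> None:
        self._rules[(protocol, port)] = handler

    def dispatch(self, protocol: str, port: int, payload: bytes) -> str:
        # Anything without an explicit rule stays on the default stack,
        # so the rest of the machine keeps its normal connectivity.
        handler = self._rules.get((protocol, port), self._default)
        return handler(payload)


if __name__ == "__main__":
    selector = StackSelector(default=kernel_stack)
    selector.route("udp", 9000, user_space_stack)
    print(selector.dispatch("udp", 9000, b"market data"))  # user-space
    print(selector.dispatch("tcp", 22, b"ssh"))            # kernel
```

A real system would make this decision far below the application, in the driver or on the NIC itself. The sketch only shows the shape of the decision.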

Depending on how the network stack is implemented in Linux, you may or may not be able to achieve this without kernel changes. In a microkernel architecture, where each part of the network stack is implemented as its own service, this would be trivial: you would simply pipe the lower parts of the network stack into your strategy pattern, and have it pipe the input on to the required top-level layers.

So what you're saying is, kernel bypass in this instance means writing your own network stack in the user space? I would go into the OS source code and, at the line(s) where the packets are being sent to the kernel network stack, I would send them directly to my "application" in the user space? — user997112, Mar 29 '13 at 12:11
Doing this would ruin overall network connectivity, such that only your application has "internet access". I would rather suggest that only the packets you need be directed to your app. You should be able to reuse much of the current network stack, depending on what you're trying to achieve. But if you're simply interested in the raw packets, just set up a character device driver that reads the NIC's buffer directly without interfering, or hook into the NIC driver and replace its service calls with your own, which dump the information and then forward the calls to the NIC itself. — Skeen, Mar 29 '13 at 12:18

score 1 · Answer 3 · answered Mar 29 '13 at 12:21

Do you perhaps want to send and receive raw IP packets?

Basically you will need to fill in the headers and data of an IP packet yourself. There are some examples here of how to send raw Ethernet packets: http://austinmarton.wordpress.com/2011/09/14/sending-raw-ethernet-packets-from-a-specific-interface-in-c-on-linux/
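As a sketch of what "filling in the headers" means, the following builds a minimal 20-byte IPv4 header (no options) and computes its header checksum. The function names and default values are invented for illustration. The code only constructs bytes: actually sending them would need a raw socket, which on Linux normally requires root or `CAP_NET_RAW`.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum (as used by IPv4)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold any carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src: str, dst: str, payload_len: int,
                      protocol: int = socket.IPPROTO_UDP, ttl: int = 64,
                      ident: int = 0, flags_frag: int = 0) -> bytes:
    """Return a 20-byte IPv4 header with the checksum filled in."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,               # version 4, header length 5 words (20 bytes)
        0,                  # type of service
        20 + payload_len,   # total length: header plus payload
        ident,
        flags_frag,         # flags and fragment offset
        ttl,
        protocol,
        0,                  # checksum placeholder while computing
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )
    checksum = internet_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:]


if __name__ == "__main__":
    hdr = build_ipv4_header("192.168.0.1", "192.168.0.199", payload_len=95,
                            flags_frag=0x4000)
    print(hdr.hex())
```

A quick sanity check is that running `internet_checksum` over the finished header gives 0, which is the same test a receiver applies.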

To handle TCP/IP on your own, I think you might need to disable the kernel's TCP handling in a custom kernel, and then write your own user-space server that reads raw IP.

It's probably not that efficient though...
