I’m working on a packet reshaping project in Linux using the BeagleBone Black. Basically, packets are received on one VLAN, modified, and then are sent out on a different VLAN. This process is bidirectional - the VLANs are not designated as being input-only or output-only. It’s similar to a network bridge, but packets are altered (sometimes fairly significantly) in-transit.
I’ve tried two different methods for accomplishing this:
- Creating a user space application that opens raw sockets on both interfaces. All packet processing (including bridging) is handled in the application.
- Setting up a software bridge (using the kernel bridge module) and adding a kernel module that installs a netfilter hook in post routing (NF_BR_POST_ROUTING). All packet processing is handled in the kernel.
The second option appears to be around 4 times faster than the first option. I’d like to understand more about why this is. I’ve tried brainstorming a bit and wondered if there is a substantial performance hit in rapidly switching between kernel and user space, or maybe something about the socket interface is inherently slow?
I think the user application is fairly optimized (for example, I’m using PACKET_MMAP), but it’s possible that it could be optimized further. I ran perf on the application and noticed that it was spending a good deal of time (35%) in v7_flush_kern_dcache_area, so perhaps this is a likely candidate. If there are any other suggestions on common ways to optimize packet processing I can give them a try.