I'm using TCP sockets to do local IPC between applications running on the same machine.
The data exchanged in this IPC is very simple: the client sends a 4-byte request and the server sends a 128 kB response. This happens basically in a tight loop.
Now if I open the connection to 127.0.0.1 or ::1, I get the expected shortcut through loopback, as described in this question, and the delay is very low.
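For illustration, the exchange can be sketched roughly as follows in Python. This is not the actual application code: the helper names (`recv_exact`, `serve_one`, `mean_round_trip`), the payload contents and the number of rounds are all hypothetical, and server and client run in one process only to keep the sketch short.

```python
import socket
import threading
import time

REQUEST_SIZE = 4            # bytes sent by the client per round trip
RESPONSE_SIZE = 128 * 1024  # bytes sent back by the server


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def serve_one(listener):
    """Accept one client and answer every 4-byte request with 128 kB."""
    conn, _ = listener.accept()
    payload = b"\x00" * RESPONSE_SIZE
    with conn:
        try:
            while True:
                recv_exact(conn, REQUEST_SIZE)
                conn.sendall(payload)
        except ConnectionError:
            pass  # the client closed the connection, so we are done


def mean_round_trip(host, rounds=200):
    """Return the mean seconds per request/response pair against host."""
    # Assumption: host is a literal address, so a colon means IPv6.
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    with socket.socket(family, socket.SOCK_STREAM) as listener:
        listener.bind((host, 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        server = threading.Thread(target=serve_one, args=(listener,), daemon=True)
        server.start()
        with socket.create_connection((host, port)) as client:
            start = time.perf_counter()
            for _ in range(rounds):
                client.sendall(b"ping")            # the 4-byte request
                recv_exact(client, RESPONSE_SIZE)  # the 128 kB response
            elapsed = time.perf_counter() - start
        server.join(timeout=5)
    return elapsed / rounds
```

Calling `mean_round_trip("127.0.0.1")` and then calling it again with one of the machine's own addresses gives per-round-trip timings that can be compared directly.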
If I open the connection to 192.168.0.2 (the local v4 address of the machine), the same thing happens. The network stack seems smart enough to detect that it's the same machine and the latency is very low.
Now if I open the connection to any of the global v6 addresses of my machine (it doesn't matter whether temporary or not), the latency increases immensely on OS X (10.11.3), but not on Linux. It seems that Linux detects that the address belongs to the same machine and takes the loopback shortcut, while OS X doesn't do that for the v6 address.
To summarise:
- 127.0.0.1: fast everywhere
- local v4 address: fast everywhere
- local v6 address: fast on Linux, slow on OS X
Is this a known defect of OS X, or is Linux doing something it shouldn't be doing?
EDIT: It does not make a difference whether I use the TCP_NODELAY option or not.