
I'm seeing a couple of strange things with a pair of AF_UNIX sockets created by a call such as:

 socketpair(AF_UNIX, SOCK_STREAM, 0, sfd); 

where sfd is an int[2] array for the file descriptors.

First, the default buffer size seems to be exactly 122K (124928 bytes), rather than anything from /proc/sys/net (such as wmem_default, which is set to 128K). Does anyone know the cause of this strange buffer size?
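
For reference, here is a minimal sketch that dumps the /proc defaults being compared against (it just reads /proc/sys/net/core/wmem_default and wmem_max; error handling is kept minimal):

#include <stdio.h>

/* Sketch: read a single integer value from a /proc sysctl file. */
static long read_proc_long(const char *path) {
    long val = -1;
    FILE *f = fopen(path, "r");
    if (f) {
        if (fscanf(f, "%ld", &val) != 1)
            val = -1;
        fclose(f);
    }
    return val;
}

int main(void) {
    printf("wmem_default: %ld\n", read_proc_long("/proc/sys/net/core/wmem_default"));
    printf("wmem_max:     %ld\n", read_proc_long("/proc/sys/net/core/wmem_max"));
    return 0;
}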

Second, when writing small messages (8 bytes each) through the socket, I can only write 423 of them before the write blocks, which is only 8*423 = 3384 bytes, another odd size. The messages are acting as though they take up 295-and-a-bit bytes each. What's the source of this overhead?

Running on RHEL6 (2.6.32, 64-bit).

I wrote a program to try different sizes of data to compare overhead costs:

#include <errno.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

void run(size_t size) {
    int sfd[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sfd) == -1) {
        perror("socketpair");
    }

    /* Ask the kernel how big the send buffer on this pair is. */
    int sndbuf;
    socklen_t sbsize = sizeof(sndbuf);
    getsockopt(sfd[0], SOL_SOCKET, SO_SNDBUF, &sndbuf, &sbsize);

    printf("Data Size: %zu\n", size);

    /* Fire size-byte messages at the socket until it reports EAGAIN. */
    char buff[size];
    size_t wrote = 0;
    for (size_t ii = 0; ii < 32768; ii++) {
        if ((send(sfd[0], buff, size, MSG_DONTWAIT) == -1) && (errno == EAGAIN)) {
            wrote = ii;
            break;
        }
    }

    printf("Wrote:     %zu\n", wrote);

    if (wrote != 0) {
        /* Apparent buffer consumption per message and the implied overhead. */
        int bpm = sndbuf / wrote;
        int oh  = bpm - size;

        printf("Bytes/msg: %i\n", bpm);
        printf("Overhead:  %i\n", oh);
        printf("\n");
    }

    close(sfd[0]); close(sfd[1]);
}

int main() {
    int sfd[2];
    socketpair(AF_UNIX, SOCK_STREAM, 0, sfd);

    int sndbuf;
    socklen_t sbsize = sizeof(sndbuf);
    getsockopt(sfd[0], SOL_SOCKET, SO_SNDBUF, &sndbuf, &sbsize);

    printf("Buffer Size: %i\n\n", sndbuf);
    close(sfd[0]); close(sfd[1]);

    for (size_t ii = 4; ii <= 4096; ii *= 2) {
        run(ii);
    }
    return 0;
}

Which gives:

Buffer Size: 124928

Data Size: 4
Wrote:     423
Bytes/msg: 295
Overhead:  291

Data Size: 8
Wrote:     423
Bytes/msg: 295
Overhead:  287

Data Size: 16
Wrote:     423
Bytes/msg: 295
Overhead:  279

Data Size: 32
Wrote:     423
Bytes/msg: 295
Overhead:  263

Data Size: 64
Wrote:     423
Bytes/msg: 295
Overhead:  231

Data Size: 128
Wrote:     348
Bytes/msg: 358
Overhead:  230

Data Size: 256
Wrote:     256
Bytes/msg: 488
Overhead:  232

Data Size: 512
Wrote:     168
Bytes/msg: 743
Overhead:  231

Data Size: 1024
Wrote:     100
Bytes/msg: 1249
Overhead:  225

Data Size: 2048
Wrote:     55
Bytes/msg: 2271
Overhead:  223

Data Size: 4096
Wrote:     29
Bytes/msg: 4307
Overhead:  211

Compared with using a pipe, there's definitely a lot of overhead (a sketch of the pipe variant follows the results below):

Data Size: 4
Wrote:     16384
Bytes/msg: 4
Overhead:  0

Data Size: 8
Wrote:     8192
Bytes/msg: 8
Overhead:  0

Data Size: 16
Wrote:     4096
Bytes/msg: 16
Overhead:  0

Data Size: 32
Wrote:     2048
Bytes/msg: 32
Overhead:  0

Data Size: 64
Wrote:     1024
Bytes/msg: 64
Overhead:  0

Data Size: 128
Wrote:     512
Bytes/msg: 128
Overhead:  0

Data Size: 256
Wrote:     256
Bytes/msg: 256
Overhead:  0

Data Size: 512
Wrote:     128
Bytes/msg: 512
Overhead:  0

Data Size: 1024
Wrote:     64
Bytes/msg: 1024
Overhead:  0

Data Size: 2048
Wrote:     32
Bytes/msg: 2048
Overhead:  0

Data Size: 4096
Wrote:     16
Bytes/msg: 4096
Overhead:  0
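
The pipe test itself isn't shown above; a variant of run() along these lines (non-blocking write() on a pipe, which has a fixed 64 KiB capacity on 2.6.32, against which the Bytes/msg column would be computed) gives that kind of result:

/* Sketch of a pipe-based variant of run() (the actual pipe test isn't
   shown above; this assumes the same fill-until-EAGAIN approach). */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

void run_pipe(size_t size) {
    int pfd[2];
    if (pipe(pfd) == -1) {
        perror("pipe");
        return;
    }
    /* Non-blocking writes so a full pipe returns EAGAIN instead of blocking. */
    fcntl(pfd[1], F_SETFL, O_NONBLOCK);

    char buff[size];
    size_t wrote = 0;
    for (size_t ii = 0; ii < 32768; ii++) {
        if ((write(pfd[1], buff, size) == -1) && (errno == EAGAIN)) {
            wrote = ii;
            break;
        }
    }

    printf("Data Size: %zu\n", size);
    printf("Wrote:     %zu\n\n", wrote);

    close(pfd[0]);
    close(pfd[1]);
}
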
gct
  • send() returns the number of bytes actually written. You should be totalling these, not just assuming it was all written. – user207421 Jun 06 '12 at 15:24
  • Worst case I'll have written less than what I'm claiming, which would make the overhead on the domain socket even worse. – gct Jun 06 '12 at 15:55
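
For reference, here is the inner loop of run() rewritten to total what send() actually accepts, per the comment above (a sketch; it uses the same sfd, buff, and size as run()):

    /* Accumulate the bytes send() reports instead of assuming each call
       wrote all 'size' bytes. */
    size_t total = 0;
    size_t wrote = 0;
    for (size_t ii = 0; ii < 32768; ii++) {
        ssize_t n = send(sfd[0], buff, size, MSG_DONTWAIT);
        if (n == -1) {
            if (errno != EAGAIN)
                perror("send");
            wrote = ii;
            break;
        }
        total += (size_t)n;   /* may be less than 'size' on a short write */
    }
    printf("Wrote:       %zu\n", wrote);
    printf("Bytes total: %zu\n", total);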

2 Answers


Take a look at the socket(7) man page. There is a section that reads:

SO_SNDBUF Sets or gets the maximum socket send buffer in bytes. The kernel doubles this value (to allow space for bookkeeping overhead) when it is set using setsockopt(2), and this doubled value is returned by getsockopt(2). The default value is set by the /proc/sys/net/core/wmem_default file and the maximum allowed value is set by the /proc/sys/net/core/wmem_max file. The minimum (doubled) value for this option is 2048.

So it appears that the overhead is simply space for the kernel's bookkeeping information.
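
A quick way to see the doubling described above is to request an explicit size and read it back (a sketch; the 64K request is arbitrary, and the reported value is subject to the wmem_max cap):

#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
    int sfd[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sfd) == -1) {
        perror("socketpair");
        return 1;
    }

    int sndbuf = 0;
    socklen_t len = sizeof(sndbuf);
    getsockopt(sfd[0], SOL_SOCKET, SO_SNDBUF, &sndbuf, &len);
    printf("default SO_SNDBUF: %d\n", sndbuf);

    int requested = 65536;   /* arbitrary request for illustration */
    setsockopt(sfd[0], SOL_SOCKET, SO_SNDBUF, &requested, sizeof(requested));

    len = sizeof(sndbuf);
    getsockopt(sfd[0], SOL_SOCKET, SO_SNDBUF, &sndbuf, &len);
    printf("after requesting %d: %d\n", requested, sndbuf);   /* typically 2x the request */

    close(sfd[0]);
    close(sfd[1]);
    return 0;
}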

Chimera
  • I'm not even sure that applies to the local sockets, and a simple halving of the available buffer space still wouldn't account for all the overhead I'm seeing. – gct Jun 08 '12 at 14:36
  • The man page doesn't distinguish between AF_UNIX and the non-local domains, so I'm assuming it applies across the board. That's all the documentation I've been able to find regarding the situation. I suspect that if you need to know "exactly" what the overhead is used for, you will have to take a look at the kernel networking code. – Chimera Jun 08 '12 at 15:54
  • I haven't accepted this answer because I think even with the factor-of-two overhead, I'm still seeing way too much per message. Even if the kernel were only letting me use 62464 bytes, I should be able to write 15000+ messages before filling the buffer, and I'm only seeing about 1/30 of that. – gct Jun 15 '12 at 16:27
  • You will need to read the kernel networking code to get all the details of what is done with the overhead, as I mentioned before. I'm just pointing out what the man pages say about the overhead. – Chimera Jun 15 '12 at 16:54

Have you looked at the value of the net.unix.max_dgram_qlen sysctl?

The kernel imposes a limit on the maximum number of in-flight AF_UNIX datagrams. On my system the limit is actually very low: only 10.
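
You can check it with the sysctl command, or read the /proc counterpart programmatically; a sketch (assuming the usual sysctl-to-/proc mapping, i.e. /proc/sys/net/unix/max_dgram_qlen):

#include <stdio.h>

int main(void) {
    /* Read net.unix.max_dgram_qlen via its /proc counterpart. */
    FILE *f = fopen("/proc/sys/net/unix/max_dgram_qlen", "r");
    if (!f) {
        perror("fopen");
        return 1;
    }
    int qlen = 0;
    if (fscanf(f, "%d", &qlen) == 1)
        printf("net.unix.max_dgram_qlen = %d\n", qlen);
    fclose(f);
    return 0;
}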

Kristof Provost
  • I wasn't aware of that, no. Would that apply here though, since I'm using a SOCK_STREAM type? – gct Jun 06 '12 at 16:06
  • No, that should only apply to datagram sockets, at least on the kernel version I'm looking at. – Kristof Provost Jun 06 '12 at 16:09
  • In fact, I can't see why a unix datagram socket write would ever short if hitting wmem_max. – Kristof Provost Jun 06 '12 at 16:15
  • I can say that increasing `net.unix.max_dgram_qlen` doesn't change the overhead. My system has a default value of 10; I changed it to 200 without any change in the program's output or calculated overhead. – Chimera Jun 07 '12 at 21:16
  • Not so much wrong as beside the point. Still, you're right. Please stop upvoting me ;) – Kristof Provost Jun 08 '12 at 19:40