24

Based on my understanding, each socket is associated with two buffers: a send buffer and a receive buffer. When I call the send() function, the data to send is placed into the send buffer, and it is then Windows' responsibility to deliver the contents of that send buffer to the other end.

In a blocking socket, the send() function does not return until all of the data supplied to it has been placed into the send buffer.

So what is the size of the send buffer?

I performed the following test (sending 1 GB worth of data):

#include <stdio.h>

#include <WinSock2.h>
#pragma comment(lib, "ws2_32.lib")

#include <Windows.h>

int main()
{
    // Initialize Winsock
    WSADATA wsa;
    WSAStartup(MAKEWORD(2, 2), &wsa);

    // Create socket
    SOCKET s = socket(AF_INET, SOCK_STREAM, 0);

    //----------------------

    // Connect to 192.168.1.7:12345
    sockaddr_in address;
    address.sin_family = AF_INET;
    address.sin_addr.s_addr = inet_addr("192.168.1.7");
    address.sin_port = htons(12345);
    connect(s, (sockaddr*)&address, sizeof(address));

    //----------------------

    // Create 1 GB buffer ("AAAAAA...A")
    char *buffer = new char[1073741824];
    memset(buffer, 0x41, 1073741824);

    // Send buffer
    int i = send(s, buffer, 1073741824, 0);

    printf("send() has returned\nReturn value: %d\nWSAGetLastError(): %d\n", i, WSAGetLastError());

    //----------------------

    getchar();
    return 0;
}

Output:

send() has returned
Return value: 1073741824
WSAGetLastError(): 0

send() returned immediately. Does this mean that the send buffer has a size of at least 1 GB?

This is some information about the test:

  • I am using a TCP blocking socket.
  • I have connected to a LAN machine.
  • Client Windows version: Windows 7 Ultimate 64-bit.
  • Server Windows version: Windows XP SP2 32-bit (installed on Virtual Box).

Edit: I have also attempted to connect to Google (173.194.116.18:80) and I got the same results.

Edit 2: I have discovered something strange: setting the send buffer to a value between 64 KB and 130 KB makes send() work as expected!

int send_buffer = 64 * 1024;    // 64 KB
int send_buffer_sizeof = sizeof(int);
setsockopt(s, SOL_SOCKET, SO_SNDBUF, (char*)send_buffer, send_buffer_sizeof);

Edit 3: It turned out (thanks to Harry Johnston) that I had been using setsockopt() incorrectly; this is how it should be used:

setsockopt(s, SOL_SOCKET, SO_SNDBUF, (char*)&send_buffer, send_buffer_sizeof);

It is not setting the send buffer to a value between 64 KB and 130 KB that makes send() work as expected; rather, setting the send buffer to 0 makes it block (this is what I observed, anyway; I don't have any documentation for this behavior).

So my question now is: where can I find documentation on how send() (and maybe other socket operations) works under Windows?

  • If no error occurs, send returns the total number of bytes sent, which can be less than the number requested to be sent in the len parameter. https://msdn.microsoft.com/en-us/library/windows/desktop/ms740149(v=vs.85).aspx – Richard Critten Feb 28 '15 at 19:14
  • You can discover the size for yourself with `getsockopt()`. And how do you know it returned immediately? That code can't tell you that. – user207421 Feb 28 '15 at 20:27
  • You can tell it returned (error or no error) immediately because he sent a 1GB buffer. Unless you have SUPER fast internet, this would take him quite a long time. Most likely it failed. – Brandon Feb 28 '15 at 21:13
  • @Brandon You cannot know from this code when the send starts and finishes. There is also a `connect()` in there, that could take appreciable time. He isn't using the Internet, he's using a LAN. – user207421 Feb 28 '15 at 21:21
  • @EJP Actually, I connected to Google (173.194.116.18:80), and `send()` has also returned immediately without errors. –  Feb 28 '15 at 22:22
  • @EJP See **Edit 2** in my question. –  Mar 01 '15 at 00:47
  • So what happened when you called `getsockopt()` as suggested? What value did you get? Clearly a large one. – user207421 Mar 01 '15 at 00:51
  • @EJP Actually, I got 8192 bytes (8 KB). –  Mar 01 '15 at 01:10
  • I can't repro with .NET. The `Socket.Send` call blocks. And the process becomes unkillable :(. This was on a physical machine sending to Google. System memory usage measured by the Commit Charge did not increase when the `Send` call started. – usr Mar 01 '15 at 22:14
  • Retrying with 2GB and under memory pressure I see that all pages of the buffer are touched when the `Send` starts. The working set contains all 2GB. It looks like Windows is simply pinning the pages into memory and the TCP stack working off of those pages. No additional buffer. This is probably a heuristic decision depending on the send size and the configured buffer sizes. – usr Mar 01 '15 at 22:25
  • @usr: it isn't really surprising that .NET behaves differently. – Harry Johnston Mar 01 '15 at 23:57
  • @EJP: it turns out that this buffering is disabled if you explicitly set the buffer size to zero; do you happen to know whether setting a buffer size of zero has a well-defined meaning in Posix? – Harry Johnston Mar 02 '15 at 00:41
  • @HarryJohnston why is that? All .NET does is call the native APIs wrapped nicely in managed libraries. Do you know of any significant difference to the code shown here? – usr Mar 02 '15 at 11:32
  • @usr: at a guess, the .NET libraries are using the Windows-specific TCP functions (probably in asynchronous mode) rather than the Posix-like send(). – Harry Johnston Mar 02 '15 at 20:31
  • @HarryJohnston I think zero send buffer is only a Windows-ism. Posix has wording that lets the platform impose a minimum size. – user207421 Mar 02 '15 at 21:14

3 Answers

18

After investigating this subject, this is what I believe to be the correct answer:

When calling send(), there are two things that could happen:

  • If the amount of pending data is below SO_SNDBUF, then send() returns immediately (and it does not matter whether you are sending 5 KB or 500 MB).

  • If the amount of pending data is at or above SO_SNDBUF, then send() blocks until enough data has been sent to bring the pending data back below SO_SNDBUF.

Note that this behavior only applies to Windows sockets, not to POSIX sockets. I think that POSIX sockets use a single fixed-size send buffer (correct me if I'm wrong).


Now back to your main question, "What is the size of a socket send buffer in Windows?": I guess if you have enough memory it could grow beyond 1 GB if necessary (I'm not sure what the maximum limit is, though).
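
If you want to check this behavior on your own machine, a rough sketch is to read SO_SNDBUF with getsockopt() and time a send() call, so you can see whether a given amount of data returns immediately or blocks (error handling kept minimal):

#include <WinSock2.h>
#include <Windows.h>
#include <stdio.h>

/* Sketch: report the socket's SO_SNDBUF and time a single send() call. */
void timed_send(SOCKET s, const char *data, int len)
{
    int sndbuf = 0, optlen = sizeof(sndbuf);
    getsockopt(s, SOL_SOCKET, SO_SNDBUF, (char*)&sndbuf, &optlen);

    ULONGLONG start = GetTickCount64();
    int sent = send(s, data, len, 0);
    ULONGLONG elapsed = GetTickCount64() - start;

    printf("SO_SNDBUF = %d bytes\n", sndbuf);
    printf("send(%d bytes) returned %d after %llu ms\n", len, sent, elapsed);
}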

Tom
6

I can reproduce this behaviour, and using Resource Monitor it is easy to see that Windows does indeed allocate 1GB of buffer space when the send() occurs.

An interesting feature is that if you do a second send immediately after the first one, that call does not return until both sends have completed. The buffer space from the first send is released once that send has completed, but the second send() continues to block until all the data has been transferred.

I suspect the difference in behaviour is because the second call to send() was already blocking when the first send completed. The third call to send() returns immediately (and 1GB of buffer space is allocated) just as the first one did, and so on, alternating.

So I conclude that the answer to the question ("how large are the send buffers?") is "as large as Windows sees fit". The upshot is that, in order to avoid exhausting the system memory, you should probably restrict blocking sends to no more than a few hundred megabytes.

Your call to setsockopt() is incorrect; the fourth argument is supposed to be a pointer to an integer, not an integer converted to a pointer. Once this is corrected, it turns out that setting the buffer size to zero causes send() to always block.
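
For example, the corrected call with a buffer size of zero might look something like this (a sketch; error checking omitted):

#include <WinSock2.h>

/* Sketch: set SO_SNDBUF to zero (note the &zero - a pointer to the integer,
   not the integer cast to a pointer). With a zero-sized buffer, Winsock's own
   buffering is disabled, so send() then blocks (as observed above) until the
   data has been handed off. Returns 0 on success, SOCKET_ERROR on failure. */
int disable_winsock_send_buffering(SOCKET s)
{
    int zero = 0;
    return setsockopt(s, SOL_SOCKET, SO_SNDBUF, (char*)&zero, sizeof(zero));
}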

To summarize, the observed behaviour is that send() will return immediately provided:

  • there is enough memory to buffer all the provided data
  • there is not a send already in progress
  • the buffer size is not set to zero

Otherwise, it will return once the data has been sent.

KB214397 describes some of this - thanks Hans! In particular it describes that setting the buffer size to zero disables Winsock buffering, and comments that "If necessary, Winsock can buffer significantly more than the SO_SNDBUF buffer size."

(The completion notification described does not quite match up to the observed behaviour, depending I guess on how you interpret "previously buffered send". But it's close.)

Note that apart from the risk of inadvertently exhausting the system memory, none of this should matter. If you really need to know whether the code at the other end has received all your data yet, the only reliable way to do that is to get it to tell you.
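
For example, a sketch of such an application-level acknowledgement (the one-byte "ack" is a convention you would have to define yourself, not anything built into Winsock):

#include <WinSock2.h>

/* Sketch: after sending, block on recv() until the peer sends back a single
   acknowledgement byte confirming it has received and processed the data. */
int wait_for_ack(SOCKET s)
{
    char ack = 0;
    int r = recv(s, &ack, 1, 0);   /* blocks until the peer replies or the connection closes */
    return (r == 1) ? 0 : -1;
}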

Harry Johnston
  • Unfortunately I don't think that this is the correct answer. I have tried sending only a 2 MB buffer, and `send()` also returned immediately (before the 2 MB were sent to the other side, so it is not that it returned immediately because I have a fast connection!). As for the second send, it is not a matter of the second buffer being large: I attempted to send a second buffer of only 1 KB after the 1 GB buffer and I experienced the same behavior you described (it appears that the second `send()` blocked). –  Mar 01 '15 at 22:56
  • What makes all of this very strange is what I said in my question: setting the send buffer to a value between 64 KB and 130 KB will make `send()` work as expected, even in the case of the sending of a second buffer (`send()` will block until the first buffer is sent, and will also block until the second buffer is sent). –  Mar 01 '15 at 22:58
  • I think that we need to know for sure how Windows handles `send()` (and maybe other socket-related operations), so that when we create programs that use sockets, we don't get unpredictable behavior. –  Mar 02 '15 at 00:21
  • Yes, you are right. I forgot to put an `&` before `send_buffer` in `(char*)send_buffer`! I will test the code again. –  Mar 02 '15 at 00:30
  • You are right, nice discovery! Looks like the memory values at addresses 65536 (64 KB) to 133120 (130 KB) are `0`. But is this how `send()` works under other operating systems? And does Microsoft provide documentation of how all of this works, if this is not standard behavior? –  Mar 02 '15 at 00:45
  • Also, what is the point of the send buffer in this case (if Windows does not use it)! –  Mar 02 '15 at 00:48
  • Do you mean, what's the point of being able to specify a buffer size if Windows ignores that size? Backwards and/or standards compatibility, I expect. Windows lets you set a buffer size as per specification, and it respects the single case that matters, everything else is all implementation detail. Arguably, anyway. :-) – Harry Johnston Mar 02 '15 at 20:53
  • @Harry Johnston "and it respects the single case that matters" you mean the single case where the send buffer size is set to `0`? –  Mar 02 '15 at 21:46
  • "An interesting feature is that if you do a second send immediately after the first one, that call does not return until both sends have completed" I have noticed that this only applies if the buffer size of the first `send()` is above or equal 18 KB, however if for example the buffer size of the first `send()` is 15 KB then the second `send()` will not block (note that this will not apply to the third and fourth `send()` for example, but only to the first and second `send()`). –  Mar 02 '15 at 23:39
  • Unsurprising; for sends of that size, by the time the second call is made the data has probably already left winsock and reached the ethernet drivers - or perhaps somewhere in between, I'm not sure exactly what the driver stack looks like. From winsock's point of view, the first send is complete. – Harry Johnston Mar 03 '15 at 00:38
  • "The third call to send() returns immediately" I have tested it, and the third call did not return immediately, it returned when the third call finished sending. However, when I put `Sleep(20);` right before the third call, it did return immediately. –  Mar 03 '15 at 08:36
  • @joseph_m: race condition, presumably. It behaved as described on my system. The second call may be completing as soon as the amount of data remaining in the buffers drops below the configured buffer size, as per the link Hans provided. Whether that last bit of data gets out of winsock's internal buffer before the third call starts would then be a matter of chance, though probably influenced by hardware factors including the ethernet manufacturer and network speed/load. – Harry Johnston Mar 03 '15 at 19:05
0

In a blocking socket, the send() function does not return until all of the data supplied to it has been placed into the send buffer.

That is not guaranteed. If there is available buffer space, but not enough space for the entire data, the socket can (and usually will) accept whatever data it can and ignore the rest. The return value of send() tells you how many bytes were actually accepted. You have to call send() again to send the remaining data.
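
A typical way to handle that is a loop along these lines (a sketch, with minimal error handling):

#include <WinSock2.h>

/* Sketch: keep calling send() until every byte has been accepted by the
   socket, since a single send() may accept fewer bytes than requested.
   Returns 0 on success, -1 on error. */
int send_all(SOCKET s, const char *data, int len)
{
    int total = 0;
    while (total < len)
    {
        int sent = send(s, data + total, len - total, 0);
        if (sent == SOCKET_ERROR)
            return -1;          /* check WSAGetLastError() for the reason */
        total += sent;
    }
    return 0;
}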

So what is the size of the send buffer?

Use getsockopt() with the SO_SNDBUF option to find out.

Use setsockopt() with the SO_SNDBUF option to specify your own buffer size. However, the socket may impose a max cap on the value you specify. Use getsockopt() to find out what size was actually assigned.
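
For example, something like this (a sketch; error checking omitted for brevity):

#include <WinSock2.h>
#include <stdio.h>

/* Sketch: request a particular send-buffer size, then read back what the
   socket actually assigned, since the requested value may be capped. */
void set_and_verify_sndbuf(SOCKET s, int requested)
{
    setsockopt(s, SOL_SOCKET, SO_SNDBUF, (char*)&requested, sizeof(requested));

    int actual = 0, optlen = sizeof(actual);
    getsockopt(s, SOL_SOCKET, SO_SNDBUF, (char*)&actual, &optlen);

    printf("Requested SO_SNDBUF = %d, actual SO_SNDBUF = %d\n", requested, actual);
}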

Remy Lebeau
  • This is not correct. A blocking mode socket will block until all the data is transferred into the socket send buffer. Source: Posix. – user207421 Feb 28 '15 at 21:22
  • See **Edit 2** in my question. –  Mar 01 '15 at 00:49
  • @EJP: And that is not entirely correct, either. If the message is too long to be passed to the underlying protocol atomically, it shall fail. So, it _might_ block, but not necessarily. (Which is what I guess is happening here, I think this 1G-send just failed. Only funny thing is that no error is reported, but quite possibly Winsock is buggy.) – Damon Mar 01 '15 at 00:59
  • @Damon I do not think that `send()` has failed. I got the entire 1 GB at the other end (it returned immediately but kept sending!). I have discovered a strange "solution" to make `send()` work as expected, see **Edit 2** in my question. –  Mar 01 '15 at 01:24
  • @joseph_m: if you put non-repeating data in the buffer, can you confirm that it arrives intact at the far end? My guess is that by specifying such a ridiculously large number you've confused Winsock into sending random data, or the first part of the data repeatedly, or something like that. – Harry Johnston Mar 01 '15 at 03:35
  • @Remy: in this case, the return value indicated that all 1GB of data was successfully sent. So I don't think that's it. – Harry Johnston Mar 01 '15 at 03:38
  • @Harry Johnston Even with only 10 MB, it still returns immediately with success. –  Mar 01 '15 at 03:48
  • @joseph_m: a 10MB buffer doesn't seem unreasonable, though. (Even if Winsock is reporting 8 KB, that might only be the first level of buffering.) Whereas I find a 1GB buffer hard to believe, though I suppose Windows might be counting on being able to swap it to disk if necessary. Can you quantify "immediately" more precisely, e.g., are we talking about 1ms, 10ms, 100ms? (Even just copying 1GB of data from your buffer into a system buffer should take a measurable amount of time.) – Harry Johnston Mar 01 '15 at 03:52
  • @Harry Johnston It is about 1 second. Didn't you try it? –  Mar 01 '15 at 03:56
  • It is unlikely that the kernel is transmitting 1GB of data at one time, even if it is able to accept 1GB into its buffer, and even less likely that the receiver is able to receive 1GB of data at one time. Resilient socket code must be able to loop sending/reading to handle smaller buffer sizes. – Remy Lebeau Mar 01 '15 at 04:01
  • @joseph_m: You say the server is a virtual machine - does that mean there is no physical ethernet connection involved? Because 1GB in 1s over a *virtual* network is entirely plausible. (Also a 10 Gb/s physical network, I suppose, but I figured you probably didn't have one of those.) In that case, I'd guess that it is the server that is buffering the data. It makes sense for Windows to buffer as much incoming data as it can, since the only other choice is to discard it. – Harry Johnston Mar 01 '15 at 04:06
  • @Harry Johnston I said in **Edit** in the question that I have also attempted to connect to Google (173.194.116.18:80) and I got the same results (i.e. `send()` returned immediately with success). –  Mar 01 '15 at 04:08
  • I have some client/server code that could probably be adapted to try this out. If I get a chance at work tomorrow I'll see what happens. – Harry Johnston Mar 01 '15 at 04:10
  • I'd be surprised if Google's servers accepted you pushing a gigabyte of data though. My expectation would be that after, say, 64 kB or so which don't look like legitimate HTTP request headers, the server simply drops the connection (or puts your IP address onto the DROP chain). I would also be very surprised if you had a 10 Gbps internet uplink at your place... – Damon Mar 01 '15 at 12:58
  • @Damon it is not that Google has accepted 1 GB of data (of course it didn't!). It is that Windows seems to be placing the 1 GB of data immediately in the send buffer. Try my code on your machine and see what happens. –  Mar 01 '15 at 18:12
  • @Damon, I can reproduce this behaviour - see my answer. – Harry Johnston Mar 01 '15 at 21:50
  • @Damon That is all nonsense about 'atomically' and 'it shall fail'. TCP is a streaming protocol: it segments transmissions into MTUs, and IP can fragment as well at the next layer down. No 'shall fail' about it. – user207421 Mar 02 '15 at 21:18
  • @EJP: Please tell that the POSIX authors as well as the Linux manpage authors. – Damon Mar 02 '15 at 23:37
  • @Damon: by "atomically" in this context, do you just mean that the data from one call to send() can't be interspersed with the data from another? I think it should be possible to achieve that without placing any size limit on the send() call. – Harry Johnston Mar 03 '15 at 00:46
  • @Damon There was a discussion on news:comp.protocols.tcp-ip, where all the implementors live, some years ago, concerning this very issue, and the unanimous consensus was as I have stated, *because* it is mandated by Posix. If you wish to debate this further, i.e. if you are claiming that, for example, a blocking-mode write of 10k bytes will return a short count because it is beyond any known MTU, you are mistaken: please try it. – user207421 Mar 03 '15 at 09:22
  • @HarryJohnston: It's not me to decide what it means, since it's not my wording, but your interpretation is the only one that makes sense (except on datagram sockets maybe). See 6th paragraph under "Description" [here](http://man7.org/linux/man-pages/man2/send.2.html): _"fail because of 'too long to pass atomically through the underlying protocol' "_ is something that is explicitly being mentioned in the official docs (and with that, whatever someone, no matter who, said on some newsgroup is irrelevant). You find the same paragraph in POSIX too (but I'm too lazy to search for the citation now). – Damon Mar 03 '15 at 11:21
  • @Damon: my interpretation of that line is that it applies only to protocols like UDP, where each send() must fit into a single datagram. – Harry Johnston Mar 03 '15 at 18:55
  • @Damon There is no 'message' that is 'too long to be passed atomically *through* the underlying protocol' [my emphasis to point out your misquotation] when that protocol is TCP, because (a) it isn't a messaging protocol and (b) it doesn't have discrete length limitations. – user207421 Mar 04 '15 at 08:45
  • @HarryJohnston: Possibly. It's stunning how deliberately vague and yet liberal the wording is, though. The word "atomically" might for example allow rigorously dropping all UDP datagrams which are clearly allowable by UDP but larger than the link MTU. Since they would need to get fragmented, they won't pass _atomically_, so strictly according to the wording, the network stack could just drop them (yes, UDP and IP are generally unreliable, but IP is a "best effort" protocol and fragmentation is a supported feature -- simply dropping packets _for no good reason_ surely isn't "best effort".). – Damon Mar 04 '15 at 09:40