I'm having trouble receiving "large" files from a web server using C sockets, specifically (or so I suspect) files that are larger than the buffer I'm using to receive them. If I ask (through a GET request) for a simple index.html that's only a few bytes long, I get it fine, but anything larger fails. I'm assuming that my lack of knowledge of select() or recv() is what's failing me. See here:

fd_set read_fd_set;
FD_ZERO(&read_fd_set);
FD_SET((unsigned int)socketId, &read_fd_set);

/* Initialize the timeout data structure. */
struct timeval timeout;
timeout.tv_sec = 2;
timeout.tv_usec = 0;

// Receives reply from the server
int headerReceived = 0;
do {
    select(socketId+1, &read_fd_set, NULL, NULL, &timeout);

    if (!(FD_ISSET(socketId, &read_fd_set))) {
       break;
    }

    byteSize = recv(socketId, buffer, sizeof buffer, 0);

    if (byteSize == 0 || (byteSize < BUFFER_SIZE && headerReceived)) {
       break;
    }

    headerReceived = 1;

} while(1);

This runs right after sending the GET request to the web server. I'm pretty sure the server is receiving the request just fine, and GET requests from any other client (like any web browser) work as intended.

Thanks in advance, any help is greatly appreciated.

Eitan T
Sergio Morales
  • I don't think it is robust to assume that either byteSize < BUFFER_SIZE, or a 2 second gap in receipt of data signals the end of data. Note that byteSize could be -1 on error. There will be a `Content-Length: ` field in the header that tells you how many bytes are coming after the header. Perhaps the TCP connection is closed at the end of the transfer (in which case recv will return 0) - I don't know. – William Morris May 21 '12 at 03:07
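
To illustrate the `Content-Length` point in the comment above, here is a minimal sketch of pulling that value out of a header block that has already been received. The function name `parse_content_length` is hypothetical, and it assumes the headers are NUL-terminated and use CRLF line endings. Note that a server may omit `Content-Length` (for example when it uses chunked transfer encoding), so a caller would still need a fallback such as reading until recv() returns 0.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: scans a NUL-terminated HTTP header block for a
 * Content-Length field (case-insensitive). Returns the value, or -1 if
 * the field is absent or malformed. */
long parse_content_length(const char *headers)
{
    assert(headers != NULL);
    static const char name[] = "Content-Length:";
    const size_t name_len = sizeof name - 1;

    const char *line = headers;
    while (*line != '\0') {
        if (strncasecmp(line, name, name_len) == 0) {
            char *end;
            /* strtol skips the optional whitespace after the colon */
            long value = strtol(line + name_len, &end, 10);
            if (end == line + name_len || value < 0)
                return -1;  /* no digits, or a negative length */
            return value;
        }
        const char *eol = strstr(line, "\r\n");
        if (eol == NULL || eol == line)
            break;  /* no more lines, or the blank line ending the headers */
        line = eol + 2;
    }
    return -1;
}
```

With the length known, the receive loop can stop once that many body bytes have arrived instead of guessing from short reads or timeouts.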

3 Answers

if (byteSize == 0 || (byteSize < BUFFER_SIZE && headerReceived))
{
    break;
}

headerReceived is set to true after the first read. It is entirely possible, and likely, that subsequent recv() calls will return fewer than BUFFER_SIZE bytes, and at that point you break out of the read loop. recv() returns however many bytes are currently available to read, up to the size you request, not necessarily the full amount you asked for.

Also, stick with either BUFFER_SIZE or sizeof(buffer). Mixing and matching is just asking for a bug somewhere down the road.
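
As a rough sketch of a loop that does not treat a short read as the end of the data, something like the following could work. The function name `recv_until_close` is made up for illustration, and it assumes the server closes the connection when the response is complete (which is not guaranteed with keep-alive connections).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: reads from sock until the peer closes the
 * connection, the buffer is full, or an error occurs.
 * Returns the number of bytes stored, or -1 on error. */
ssize_t recv_until_close(int sock, char *buf, size_t cap)
{
    assert(buf != NULL);
    size_t total = 0;
    while (total < cap) {
        ssize_t n = recv(sock, buf + total, cap - total, 0);
        if (n < 0)
            return -1;       /* error: the caller can inspect errno */
        if (n == 0)
            break;           /* orderly shutdown by the peer */
        total += (size_t)n;  /* a short read is normal, so keep going */
    }
    return (ssize_t)total;
}
```

The only conditions that end the loop are a return value of 0, an error, or a full buffer; the size of any individual read is never used as a signal.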

Eitan T
Duck
  • This seems to have improved the behavior, thanks! A combination of this and EitanT's changes seems to have done the trick :) – Sergio Morales May 21 '12 at 01:34

You did not say what O/S you are using, but according to the POSIX spec:

Upon successful completion, the select() function may modify the object pointed to by the timeout argument.

(And I believe Linux, for example, does precisely this.)

So it is very possible that later invocations of your loop have the timeout set to zero, which will cause select to return immediately with no descriptors ready.

I would suggest re-initializing the timeout structure immediately before calling select every time through the loop.

Nemo

One thing that I spot is that you don't reinitialize the descriptor set or the timeout inside the loop, even though select() modifies the set in place. This is probably why you get small files successfully: they are received in one go, so the loop never has to iterate.

I suggest you put the:

FD_ZERO(&read_fd_set);
FD_SET((unsigned int)socketId, &read_fd_set);
timeout.tv_sec = 2;
timeout.tv_usec = 0;

inside the loop (before you invoke select), and it might just work.
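
One way to make that hard to forget is to wrap the select() call in a small helper that rebuilds both the descriptor set and the timeout on every call. This is only a sketch: the names `wait_readable` and `recv_with_timeout`, and the -2 return value for a timeout, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical helper: waits up to timeout_sec seconds for sock to become
 * readable. Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int sock, long timeout_sec)
{
    assert(sock >= 0 && sock < FD_SETSIZE);
    fd_set read_fds;
    struct timeval timeout;

    /* Rebuilt on every call, because select() may modify both. */
    FD_ZERO(&read_fds);
    FD_SET(sock, &read_fds);
    timeout.tv_sec = timeout_sec;
    timeout.tv_usec = 0;

    int ready = select(sock + 1, &read_fds, NULL, NULL, &timeout);
    if (ready < 0)
        return -1;
    return (ready > 0 && FD_ISSET(sock, &read_fds)) ? 1 : 0;
}

/* Hypothetical wrapper: waits, then performs a single recv().
 * Returns bytes read, 0 if the peer closed the connection,
 * -1 on error, or -2 if nothing arrived before the timeout. */
ssize_t recv_with_timeout(int sock, void *buf, size_t cap, long timeout_sec)
{
    int ready = wait_readable(sock, timeout_sec);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return -2;
    return recv(sock, buf, cap, 0);
}
```

Because the set and the timeout are local variables initialized inside the helper, calling it repeatedly from a receive loop always starts from a fresh state.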

Eitan T