So, I'm writing a simple HTTP client in C and I'm stuck on this problem: how do I strip the HTTP headers from the response? After all, if I download a binary file, I can't just write the headers out to my output file. Going back and fixing the file once the data is already written doesn't seem to work either, because Linux complains when I try to even view the first few lines of a binary file, even though I know they're just text HTTP headers.
Now, here's the rub (well, I suppose the whole thing is a rub). Sometimes the whole header doesn't even come in with the first chunk of the response, so I can't guarantee that we'll have the whole header in our first iteration (that is, the first iteration of receiving the HTTP response; we're using recv() here). That means I need to somehow... well, I don't even know. I can't seem to mess with the data once it's already written to disk, so I need to deal with it as it's coming in, but we can't be sure how it's going to come in, and even if we were sure, strtok() is a nightmare to use.
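For what it's worth, strtok() is a poor fit here for a concrete reason: it expects a NUL-terminated string, stops at the first NUL byte (which a binary body can easily contain), modifies the buffer in place, and splits on any single delimiter character rather than on the four-byte sequence \r\n\r\n. A length-aware byte search avoids all of that. The sketch below assumes the received bytes already sit in one buffer; find_header_end is a hypothetical helper name. On its own it does not handle a terminator that is split across two recv() calls.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the offset of the first body byte (just past
 * "\r\n\r\n"), or -1 if the terminator is not in buf[0..len).
 * It uses the explicit length, so NUL bytes do not stop the search, and
 * buf is never modified (unlike with strtok()). */
long find_header_end(const char *buf, size_t len)
{
    static const char term[] = "\r\n\r\n";

    for (size_t i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, term, 4) == 0)
            return (long)(i + 4); /* body starts right after the blank line */
    }
    return -1; /* header not finished yet (or not present) in this buffer */
}
```

With the offset in hand, fwrite(buf + off, 1, len - off, fp) writes only the body part of that buffer. On glibc, memmem() (a GNU extension) performs the same search, but the loop above sticks to standard C.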
I guess I'm just hoping someone out there has a better idea. Here's the relevant code. It's heavily stripped down, since I'm going for an MCVE, of course. You can assume that socket_file_descriptor is already set up and get_request contains the text of our GET request. Here it is:
```c
FILE* fp = fopen("output", "wb"); // Open the file for writing
char buf[MAXDATASIZE];            // The buffer
ssize_t numbytes;                 // recv() returns ssize_t (-1 on error), so this must be signed
/*
 * Do all the socket programming stuff to get the socket file descriptor that we need
 * ...
 * ...
 */
send(socket_file_descriptor, get_request, strlen(get_request), 0); // Send the HTTP GET request
while ((numbytes = recv(socket_file_descriptor, buf, MAXDATASIZE - 1, 0)) > 0) {
    /* Either I need to do something here, to get rid of the headers before writing to the file */
    fwrite(buf, 1, (size_t)numbytes, fp); // Write to file
    memset(buf, 0, MAXDATASIZE); // Reset the buffer (not strictly needed: fwrite() only writes numbytes bytes)
}
close(socket_file_descriptor);
fclose(fp);
/* Or I need to do something here, to strip the headers from the file after it's been written to disk */
```
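For the second option (stripping the file after it has been written), the file never has to be viewed in a terminal; a program can read it byte by byte. A minimal sketch, assuming the whole response, headers included, was saved to a single file; strip_headers_file and the paths are hypothetical. It skips bytes until it has seen \r\n\r\n, then copies the rest verbatim into a second file, since stdio offers no way to cut bytes off the front of an existing file.

```c
#include <stdio.h>

/* Hypothetical helper: copies in_path to out_path, dropping everything up to
 * and including the first "\r\n\r\n". Returns 0 on success, or -1 if a file
 * can't be opened, an I/O error occurs, or no header terminator is found. */
int strip_headers_file(const char *in_path, const char *out_path)
{
    static const char term[] = "\r\n\r\n";
    FILE *in = fopen(in_path, "rb");
    if (in == NULL)
        return -1;

    /* Phase 1: read byte by byte, counting how much of term has matched. */
    size_t matched = 0;
    int c;
    while (matched < 4 && (c = fgetc(in)) != EOF) {
        if (c == term[matched])
            matched++;
        else
            matched = (c == '\r') ? 1 : 0; /* a '\r' may start a new match */
    }
    if (matched < 4) {
        fclose(in);
        return -1;
    }

    /* Phase 2: everything left in the stream is the body; copy it as-is. */
    FILE *out = fopen(out_path, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    char buf[4096];
    size_t n;
    int rc = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;
    fclose(in);
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}
```

Usage would be along the lines of strip_headers_file("output", "output.body") after the receive loop has finished and fp has been closed.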
So, I thought about doing something like this. The only thing we know for sure is that the header is going to end in \r\n\r\n (an empty line, i.e. two CRLF pairs in a row). So we can use that. This doesn't really work, but hopefully you can figure out where I'm trying to go with it (comments from above removed):
```c
FILE* fp = fopen("output", "wb");
char buf[MAXDATASIZE];
ssize_t numbytes;
int header_found = 0; // Add a flag, here
/* ...
 * ...
 */
send(socket_file_descriptor, get_request, strlen(get_request), 0);
while ((numbytes = recv(socket_file_descriptor, buf, MAXDATASIZE - 1, 0)) > 0) {
    if (header_found == 1) { // So this won't happen on our first pass through
        fwrite(buf, 1, (size_t)numbytes, fp);
        memset(buf, 0, MAXDATASIZE);
    }
    else { // This will happen on our first pass through, maybe our second or third; the header doesn't always arrive in full from the first recv()
        /* And this is where I'm stuck.
         * I'm thinking about using strtok() to parse through the lines, but....
         * well I just can't figure it out. I'm hoping someone can at least point
         * me in the right direction.
         *
         * The point here would be to somehow determine when we've seen the blank
         * line (\r\n\r\n) and then mark header_found as 1. But even if we DID manage
         * to find it, we still need to write the remaining data from this chunk to
         * the file before moving on to the next iteration, but WITHOUT including the
         * header information.
         */
    }
}
close(socket_file_descriptor);
fclose(fp);
```
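One way the else branch could be filled in, sketched under assumptions: instead of looking for \r\n\r\n inside a single buffer, keep a count of how many terminator bytes have matched so far in a variable that survives across loop iterations. A terminator split across two recv() calls is then still recognized, and once the count reaches four, the rest of that chunk and every later chunk is body. The names skip_header_bytes, struct header_skipper and copy_body are hypothetical; like the original loop, copy_body reads until the server closes the connection.

```c
#define _POSIX_C_SOURCE 200809L /* for recv(), ssize_t and friends */
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/* State that must survive from one recv() call to the next. */
struct header_skipper {
    size_t matched; /* bytes of "\r\n\r\n" matched so far (0..4) */
    int done;       /* plays the role of header_found */
};

/* Consumes header bytes from the front of chunk and returns how many there
 * were; everything from that offset onward is body. Returns 0 once the
 * header has been fully skipped in an earlier chunk. */
size_t skip_header_bytes(struct header_skipper *st, const char *chunk, size_t len)
{
    static const char term[] = "\r\n\r\n";
    size_t i = 0;

    while (!st->done && i < len) {
        char c = chunk[i++];
        if (c == term[st->matched]) {
            if (++st->matched == 4)
                st->done = 1;
        } else {
            /* On a mismatch, a '\r' can still begin a new terminator. */
            st->matched = (c == '\r') ? 1 : 0;
        }
    }
    return i;
}

/* The receive loop with the header skipping folded in: reads from fd until
 * the peer closes the connection and writes only the body to fp.
 * Returns 0 on success, -1 if recv() fails. */
int copy_body(int fd, FILE *fp)
{
    char buf[4096];
    ssize_t numbytes;
    struct header_skipper st = {0, 0};

    while ((numbytes = recv(fd, buf, sizeof buf, 0)) > 0) {
        size_t skip = skip_header_bytes(&st, buf, (size_t)numbytes);
        fwrite(buf + skip, 1, (size_t)numbytes - skip, fp); /* may write 0 bytes */
    }
    return numbytes < 0 ? -1 : 0;
}
```

Because only a small integer of state is carried between iterations, nothing has to be buffered or re-parsed, and binary bodies pass through untouched. Checking fwrite()'s return value is left out for brevity.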
I've been staring at this code for three days straight and am slowly losing my mind, so I'd really appreciate any insight anyone is able to provide. To generalize the problem, I guess this really comes down to me not understanding how to do text parsing in C.