
I'm trying to download a file from my website using Winsock. I ran into countless problems, and now I can download the file, but it ends up corrupted.

It fails with every file type: text, pictures, and audio files all end up corrupted. With executables I get this error on launch: "program too big to fit in memory".

First I send() a HEAD request to the server to learn the Content-Length (the size of the file to download), then I send a GET request and recv() into a buffer. Once recv() is done, I write the file.

I tried to write a simple code example here. I have tried various loop approaches, but I still end up with a corrupted file on disk. The size is the same (50 KB file on the server, 50 KB file downloaded and written to disk). Thank you all.

headrequest = "HEAD " + "/folder/file.asd" + " HTTP/1.1\r\nHost: " + "url.com" + "\r\n\r\n";
getrequest = "GET " + "/folder/file.asd" + " HTTP/1.1\r\nHost: " + "url.com" + "\r\n\r\n";

send(socket, headrequest, sizeof(headrequest), 0);
recv(socket, reply_buf_headrequest, sizeof(reply_buf_headrequest), 0); 
//two functions to get the header end and "Content-Lenght" data from header

send(socket, getrequest, sizeof(getrequest), 0);
while(1)
{    
 recv(socket, recvbuff, sizeof(recvbuff), 0);
 if (recv(socket, recvbuff, sizeof(recvbuff), 0) == 0) 
  break; 
}
out.write(recvbuff, content_lenght); // also tried --> out.write(recvbuff + header_end, content_lenght) //same errors.
out.close();

I'm messing up the buffer, or the position to start reading/writing from, or something like that. I thought recvbuff + header_end would work, since it would start reading at the end of the header, where the file begins. This is confusing. I hope some kind soul can help me figure out how to handle this and write the file bytes correctly. :)

Edit:

I didn't realize I was overwriting data like that. Damn. content_lenght comes from the previous HEAD request: a function reads the recv'ed data and finds the "Content-Length" value, which is the size in bytes of /folder/file.asd. I couldn't manage to get it from the GET response, so I did it like this. The file size it gets is correct.

so,

while(1)
{
  if (recv(socket, recvbuff, sizeof(recvbuff), 0) == 0)
   break;
}
out.write(recvbuff, content_lenght);
out.close();

Should out.write go after the loop or inside the while(1) loop?

Thanks for the fast reply. :)

I omitted the error checking to keep the example code short, sorry. The HEAD and GET requests are char arrays; I tried with strings too, and in that case I did not use sizeof(). I can't access the real code until tomorrow, so I'm trying to fix it at home using a similar snippet. There are probably some typos.

Edit 2: As a test, I'm downloading a small exe that just shows a message box, using a buffer bigger than the file and this:

ofstream out("test.exe", ios::binary);

and using this loop now:

int res;   // return code to monitor transfer
do {    
    res = recv(socket, recvbuff, sizeof(recvbuff), 0);   // look at return code
    if (res > 0)  // if bytes received 
        out.write(recvbuff, res ); // write them  
} while (res>0);   // loop as long as we receive something  
if (res==SOCKET_ERROR)  
    cerr << "Error: " << WSAGetLastError() << endl; 

I still get the "program too big to fit in memory" error on execution.

z0x
  • 1
    What is the datatype of headrequest and getrequest? If they are std::string or a similar string class then sizeof(...) does not return the string length. – ScottMcP-MVP Nov 30 '14 at 14:26
  • 1
    You never check the values returned by send() or recv(), so you don't know how many bytes were actually sent, nor how many bytes were actually placed into recvbuff. It might be fewer bytes than the number you requested, in which case your buffer will not be fully populated. – Jeremy Friesner Nov 30 '14 at 14:32
  • 1
    In addition to the solution posted, make sure you're opening the output file in binary mode, otherwise newline characters will get translated. – Captain Obvlious Nov 30 '14 at 14:43
  • Your edit will still not work: recv() always writes what it receives at the beginning of the buffer. By the end of the loop you have constantly overwritten your buffer, and it contains only the last few bytes received. Only then do you write the buffer, but for the full length of the file (possibly going beyond the buffer size). So you'll write a lot of uninitialized data! – Christophe Nov 30 '14 at 14:43
  • 1
    You do not need to use `HEAD` to get the file size before sending `GET`. HTTP requests/responses are self-contained messages. The response to `GET` *will* tell you how to figure out the size of the file while you are downloading it, whether that be via a `Content-Length` header, the `Transfer-Encoding: chunked` header, etc. HTTP is much more complex than you give it credit for (and your socket reading/writing code is just plain wrong in general), so you would be better off using WinInet/WinHTTP or a third-party library like libcurl to handle all of these details for you. – Remy Lebeau Dec 01 '14 at 02:43
  • Thanks Remy, I removed the HEAD request. This is a hard quest for me right now, and I understand that my current recv/write code is wrong; that's more or less why I'm posting here. I still can't work out how to write the file correctly. The last thing I tried was to use out.write and start reading from recvbuff after the headers, as someone suggested. It's still wrong, unfortunately. Any suggestion is welcome, but I can't use curl or other libraries this time. Also, if I try to download a 1-line text file it works perfectly, but with a 69 KB file it does not. Thank you. – z0x Dec 04 '14 at 21:38
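
A minimal sketch of the point raised in the comments about sizeof, assuming the request is held in a std::string: the byte count passed to send() should come from size(), because sizeof yields the size of the string object (or of a pointer), not the length of the text. build_get_request is a hypothetical helper name, and the Connection: close header is an addition that is not in the original request; it asks the server to close the connection after the response, which is what lets a recv() loop end on a return value of 0.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the original post): builds a minimal
// HTTP/1.1 GET request for the given host and path.
std::string build_get_request(const std::string& host, const std::string& path) {
    assert(!host.empty() && !path.empty());
    return "GET " + path + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n"
           "Connection: close\r\n"   // assumption: ask the server to close when done
           "\r\n";                   // blank line ends the request header
}
```

With Winsock, the call would then look something like send(sock, request.data(), static_cast<int>(request.size()), 0), with the return value checked, since send() may accept fewer bytes than requested.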

1 Answer

That's normal! Your code doesn't really take care of the content you receive.

See my comments:

while(1)  // Your original (indented) code commented: 
{    
    recv(socket, recvbuff, sizeof(recvbuff), 0);  // You read data in buffer 
    if (recv(socket, recvbuff, sizeof(recvbuff), 0) == 0)  // you read again, overwriting data you've received !! 
        break; 
}
out.write(recvbuff, content_lenght); // You only write the last thing you've received. 
                            // Where does the length come from? You may have a buffer overflow as well.

Rewrite your loop as follows:

int res;   // return code to monitor transfer
do {    
    res = recv(socket, recvbuff, sizeof(recvbuff), 0);   // look at return code
    if (res > 0)  // if bytes received 
        out.write(recvbuff, res ); // write them  
} while (res>0);   // loop as long as we receive something  
if (res==SOCKET_ERROR)  
    cerr << "Error: " << WSAGetLastError() << endl; 

The advantage is that you don't have to care about the overall size, because you write each small chunk as you receive it.

Edit:

Following our exchange of comments, here is some additional information. As someone pointed out, the HTTP protocol is somewhat more complex to manage. See RFC 2616, chapter 6, for additional details about the format of a response and the header you have to skip.

Here is an updated proof of concept that skips the header:

ofstream out;
out.open(filename, ios::binary);
bool header_skipped = false;  // was the header skipped? (do it only once!)
int res;   // return code to monitor transfer
do {
    res = recv(mysocket, recvbuff, sizeof(recvbuff), 0);   // look at return code
    if (res > 0)     // if bytes received
    {
        size_t data_offset = 0;      // normally take data from the beginning of the buffer
        if (!header_skipped) {    // if the header was not skipped yet, look for its end
            const char *eoh = "\r\n\r\n";   // const: string literals are read-only
            auto it = search(recvbuff, recvbuff + res, eoh, eoh + 4);   // std::search, from <algorithm>
            if (it != recvbuff + res) {   // if header end found:
                data_offset = it - recvbuff + 4;      // skip it
                header_skipped = true;              // and then do not care any longer
            }                             // because the data can also contain \r\n\r\n
        }
        out.write(recvbuff + data_offset, res - data_offset); // write, ignoring everything before the offset
    }
} while (res > 0);   // loop as long as we receive something
if (res == SOCKET_ERROR) cerr << "Error: " << WSAGetLastError() << endl;
out.close();

Attention! As said, this is a proof of concept. It will probably work. However, be aware that you cannot be sure how the data will be grouped on the receiving side. It is perfectly possible that the end of the header is split between two successive reads (e.g. \r as the last byte of one recv() and \n\r\n as the first bytes of the next recv()). In that case this simple code won't find it. So it's not yet production-quality code. It's up to you to improve it further.
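
One possible way to handle a header terminator that is split across two reads, sketched under assumptions: keep the header bytes in a buffer across calls and search the accumulated buffer rather than only the latest chunk. BodyExtractor and its member names are hypothetical, not part of Winsock or of the code above, and the sketch ignores status codes and Transfer-Encoding: chunked.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: feed it every chunk returned by recv(). It buffers
// bytes until the blank line that ends the HTTP header has been seen, then
// returns body bytes only.
class BodyExtractor {
public:
    // Returns the body bytes contained in this chunk (possibly none).
    std::string feed(const char* data, std::size_t len) {
        assert(data != nullptr || len == 0);
        if (len == 0)
            return std::string();
        if (header_done_)
            return std::string(data, len);   // past the header: everything is body

        pending_.append(data, len);          // keep header bytes across calls
        const std::size_t pos = pending_.find("\r\n\r\n");
        if (pos == std::string::npos)
            return std::string();            // header still incomplete

        header_ = pending_.substr(0, pos);   // header text without the terminator
        std::string body = pending_.substr(pos + 4);
        pending_.clear();
        header_done_ = true;                 // later "\r\n\r\n" sequences are data
        return body;
    }

    bool header_done() const { return header_done_; }
    const std::string& header() const { return header_; }

private:
    std::string pending_;        // header bytes received so far
    std::string header_;         // complete header, once found
    bool header_done_ = false;
};
```

Inside the receive loop, the write would then become something like: std::string body = extractor.feed(recvbuff, res); out.write(body.data(), body.size());. Because the search runs over everything received so far, it does not matter where recv() happens to cut the header.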

Christophe
  • Tried the loop you wrote up here. No errors returned, and the file is downloaded and written with the exact same size, but on execution it throws "program too big to fit in memory". I'm trying to download a small exe file that just pops up a MessageBox, for testing. – z0x Nov 30 '14 at 14:47
  • OK! As Captain Obvlious pointed out, you have to make sure that the file is opened as binary, to avoid some binary characters being converted and causing problems. – Christophe Nov 30 '14 at 14:51
  • I'm using ofstream out("test.exe", ios::binary) to deal with the file. Is that wrong? Thanks for your patience. – z0x Nov 30 '14 at 14:53
  • OK, using your loop I'm getting 2 extra bytes every time now, and the "too big to fit in memory" error. What could I do? – z0x Dec 04 '14 at 22:33
  • That memory error is with Windows XP; with later versions I get an "unsupported 16bit application" error. Tried on 32-bit and 64-bit Windows machines. – z0x Dec 04 '14 at 22:58
  • Just to eliminate some possible sources of error: 1) did you try the Windows command line FTP (which uses the same API as above) to do the same thing, to be sure that the problem doesn't come from the exe you're transferring? 2) do the "too big..." or "unsupported..." errors appear when you try to execute the transferred app? – Christophe Dec 05 '14 at 01:05
  • Yes, the file can be downloaded with other tools, and with my application I can download a text file and the text is formatted correctly. However, with a jpg, zip, exe or any other data format it won't work. Neither recv nor out.write returns an error, recv returns zero after a while, and the size in bytes of the downloaded file is correct. It's probably corrupting the file on writing, but I don't know how to make sure I write only the bytes of the file and not the headers and so on. out.write(recvbuff + header_end /*or with + 4*/, content_lenght) throws the same error as above. Thank you so much ;) – z0x Dec 05 '14 at 01:18
  • Does repeating the same (bad) transfer always result in identical files? Not just in size, but byte for byte? And what about comparing the badly received file with the original, to find out what's different? At least to see whether it's additionally inserted HTTP data or a problem with binary conversion... – Christophe Dec 05 '14 at 06:48
  • Inspecting the downloaded and original files with a "byte viewer", I see that the downloaded one has four underscores ("_") as its first 4 bytes, then all the correct bytes. This is using out.write(recvbuff + header_end, res - header_end); in your last loop. If I add a -4 to the second out.write argument the file is corrupted, with all the bytes different/missing, but the first 4 underscores are gone. This is insane. How can I skip the first 4 bytes and write the rest correctly? Thanks. Code: http://pastebin.com/UZiVXYhH – z0x Dec 05 '14 at 12:58
  • OK! Not sure what's wrong: in your pastebin you speak of \r\n\r\n, but above you speak of 4 underscores, and further above 40 extra bytes are mentioned. Looking at [https://www.ietf.org/rfc/rfc2616.txt] chapter 6, my bet is on the \r\n\r\n needed to separate header and body. However, you can see in this RFC that it could be a lot more complex. – Christophe Dec 05 '14 at 17:07
  • The 4 underscores are there when I use "recvbuff + header_end". It shouldn't do that. I'm confused, sorry. I don't know how to rewrite this line: out.write(recvbuff + header_end, res - header_end); – z0x Dec 05 '14 at 17:34