    memset(buf, 0, sizeof(buf));
    int htmlstart = 0;
    char * htmlcontent;
    char *mainpage = (char *) malloc(MAXBUF);
    while((tmpres = recv(sock, buf, MAXBUF, 0)) > 0)
    {
        if(htmlstart == 0) //on first run, ignore headers
        {
            htmlcontent = strstr(buf, "\r\n\r\n");
            if(htmlcontent != NULL)
            {
                htmlstart = 1;
                htmlcontent += 4;
            }
        }
        else
        {
            htmlcontent = buf;
        }
        if(htmlstart)
        {
            mainpage = (char *) realloc( mainpage, (strlen(mainpage) + strlen(htmlcontent) + 1) );
            // printf("%s",htmlcontent);
            strcat(mainpage,htmlcontent);
        }
        memset(buf, 0, tmpres);
    }
    if(tmpres < 0)
    {
        perror("Error receiving data");
    }

    printf("%d",(int)strlen(mainpage));

I wrote a simple program to receive data over HTTP after establishing a connection with the server. But I'm having a strange problem when I try to receive a large object like an image. Each time I run the program, the last print statement, which prints the total size of the HTTP data (without headers), gives a different value. So what I'm receiving is a corrupt image or only part of the image.

Any thoughts on why this might be happening?

EDIT: If I check the cumulative size of htmlcontent before concatenating it with mainpage, even then the size is the same as mainpage after the whole receipt. So the problem can't be in strcat or any other string function.

user1265125
  • `buf` is not a string (it has embedded zero bytes); you cannot apply `str*` functions to it. – pmg Mar 21 '14 at 22:33
  • I've declared it as a string actually, since I made this primarily for parsing HTML data, and I just can't figure out why it refuses to work for images. I understand storing image binary data in a char * is weird, but I don't see a reason why it shouldn't receive the same SIZE of data on the exact same query every time! – user1265125 Mar 21 '14 at 22:41
  • @user1265125 Then you don't understand the relationship between TCP and HTTP. You're somehow expecting the TCP `recv` function to perform HTTP protocol logic for you. It can't do that. You have to do that. – David Schwartz Mar 22 '14 at 02:15
  • Others are right that you do not properly implement the HTTP protocol, and that you ignore the fact that the result is binary when applying strxxx() functions to it. Nevertheless, your question, as I understood it, was why you get different results/lengths on each call. This is because you strcat to an uninitialized buffer (mainpage). Since it will probably have NUL bytes at random positions, strlen() also gets random results. – mfro Mar 22 '14 at 07:39

2 Answers

You forgot to implement the HTTP protocol! The HTTP protocol specifies how you know when you have the entire object, for example, using things like a Content-Length header. You have to implement the protocol. The recv function just knows it's reading from a stream of bytes.
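As a rough illustration of the Content-Length case, here is a minimal sketch of pulling that value out of a response's header block. The names `parse_content_length` and `has_prefix_nocase` are hypothetical helpers, not part of any library, and the sketch assumes the headers have already been fully received and copied into a NUL-terminated string. It does not handle chunked transfer encoding or responses that are delimited by the connection closing.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: case-insensitive check that s begins with prefix. */
static int has_prefix_nocase(const char *s, const char *prefix)
{
    while (*prefix != '\0') {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Hypothetical helper: return the Content-Length value found in a
   NUL-terminated header block, or -1 if it is missing or malformed. */
static long parse_content_length(const char *headers)
{
    static const char name[] = "Content-Length:";
    const char *line = headers;

    while (line != NULL && *line != '\0') {
        if (has_prefix_nocase(line, name)) {
            const char *p = line + (sizeof name - 1);
            while (*p == ' ' || *p == '\t')
                p++;                      /* skip optional whitespace */
            if (!isdigit((unsigned char)*p))
                return -1;
            return strtol(p, NULL, 10);
        }
        /* Advance to the start of the next header line. */
        const char *eol = strstr(line, "\r\n");
        line = (eol != NULL) ? eol + 2 : NULL;
    }
    return -1;
}
```

With that number in hand, the receive loop can keep calling `recv` until exactly that many body bytes have arrived, instead of stopping whenever one call happens to return.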

Also, your use of strlen and strcat is incorrect. The recv function tells you how many bytes it received. The strlen function is only for strings, not arbitrary chunks of data you haven't parsed or processed yet.
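To make the point about byte counts concrete, here is a minimal sketch of accumulating received data by length rather than with `strlen`/`strcat`. The `byte_buf` type and the functions `byte_buf_append` and `find_body_offset` are hypothetical names introduced for illustration. The sketch assumes the caller passes the count returned by `recv` and, for simplicity, that the header terminator lies entirely inside the bytes being searched.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer. The length is tracked explicitly,
   so embedded zero bytes (common in image data) are preserved. */
typedef struct {
    char  *data;
    size_t len;
} byte_buf;

/* Append n raw bytes. Returns 0 on success, -1 if realloc fails
   (in which case the existing contents are left untouched). */
static int byte_buf_append(byte_buf *b, const char *src, size_t n)
{
    if (n == 0)
        return 0;
    char *p = realloc(b->data, b->len + n);
    if (p == NULL)
        return -1;
    memcpy(p + b->len, src, n);   /* memcpy copies zero bytes too */
    b->data = p;
    b->len += n;
    return 0;
}

/* Find the "\r\n\r\n" header terminator within the first n bytes of buf
   without relying on NUL termination. Returns the offset of the first
   body byte, or -1 if the terminator is not present. */
static long find_body_offset(const char *buf, size_t n)
{
    for (size_t i = 0; i + 4 <= n; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4);
    }
    return -1;
}
```

In the question's loop, this would mean starting from `byte_buf body = {NULL, 0};`, appending `tmpres - offset` bytes from the first chunk that contains the terminator and `tmpres` bytes from each later chunk, and printing `body.len` at the end instead of `strlen(mainpage)`. If the headers can span several `recv` calls, they would need to be buffered the same way before searching.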

Remy Lebeau
David Schwartz
  • No, I just didn't include the HTTP part. I created and sent the header, of course. That is, HTML pages, which are smaller in size, are no problem. The problem comes when I'm receiving something large like an image. – user1265125 Mar 21 '14 at 22:39
  • You MUST MUST MUST implement the actual HTTP protocol. Your code is NOT anywhere close to doing that! Read [RFC 2616](http://tools.ietf.org/html/rfc2616), which defines the HTTP protocol. In particular, [Section 4.4](http://tools.ietf.org/html/rfc2616#section-4.4) explains how to know **when to read**, **how to read**, and **how much to read** when receiving a response's body data. For example, have a look at the following pseudocode I posted a while back: http://stackoverflow.com/questions/7232931/receiving-chunked-http-data-with-winsock/7234357#7234357 – Remy Lebeau Mar 21 '14 at 22:39
  • @user1265125 Yes, it works by luck some of the time and fails the rest of the time because you haven't actually implemented the HTTP protocol and also because you are treating arbitrary binary data as if it were strings. Your code stops before the HTTP protocol says it should because ... it doesn't actually implement the protocol. – David Schwartz Mar 21 '14 at 22:57

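If you strcat() something to an uninitialized buffer (mainpage), strlen() will return arbitrary results. malloc() does not zero the memory it returns, so mainpage does not start out as an empty string.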
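A minimal sketch of the fix for that specific symptom follows. The helper name `new_empty_string` is hypothetical. It only makes the buffer safe as a starting point for `strcat` on text, and it does not address the separate problem of binary data containing zero bytes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate a buffer that is a valid empty string,
   so a later strlen() or strcat() on it is well defined.
   Returns NULL if capacity is 0 or the allocation fails. */
static char *new_empty_string(size_t capacity)
{
    if (capacity == 0)
        return NULL;
    char *s = malloc(capacity);
    if (s != NULL)
        s[0] = '\0';   /* without this, the contents are indeterminate */
    return s;
}
```

In the question's code, this would correspond to setting `mainpage[0] = '\0';` right after the `malloc(MAXBUF)` call and checking that the allocation succeeded.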

Miller
mfro