I make a GET request in C with the following code:

   char buffer[1024] =
        "GET / HTTP/1.1\r\n"
        "Host: example.com\r\n"
        "Accept-Encoding: gzip, deflate\r\n"
        "Accept-Language: en-US,en;q=0.5\r\n"
        "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0\r\n"
        "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n"
        "Connection: keep-alive\r\n"
        "Cache-Control: max-age=0\r\n\r\n";

   size_t buffer_len = sizeof(buffer) - 1;

   /* Send message to the server */
   n = write(sockfd, buffer, buffer_len);

   /* Now read server response */
   bzero(buffer, strlen(buffer));
   n = read(sockfd, buffer, buffer_len);

   /* Display result */
   printf("%s\n",buffer);
   return 0;

The server responds properly:

HTTP/1.1 200 OK
Date: Mon, 19 Sep 2016 17:20:48 GMT
Server: Apache
Content-Encoding: gzip
Vary: Accept-Encoding
Content-Length: 6695
Keep-Alive: timeout=2, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=UTF-8

�

Everything is as expected except the last line, which should be the message body: instead of the HTML content, only the symbol � appears. Does anyone know what the problem could be?

Isabel Cariod
  • How big is `buffer`? Looks like you need to read at least 7000 characters from the server. You may need to call `read` a few times. – Mark Plotnick Sep 19 '16 at 18:01
  • I also recommend taking a look in Wireshark so you can see exactly what is being sent, both using your code and a browser or netcat (a known working client), to see if there are any discrepancies. – yano Sep 19 '16 at 18:16
  • Note `Content-Encoding: gzip` in the response. – Andrew Henle Sep 19 '16 at 18:33

1 Answer

You told the server that you are willing to accept a compressed response (see Accept-Encoding: gzip, deflate), so the server actually sent you a gzip-compressed response (see Content-Encoding: gzip). Your code does not actually support decompression (you can use the zlib library for that), so you need to remove Accept-Encoding from your request header. Then you will get a response with a message body that is not compressed.
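
To illustrate the suggestion above, here is a minimal sketch of a request that simply omits `Accept-Encoding`, sent with its real length (`strlen`-style, not `sizeof(buffer) - 1`, which would also send the unused NUL padding of the array). The helper names `build_request` and `send_all` are hypothetical, and `Connection: close` and the shortened `Accept` header are assumptions made to keep the example small; they are not part of the original code.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: format a minimal GET request for `host` into `out`.
   No Accept-Encoding header is sent, so gzip is not requested.
   Returns the request length, or -1 if it does not fit in `cap` bytes. */
int build_request(char *out, size_t cap, const char *host)
{
    int len = snprintf(out, cap,
        "GET / HTTP/1.1\r\n"
        "Host: %s\r\n"
        "Accept: text/html\r\n"
        "Connection: close\r\n"   /* assumption: one request per connection */
        "\r\n", host);
    return (len < 0 || (size_t)len >= cap) ? -1 : len;
}

/* Hypothetical helper: write() may accept fewer bytes than requested,
   so keep writing until the whole request has been handed to the kernel.
   Returns 0 on success, -1 on error. */
int send_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n <= 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```

With the socket from the question this would be used roughly as `len = build_request(buffer, sizeof buffer, "example.com"); send_all(sockfd, buffer, (size_t)len);`. If you do want compressed responses later, you would keep the header and decompress the body with zlib instead.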

Remy Lebeau
pineappleman
  • Now it displays plain text, but indeed it is not the whole page – Isabel Cariod Sep 19 '16 at 18:41
  • @IsabelCariod: Your code is not attempting to process the response headers **at all**. You need to process the headers; they tell you how the response is encoded, and thus how it must be read. Pay attention to the `Content-Length` and `Transfer-Encoding` headers to know *how many* bytes to expect, and *how* to read the bytes. [Read RFC 2616 Section 4.4 Message Length](https://tools.ietf.org/html/rfc2616#section-4.4) for more details, and see this [pseudo-code](http://stackoverflow.com/a/7234357/65863) for the kind of reading logic you need to implement in your code. – Remy Lebeau Sep 19 '16 at 18:48
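
As a rough illustration of the header processing described in the comment above, the sketch below covers only the `Content-Length` case: find the end of the headers, read the length, then keep calling `read` until that many body bytes have arrived. The function names `header_end`, `content_length` and `read_fully` are hypothetical, and chunked `Transfer-Encoding` is deliberately not handled, so treat this as a starting point rather than a complete client.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <unistd.h>

/* Return the offset of the first body byte (just past "\r\n\r\n"),
   or -1 if the header block is not complete yet. */
long header_end(const char *buf, size_t len)
{
    for (size_t i = 0; i + 4 <= len; i++)
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4);
    return -1;
}

/* Return the Content-Length value from a NUL-terminated header block,
   or -1 if the header is absent (e.g. a chunked response).
   Header names are case-insensitive, hence strncasecmp (POSIX). */
long content_length(const char *headers)
{
    const char *line = headers;
    while (line && *line) {
        if (strncasecmp(line, "Content-Length:", 15) == 0)
            return strtol(line + 15, NULL, 10);
        line = strstr(line, "\r\n");   /* advance to the next header line */
        if (line)
            line += 2;
    }
    return -1;
}

/* Keep calling read() until `want` bytes have arrived, the peer closes
   the connection, or an error occurs. Returns the bytes stored in `dst`. */
size_t read_fully(int fd, char *dst, size_t want)
{
    size_t got = 0;
    while (got < want) {
        ssize_t n = read(fd, dst + got, want - got);
        if (n <= 0)
            break;
        got += (size_t)n;
    }
    return got;
}
```

In practice the first `read` usually returns the headers plus the start of the body, so the caller would subtract the body bytes already in the buffer from the `Content-Length` value before calling `read_fully` for the rest. The body buffer also has to be large enough for the announced length, which a fixed 1024-byte array is not for a 6695-byte response.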