
I'm having problems receiving the HTTP response message from a website.
This is my function:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <openssl/ssl.h>

void Reveive_response(char *resp, SSL *ssl) {

    const int BUFFER_SIZE = 1024;
    char response[1048576];
    char *buffer = NULL;            // to read from ssl
    int bytes;                      // number of bytes actually read
    int received = 0;               // number of bytes received

    buffer = (char *) malloc(BUFFER_SIZE*sizeof(char));     // malloc
    memset(response, '\0', sizeof(response));               // response
    do{
        memset(buffer, '\0', BUFFER_SIZE);          // empty buffer
        bytes = SSL_read(ssl, buffer, BUFFER_SIZE);
        if (bytes < 0) {
            printf("Error: Receive response\n");
            exit(EXIT_FAILURE);
        }
        if (bytes == 0) break;
        received += bytes;
        printf("Received...%d bytes\n", received);
        strncat(response, buffer, bytes);   // concat buffer to response
    } while (SSL_pending(ssl));             // while pending
    response[received] = '\0';
    printf("Receive DONE\n");
    printf("Response: \n%s\n", response);
    free(buffer);
    strcpy(resp, response);                 // return via resp

}

When I call the function, the response message appears to be incomplete, like this:

Received...1014 bytes
Received...1071 bytes
Receive DONE
Response: 
HTTP/1.1 200 OK
<... something else....>
Vary: Accept-Encoding
Content-Type: text/html
Conne

Then, if I call the function again, it returns:

Received...39 bytes
Receive DONE
Response:
ction: keep-alive
Content-Length: 0

The Connection field was split between the two calls. Why didn't my function receive the whole response message? I used a do-while loop inside it. Where did I go wrong? Thank you.

thanhdx

1 Answer


There is nothing wrong. This is simply how TCP works. It is a streaming transport; it has no concept of message boundaries. There is no 1-to-1 relationship between the number of bytes sent and the number of bytes read. Each read returns an arbitrary number of bytes, which you are then responsible for processing as needed. Keep reading, buffering, and parsing the HTTP data as you go until you discover the end of the response (see RFC 2616 Section 4.4, "Message Length", for details). Looping on SSL_pending() is not sufficient (or correct): it only reports bytes that OpenSSL has already received and processed but you have not yet read, not bytes that are still on their way from the server, so the loop can stop while more of the response is still in transit.
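To make the "keep reading until the end" advice concrete, here is a minimal sketch in C that keeps reading until the empty line ending the headers has arrived, however the bytes happen to be split across reads. It assumes a blocking source behind a hypothetical `read_fn` callback (with OpenSSL, a thin wrapper around `SSL_read()`); `read_headers`, `read_fn`, and the `mem_src`/`mem_read` test double are illustrative names, not part of OpenSSL.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical read callback: returns bytes read (> 0), 0 on EOF, < 0 on error.
 * With OpenSSL (blocking socket assumed) it could be a thin wrapper such as:
 *     int ssl_source(void *ctx, char *buf, int len) { return SSL_read((SSL *)ctx, buf, len); }
 */
typedef int (*read_fn)(void *ctx, char *buf, int len);

/* Keep calling rd() until the header terminator "\r\n\r\n" is in buf.
 * Returns the header length (terminator included) and stores the total number
 * of bytes buffered in *total; bytes past the headers are the start of the body
 * and must not be discarded. Returns -1 on EOF, read error, or a full buffer.
 * cap is assumed to fit in an int. */
long read_headers(read_fn rd, void *ctx, char *buf, size_t cap, size_t *total)
{
    assert(rd != NULL && buf != NULL && total != NULL);
    size_t used = 0, scan = 0;
    for (;;) {
        for (size_t i = scan; i + 4 <= used; i++) {
            if (memcmp(buf + i, "\r\n\r\n", 4) == 0) {
                *total = used;
                return (long)(i + 4);
            }
        }
        /* The terminator may straddle two reads, so rescan the last 3 old bytes. */
        scan = used >= 3 ? used - 3 : 0;
        if (used == cap) return -1;                 /* headers larger than buf */
        int n = rd(ctx, buf + used, (int)(cap - used));
        if (n <= 0) return -1;                      /* EOF or error before the headers ended */
        used += (size_t)n;
    }
}

/* Test double: serves `data` at most `chunk` bytes per call, imitating how a
 * TCP/TLS read returns an arbitrary slice of the stream. */
struct mem_src { const char *data; size_t len, pos, chunk; };

int mem_read(void *ctx, char *buf, int len)
{
    struct mem_src *s = ctx;
    size_t n = s->len - s->pos;
    if (n > s->chunk) n = s->chunk;
    if (n > (size_t)len) n = (size_t)len;
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return (int)n;
}
```

Note that `*total` can exceed the returned header length: whatever follows the headers in `buf` is already part of the body. Stopping early, as the `SSL_pending()` loop does, leaves the rest of the message to be picked up by the next call, which is how the question ended up with `Conne` in one response and `ction: keep-alive` in the next.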

In this case, you have to read CRLF-delimited lines one at a time until you reach a CRLF/CRLF pair (an empty line) indicating the end of the response headers. Then you need to analyze the headers you have received to know whether a response body is present and how to read it, as it may be in one of several different encoded formats. If a body is present, you can then read it (decoding it as you go along) until you reach its end as specified by the headers.
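Once the headers are in hand, the analysis step might look like the following sketch. `classify_body` and `enum body_kind` are hypothetical names. The rules follow the usual HTTP/1.1 framing (1xx, 204 and 304 responses have no body; `Transfer-Encoding: chunked` takes precedence over `Content-Length`; otherwise the body runs until the server closes the connection), simplified: responses to HEAD requests and non-final `chunked` codings are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

enum body_kind { BODY_NONE, BODY_LENGTH, BODY_CHUNKED, BODY_UNTIL_CLOSE, BODY_ERROR };

/* Case-insensitive search for `word` inside s[0 .. n). */
static int contains_ci(const char *s, size_t n, const char *word)
{
    size_t k = strlen(word);
    for (size_t i = 0; i + k <= n; i++) {
        size_t j = 0;
        while (j < k && tolower((unsigned char)s[i + j]) == tolower((unsigned char)word[j]))
            j++;
        if (j == k) return 1;
    }
    return 0;
}

/* True if the header line (n bytes, no CRLF) is "<name>: ...", name case-insensitive. */
static int header_is(const char *line, size_t n, const char *name)
{
    size_t k = strlen(name);
    return n > k && line[k] == ':' && contains_ci(line, k, name);
}

/* Decide how the response body is framed, given the status line and headers
 * in hdr[0 .. len). *length is set only when BODY_LENGTH is returned. */
enum body_kind classify_body(const char *hdr, size_t len, long *length)
{
    assert(hdr != NULL && length != NULL);
    const char *end = hdr + len;
    const char *sp = memchr(hdr, ' ', len);          /* "HTTP/1.1 200 OK" */
    const char *p = memchr(hdr, '\n', len);          /* end of the status line */
    if (sp == NULL || p == NULL || sp > p) return BODY_ERROR;
    int status = atoi(sp + 1);
    if ((status >= 100 && status < 200) || status == 204 || status == 304)
        return BODY_NONE;                            /* these never carry a body */

    int chunked = 0, have_len = 0;
    while (++p < end) {
        const char *eol = memchr(p, '\n', (size_t)(end - p));
        if (eol == NULL) break;
        size_t n = (size_t)(eol - p);
        if (n > 0 && p[n - 1] == '\r') n--;          /* drop the CR of CRLF */
        if (n == 0) break;                           /* empty line: headers done */
        if (header_is(p, n, "Transfer-Encoding") && contains_ci(p, n, "chunked")) {
            chunked = 1;
        } else if (header_is(p, n, "Content-Length")) {
            size_t i = strlen("Content-Length:");
            long v = 0;
            while (i < n && (p[i] == ' ' || p[i] == '\t')) i++;
            if (i == n || !isdigit((unsigned char)p[i])) return BODY_ERROR;
            while (i < n && isdigit((unsigned char)p[i])) v = v * 10 + (p[i++] - '0');
            *length = v;
            have_len = 1;
        }
        p = eol;
    }
    if (chunked) return BODY_CHUNKED;                /* chunked overrides Content-Length */
    if (have_len) return BODY_LENGTH;
    return BODY_UNTIL_CLOSE;                         /* body ends when the server closes */
}
```

For headers like those shown in the question, with `Content-Length: 0` and no `Transfer-Encoding`, this returns `BODY_LENGTH` with a length of 0: the message ends right after the empty line.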

See the pseudo-code I posted in my answer to the following question:

Receiving Chunked HTTP Data With Winsock
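The linked answer describes chunked reading in pseudo-code. As a separate, minimal illustration (not that pseudo-code), the sketch below decodes a chunked body that has already been fully buffered: each chunk is a hexadecimal size line, optionally followed by a `;extension`, then CRLF, the data, and another CRLF, and a zero-size chunk ends the body. `dechunk` is a hypothetical name, and trailer headers after the last chunk are not parsed.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Decode a complete "Transfer-Encoding: chunked" body in[0 .. len) into
 * out (capacity cap). Returns the decoded length, or -1 if the input is
 * malformed, truncated, or does not fit. Trailers are ignored. */
long dechunk(const char *in, size_t len, char *out, size_t cap)
{
    assert(in != NULL && out != NULL);
    size_t pos = 0, olen = 0;
    for (;;) {
        /* chunk-size: one or more hex digits */
        size_t size = 0;
        int digits = 0;
        while (pos < len && isxdigit((unsigned char)in[pos])) {
            int c = in[pos++];
            if (size > SIZE_MAX / 16) return -1;    /* size would overflow */
            size = size * 16 + (size_t)(isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
            digits++;
        }
        if (digits == 0) return -1;
        /* skip any ";extension" up to the CRLF that ends the size line */
        while (pos < len && in[pos] != '\r') pos++;
        if (pos + 2 > len || in[pos + 1] != '\n') return -1;
        pos += 2;
        if (size == 0) return (long)olen;           /* last chunk reached */
        if (size > len - pos || size > cap - olen) return -1;
        memcpy(out + olen, in + pos, size);
        olen += size;
        pos += size;
        /* every chunk's data is followed by CRLF */
        if (pos + 2 > len || in[pos] != '\r' || in[pos + 1] != '\n') return -1;
        pos += 2;
    }
}
```

A real client would decode incrementally as bytes arrive rather than buffering the whole body first, but the parsing steps are the same.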

That said, you really should not be implementing HTTP (let alone HTTPS) manually to begin with. HTTP is not trivial to implement from scratch, and neither is SSL/TLS for that matter. You have dived head-first into a deep well without understanding some important basics of network programming and OpenSSL programming. You should use an existing HTTP/S library instead, such as libcurl, and let it handle the details for you so you can focus on your code's business logic rather than its communications logic.

Remy Lebeau
  • Thanks. I'll try it and get back to you soon. – thanhdx Jul 26 '16 at 23:09
  • Hi, after reading more information, I still have some questions: – thanhdx Jul 27 '16 at 08:11
  • - How do I read a line of text up to the CRLF? SSL_read() reads bytes into a buffer. Should I read bytes into a buffer and then parse it to handle each line? Or read bytes one by one until CRLF (which may be slow)? Or something else? – thanhdx Jul 27 '16 at 08:39
  • - Can chunked data cause the response body that my function reads in the normal way to be wrong? For example, can it contain non-alphanumeric ASCII characters (even though I got `Content-Type: text/html` in the header)? I actually got this, and I'm confused. – thanhdx Jul 27 '16 at 08:40
  • - In the example in my question above, the response has a `Content-Length: 0` field; could it still be chunked? Can that happen? I haven't checked this yet; it will be in the next edit :). – thanhdx Jul 27 '16 at 08:40
  • - By the way, I have to use OpenSSL and have been asked not to use any other library. Thanks for the advice anyway :) – thanhdx Jul 27 '16 at 08:40
  • @ĐặngXuânThành use whatever logic makes sense for your app. Use a buffer that you parse, or read byte by byte; it is up to you. The socket does not care. The HTTP body transfers bytes, not text. The bytes may represent text, depending on the `Content-Type`. Read the RFC I linked to; it explains how HTTP works. – Remy Lebeau Jul 27 '16 at 15:30
  • So, as I said, I received a response message that has `Content-Type: text/html`, but the body contains non-alphanumeric, unreadable characters. I think it may be caused by the `Transfer-Encoding: chunked` field that appears in the message header. Could that be it? – thanhdx Jul 27 '16 at 18:10
  • That is one of several possibilities. I can't say for sure since you haven't shown the actual response. But if you read the pseudo-code I linked to, I take `chunked` into account during reading. – Remy Lebeau Jul 27 '16 at 18:25
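On the first question in the comments: one way to "use a buffer you parse" is to append whatever `SSL_read()` returns to a buffer and pull complete CRLF-terminated lines out of it, asking for more data whenever only a partial line is left. `next_line` is a hypothetical helper sketching that approach; it does not itself call OpenSSL.

```c
#include <assert.h>
#include <string.h>

/* Find the next CRLF-terminated line in buf[*pos .. len). On success, sets
 * *line to its start and *line_len to its length (CRLF excluded), advances
 * *pos past the CRLF, and returns 1. Returns 0 if no complete line is
 * buffered yet: the caller should append more bytes (e.g. from SSL_read())
 * and call again. */
int next_line(const char *buf, size_t len, size_t *pos,
              const char **line, size_t *line_len)
{
    assert(buf != NULL && pos != NULL && *pos <= len);
    for (size_t i = *pos; i + 1 < len; i++) {
        if (buf[i] == '\r' && buf[i + 1] == '\n') {
            *line = buf + *pos;
            *line_len = i - *pos;
            *pos = i + 2;
            return 1;
        }
    }
    return 0;
}
```

An empty line (length 0) marks the end of the headers. Bytes after `*pos` that have not yet formed a complete line should be kept, typically by moving them to the start of the buffer before the next read.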