How to receive large HTML data using SSL_read

Question

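while (byte_count != 0) {
    byte_count = SSL_read(conn, get_buffer, sizeof(get_buffer));
    if (byte_count <= 0)
        break;                                    // closed connection or error (see SSL_get_error)
    printf("%.*s", byte_count, get_buffer);       // buffer is not NUL-terminated, so bound the print
    write_to_file(get_buffer, html, byte_count);  // func to write to file
}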

I've been trying to write an HTTP/HTTPS client using sockets and SSL in C. The task is to save the HTML of a given website's landing page to a file on my system. I've handled the HTTP redirections, but I could read only a portion of the HTTP payload because I called recv/SSL_read just once. When I put the call in a while loop, it reads a few more 16 KB segments and then the connection times out. Is there another way to obtain the whole HTML file? (Sorry if this question seems vague, I'll be glad to make edits according to your responses.)

I don't think this code snippet is sufficient to reproduce your problem. But based on your description I assume that you don't actually parse the HTTP response and read only as much data as it specifies for the response body. Instead you seem to hope that the end of the body will somehow be signaled through `byte_count == 0`. That is only the case if the server actually closes the connection after the response. With HTTP keep-alive the server might keep the connection open for a while, in the hope that the client will send another request. - Steffen Ullrich, Feb 17 '21 at 15:31
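If you want to keep the read-until-zero loop, one option is to ask the server to close the connection after the response. A minimal sketch of such a request follows; `build_get_request` is a hypothetical helper name (not part of OpenSSL), and the host and path are whatever your program already uses:

```c
#include <stdio.h>

/* Hypothetical helper: formats a GET request that asks the server to close
 * the connection after the response, so that a later read returning 0
 * really does mean "end of body". Returns the request length, or -1 if the
 * request does not fit into out. */
int build_get_request(char *out, size_t out_size,
                      const char *host, const char *path)
{
    int n = snprintf(out, out_size,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Connection: close\r\n"
                     "\r\n",
                     path, host);
    if (n < 0 || (size_t)n >= out_size)
        return -1;              /* formatting error or truncated output */
    return n;
}
```

Note that an HTTP/1.1 server may still send the body with chunked transfer encoding, so the bytes written to the file can contain chunk-size lines unless they are decoded.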
@SteffenUllrich Yes, I used HTTP keep-alive in my program and am hoping to receive `byte_count == 0` to signal the end of the body. I've tried using `SSL_pending()` but it didn't help. Maybe I used it wrong. Can you help me with the correct way to retrieve the whole HTML using SSL? - Silent Guardian, Feb 17 '21 at 15:45
You have to actually understand the HTTP protocol. The information about how long the response body will be, or how its length can be obtained, is part of the HTTP response header, i.e. `Content-Length: ...` or `Transfer-Encoding: chunked`. There is a standard for HTTP which has all the necessary details, see https://tools.ietf.org/html/rfc7230#section-3.3. Of course, you might try to simplify everything by using HTTP/1.0 instead of HTTP/1.1 (which means no chunked transfer encoding) and by not using HTTP keep-alive. - Steffen Ullrich, Feb 17 '21 at 16:36
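To make the header part concrete, here is a rough sketch of how a client could decide how the body is delimited. The names `body_framing` and `detect_framing` are made up for illustration, and the function assumes you have already collected the complete header block (everything up to and including the blank line) into a NUL-terminated string:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* How the response body is delimited (illustrative names). */
enum body_framing { BODY_UNTIL_CLOSE, BODY_CONTENT_LENGTH, BODY_CHUNKED };

/* Case-insensitive check that s begins with prefix. */
static int starts_with_ci(const char *s, const char *prefix)
{
    while (*prefix) {
        if (tolower((unsigned char)*s++) != tolower((unsigned char)*prefix++))
            return 0;
    }
    return 1;
}

/* Case-insensitive search for word within a single header line. */
static int line_contains_ci(const char *line, const char *word)
{
    for (; *line != '\0' && *line != '\r'; line++) {
        if (starts_with_ci(line, word))
            return 1;
    }
    return 0;
}

/* headers: NUL-terminated header block ending with the blank line.
 * When the result is BODY_CONTENT_LENGTH, *length holds the body size. */
enum body_framing detect_framing(const char *headers, long *length)
{
    enum body_framing result = BODY_UNTIL_CLOSE;
    const char *line = strstr(headers, "\r\n");     /* end of status line */

    while (line != NULL) {
        line += 2;                                  /* start of next line */
        if (line[0] == '\r' && line[1] == '\n')
            break;                                  /* blank line: headers end */
        if (starts_with_ci(line, "Transfer-Encoding:")) {
            if (line_contains_ci(line, "chunked"))
                return BODY_CHUNKED;                /* takes precedence over Content-Length */
        } else if (starts_with_ci(line, "Content-Length:")) {
            *length = strtol(line + strlen("Content-Length:"), NULL, 10);
            result = BODY_CONTENT_LENGTH;
        }
        line = strstr(line, "\r\n");                /* end of current line */
    }
    return result;
}
```

This deliberately ignores details such as folded headers and invalid length values. With `BODY_CONTENT_LENGTH` you read exactly `*length` body bytes, with `BODY_CHUNKED` you decode the chunks, and with `BODY_UNTIL_CLOSE` you read until the server closes the connection.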
There are numerous posts on Stack Overflow showing the proper way to parse HTTP responses, regardless of whether you are using SSL/TLS or not. For example: https://stackoverflow.com/a/16247097/65863 - Remy Lebeau, Feb 17 '21 at 19:21
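Since the parsing is the same with or without TLS, the body loop can be written against a small read callback instead of calling `SSL_read` or `recv` directly. The sketch below is only an illustration: `read_fn`, `read_exact`, `mem_src` and `mem_read` are invented names, and the in-memory source exists purely to exercise the loop without a network connection:

```c
#include <string.h>

/* Transport callback shape (an assumption for this sketch): returns the
 * number of bytes read, 0 if the peer closed, or a negative value on error. */
typedef int (*read_fn)(void *ctx, char *buf, int len);

/* Reads until `want` body bytes are stored in out, or the transport stops
 * early. Returns the number of bytes actually stored. */
long read_exact(read_fn rd, void *ctx, char *out, long want)
{
    long got = 0;
    while (got < want) {
        long left = want - got;
        int ask = left > 16384 ? 16384 : (int)left;  /* never ask past the body */
        int n = rd(ctx, out + got, ask);
        if (n <= 0)
            break;                                   /* closed or failed */
        got += n;
    }
    return got;
}

/* In-memory stand-in for a socket. It hands out at most 5 bytes per call
 * to mimic the short reads a real connection produces. */
struct mem_src { const char *data; long len; long pos; };

int mem_read(void *ctx, char *buf, int len)
{
    struct mem_src *s = ctx;
    long avail = s->len - s->pos;
    int n = len < 5 ? len : 5;
    if (avail < n)
        n = (int)avail;
    memcpy(buf, s->data + s->pos, (size_t)n);
    s->pos += n;
    return n;   /* 0 once the data is exhausted, like a closed connection */
}
```

With OpenSSL, a one-line wrapper that casts `ctx` to `SSL *` and returns `SSL_read(ssl, buf, len)` would fit the `read_fn` shape. Because the loop stops after `want` bytes, it never blocks waiting for data that a keep-alive server is not going to send.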
