Image file is corrupted when downloaded using sockets

Question

I am trying to download an image file (in .bmp format) from a website using sockets and the OpenSSL library. I open an SSL connection, send the HTTP GET request over it, and write whatever the server sends into a file called response.txt. The file usually starts with some headers describing the response (for example, whether the request was successful), and then the image data should begin. Since the image is in .bmp format, it should begin with a 'BM' tag and that's how we know where the headers start; everything from there to the end of the file should be image data. So I expect that if I delete everything before the 'BM' tag and copy the rest of the file into a .bmp file, I should be able to open the image. However, the image data is apparently corrupted, and I can't figure out why. I have examined the file in binary mode and compared it to the actual image from the website, but I still can't figure out what makes the two files differ after a certain point in their data. Here is the code:
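Because this approach relies on the 'BM' signature, one cheap sanity check is to compare the length of the extracted file with the total file size that the BMP file header itself declares. The following is a minimal sketch, assuming the standard 14-byte BMP file header layout ('BM', then a little-endian 32-bit file size at offset 2 and a 32-bit pixel-data offset at offset 10). `read_le32` and `bmp_parse_file_header` are hypothetical helper names, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a 32-bit little-endian unsigned value starting at p. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/*
 * Checks the 14-byte BMP file header at the start of buf.
 * On success returns 0 and stores the size the header declares for the
 * whole file (offset 2) and the offset of the pixel data (offset 10).
 * Returns -1 if buf is shorter than 14 bytes or does not start with "BM".
 */
static int bmp_parse_file_header(const unsigned char *buf, size_t len,
                                 uint32_t *file_size, uint32_t *pixel_offset)
{
    assert(buf != NULL && file_size != NULL && pixel_offset != NULL);
    if (len < 14 || buf[0] != 'B' || buf[1] != 'M')
        return -1;
    *file_size = read_le32(buf + 2);
    *pixel_offset = read_le32(buf + 10);
    return 0;
}
```

If the extracted file turns out to be longer than the declared size, that would suggest extra bytes were mixed into the data during the transfer rather than the image itself being damaged.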

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <openssl/ssl.h>
#include <openssl/bio.h>
#include <openssl/err.h>

#define h_addr h_addr_list[0]
//#define SERVER "static.vecteezy.com"
#define SERVER "filesamples.com"
#define PORT 443
//#define PATH "/system/resources/previews/002/410/747/original/cute-siamese-cat-on-yellow-background-free-photo.jpg"
#define PATH "/samples/image/bmp/sample_640%C3%97426.bmp"


int main() {
    int sockfd;
    struct sockaddr_in serv_addr;
    struct hostent *server;

    SSL_library_init();
    SSL_CTX *ctx = SSL_CTX_new(TLS_client_method());

    if (ctx == NULL) {
        printf("Error creating SSL context\n");
        return 1;
    }

    SSL *ssl = SSL_new(ctx);

    sockfd = socket(AF_INET, SOCK_STREAM, 0);

    if (sockfd < 0) {
        printf("Error opening socket\n");
        return 1;
    }

    server = gethostbyname(SERVER);

    if (server == NULL) {
        printf("Error resolving server hostname\n");
        return 1;
    }

    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(PORT);
    memcpy(&serv_addr.sin_addr.s_addr, server->h_addr, server->h_length);

    if (connect(sockfd, (struct sockaddr*)&serv_addr, sizeof(serv_addr)) < 0) {
        printf("Error connecting to server\n");
        return 1;
    }

    BIO *sbio = BIO_new_socket(sockfd, BIO_NOCLOSE);
    SSL_set_bio(ssl, sbio, sbio);
    if (SSL_set_tlsext_host_name(ssl, SERVER) != 1) {
        printf("Error setting SNI\n");
        return 1;
    }
    if (SSL_connect(ssl) <= 0) {
        printf("Error establishing SSL connection\n");
        return 1;
    }
    char request[1024];
    sprintf(request, "GET %s HTTP/1.1\r\nHost: %s\r\n\r\n", PATH, SERVER);
    SSL_write(ssl, request, strlen(request));

    char response[1024];
    int bytes_read;
    int total_bytes_read = 0;
    FILE *fp = fopen("response.txt", "wb");

    if (fp == NULL) {
        printf("Error opening file\n");
        return 1;
    }

    do {
        bytes_read = SSL_read(ssl, response, sizeof(response));

        if (bytes_read > 0) {
            printf("bytes read: %i\n",bytes_read);
            fwrite(response, 1, bytes_read, fp);
            total_bytes_read += bytes_read;
        }
    } while (bytes_read > 0);

    fclose(fp);
    printf("Total bytes read: %d\n", total_bytes_read);

    SSL_shutdown(ssl);
    SSL_free(ssl);
    SSL_CTX_free(ctx);
    close(sockfd);

    return 0;
}

Note: the image file is at https://filesamples.com/samples/image/bmp/sample_640%C3%97426.bmp

"and that's how we know where the headers start" Does that mean you store the entire response in the file and do not parse any of the HTTP response headers? — Gerhardh, Jun 01 '23 at 13:07
If you want to receive responses from an HTTP server, you need to be able to handle HTTP properly, including all the received headers. Unless this is for a school (or similar) assignment, please use a library such as libcurl to handle simple HTTP file fetching. It will handle all the gory stuff for you. — Some programmer dude, Jun 01 '23 at 13:07
Yes, the entire response from the server is stored in a single response.txt file. — Ilya, Jun 01 '23 at 13:21
If you don't want to bother with all the headers, there's at least *one* thing you can do: Read the network response until you get an empty line `"\r\n"`. That marks the end of the headers and the start of the data. — Some programmer dude, Jun 01 '23 at 13:29
@Ilya _"So I expect that upon deleting everything before the 'BM' tag and copying the rest of the file into a .bmp file"_: how exactly did you "delete everything before the 'BM' tag"? The corruption might take place during that operation. — Jabberwocky, Jun 01 '23 at 13:39
@Ilya this might be useful: https://stackoverflow.com/a/5757349/898348 — Jabberwocky, Jun 01 '23 at 13:41
@Someprogrammerdude *Read the network response until you get an empty line "\r\n"* Which means you need to find the byte sequence `\r\n\r\n` as a blank line is one between **two** consecutive EOL sequences. If you search for the first `\r\n`, you'll just consider the first line of the HTTP header as the header and the rest as the content. — Andrew Henle, Jun 01 '23 at 13:57
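A minimal sketch of locating that blank line, i.e. the four-byte sequence `\r\n\r\n`, in a buffer holding the start of the response. `http_body_offset` is a hypothetical name, and the sketch assumes the whole header block is already in the buffer; with the question's 1024-byte reads the headers could span two `SSL_read` calls, so in practice the reads would be accumulated first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns the offset of the first body byte in buf, i.e. the position just
 * past the blank line ("\r\n\r\n") that ends the HTTP header block, or -1
 * if that blank line is not in the first len bytes. memmem() would do the
 * same search but is a GNU/BSD extension, so a portable loop is used.
 */
static long http_body_offset(const char *buf, size_t len)
{
    static const char sep[] = "\r\n\r\n";
    const size_t sep_len = sizeof sep - 1;   /* 4, without the terminator */

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i + sep_len <= len; i++) {
        if (memcmp(buf + i, sep, sep_len) == 0)
            return (long)(i + sep_len);
    }
    return -1;
}
```

Everything before the returned offset is the status line plus headers; everything from it onward is the body, which may still carry a transfer coding that has to be removed before the data is a valid BMP.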
The server is sending the response using "chunked" transfer encoding as indicated by the `Transfer-Encoding: chunked` header in the response. Your code would need to deal with that encoding. — Ian Abbott, Jun 01 '23 at 15:59
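In chunked encoding the body is split into chunks, each sent as a hexadecimal size line ending in CRLF, then that many data bytes, then another CRLF; a chunk of size 0 ends the body. If the raw body is saved as-is, those size lines end up inside the BMP data. Below is a minimal in-memory decoder sketch, assuming the whole body (everything after the blank line that ends the headers) has already been read into one buffer. `dechunk` is a hypothetical name, not an OpenSSL or libc function.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Decodes a complete "Transfer-Encoding: chunked" body held in memory.
 * Each chunk is: hex size [;extension] CRLF, <size> data bytes, CRLF.
 * A chunk of size 0 ends the body; any trailer fields after it are ignored.
 * out must have room for in_len bytes (the decoded data is never longer).
 * Returns the number of decoded bytes, or -1 if the input is malformed or
 * truncated.
 */
static long dechunk(const unsigned char *in, size_t in_len, unsigned char *out)
{
    size_t pos = 0, out_len = 0;

    assert(in != NULL && out != NULL);
    for (;;) {
        size_t size = 0;
        int digits = 0;

        /* Parse the hexadecimal chunk size, guarding against overflow. */
        while (pos < in_len && isxdigit(in[pos])) {
            int c = in[pos++];
            if (size > (SIZE_MAX >> 4))
                return -1;
            size = size * 16 + (size_t)(isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
            digits++;
        }
        if (digits == 0)
            return -1;

        /* Skip an optional chunk extension, then require CRLF. */
        while (pos < in_len && in[pos] != '\r')
            pos++;
        if (pos + 1 >= in_len || in[pos + 1] != '\n')
            return -1;
        pos += 2;

        if (size == 0)
            return (long)out_len;          /* last chunk reached */

        /* The data must be present and followed by its own CRLF. */
        if (size > in_len - pos || in_len - pos - size < 2)
            return -1;
        memcpy(out + out_len, in + pos, size);
        out_len += size;
        pos += size;
        if (in[pos] != '\r' || in[pos + 1] != '\n')
            return -1;
        pos += 2;
    }
}
```

Applied to the body of the response, the decoded output is what should then start with 'BM'. A streaming decoder that works across `SSL_read` calls would need to keep the parser state between reads; this sketch trades that for simplicity.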
One way to avoid having to deal with chunked encoding is to use an HTTP/1.0 request. — Ian Abbott, Jun 01 '23 at 16:04
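HTTP/1.1 forbids a server from sending a `Transfer-Encoding` header in a response to an HTTP/1.0 request, so the body arrives as plain bytes. A sketch of building such a request in the same C as the question; `build_http10_get` is a hypothetical helper that would take the place of the `sprintf` call that fills `request`. Keeping the `Host` header is an assumption that matters for servers hosting several sites on one address.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Formats an HTTP/1.0 GET request into buf. With an HTTP/1.0 request the
 * server does not use chunked transfer coding, and "Connection: close"
 * asks it to close the connection once the response has been sent.
 * Returns the request length, or -1 if it does not fit in buf.
 */
static int build_http10_get(char *buf, size_t size, const char *host, const char *path)
{
    assert(buf != NULL && host != NULL && path != NULL);
    int n = snprintf(buf, size,
                     "GET %s HTTP/1.0\r\n"
                     "Host: %s\r\n"
                     "Connection: close\r\n"
                     "\r\n",
                     path, host);
    if (n < 0 || (size_t)n >= size)
        return -1;                         /* error or truncated */
    return n;
}
```

Unlike `sprintf`, `snprintf` cannot overrun `request` if `PATH` grows longer than expected.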
@Jabberwocky I wrote a program that opens the response.txt file in "rb" mode and, once it reaches the 'BM' tag, starts writing the image data into a .bmp file. It successfully separates the HTTP header from the image's header. However, when I open the image, it has distortions and miscolored pixels. So I guess it is safe to say the corruption takes place when I'm downloading the file. Still, I appreciate you pointing out that removing the header the wrong way could cause the corruption. — Ilya, Jun 02 '23 at 09:35
@IanAbbott What's wrong with chunked encoding? How could it prevent the image data from downloading properly? And how would using HTTP/1.0 fix the problem? — Ilya, Jun 02 '23 at 09:37
@Ilya There is nothing wrong with chunked encoding, but the "corruption" you see in response.txt is part of the chunked encoding, not part of the BMP file. — Ian Abbott, Jun 02 '23 at 09:51
@Ilya Try this: instead of downloading a .bmp, download a binary file that contains easily identifiable patterns, e.g. the bytes 0 to 255 in a row. Then manually check the downloaded file and try to identify the differences between the original file and the downloaded file. This might give some hints, such as CR/LF problems. — Jabberwocky, Jun 02 '23 at 10:01
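A minimal sketch of that experiment: one function fills a buffer with the repeating pattern 0x00 to 0xFF, and another reports the first offset where a downloaded copy stops matching it. Both function names are hypothetical, and serving the file from a host you control is an assumption; any server that delivers the file unchanged would do.

```c
#include <assert.h>
#include <stddef.h>

/* Fills buf with the repeating pattern 0x00, 0x01, ..., 0xFF, 0x00, ... */
static void fill_test_pattern(unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)(i & 0xFF);
}

/*
 * Returns the offset of the first byte in data that breaks the pattern,
 * or len if the whole buffer matches. Inserted bytes (an extra "\r\n" or
 * a chunk-size line, for example) would show up at that offset.
 */
static size_t first_pattern_mismatch(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (data[i] != (unsigned char)(i & 0xFF))
            return i;
    }
    return len;
}
```

One way to use it: write the pattern buffer to a file with `fwrite`, download that file with the program from the question, strip the HTTP headers, and call `first_pattern_mismatch` on the remaining bytes. The bytes at the reported offset show what was inserted or changed.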
