Receiving only 'content' part of an HTTP response and writing it into a file

Question

I have code that connects to a server and sends an HTTP GET request for a specific file on that server. Connecting and sending the request both work. However, when I try to write the content of the received file (they are all .txt files) to a .txt file I create, the response headers are written as well, for example:

HTTP/1.1 200 OK
Date: Wed, 10 Nov 2021 10:17:26 GMT
Server: Apache/2.4.37 (FreeBSD) OpenSSL/1.0.2o-freebsd
Last-Modified: Sun, 01 Aug 1999 17:20:53 GMT

I want to write only the content of the .txt file received into the file I create.

I have the following code I tried:

clientSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
clientSocket.connect((host_name, server_port))  # This line initiates the connection
request = ("GET %s HTTP/1.1\r\nHost: %s\r\n\r\n" % (file_name, host_name)).encode()
clientSocket.send(request)

rcvpkt = clientSocket.recv(4096)
received_msg = rcvpkt.decode()

if (received_msg.startswith("HTTP/1.1 404 Not Found")):
    print(index + 1, ".", url, "is not found")
else:
    # Use a context manager so the file is flushed and closed
    with open(only_file_name, 'w') as received_txt_file:
        received_txt_file.write(received_msg)

For instance, for the URL www.textfiles.com/100/balls.txt (host_name = www.textfiles.com, file_name = /100/balls.txt), the resulting .txt file is as follows:

HTTP/1.1 200 OK
Date: Wed, 10 Nov 2021 10:17:26 GMT
Server: Apache/2.4.37 (FreeBSD) OpenSSL/1.0.2o-freebsd
Last-Modified: Sun, 01 Aug 1999 17:20:53 GMT
ETag: "b1d-35109effca740"
Accept-Ranges: bytes
Content-Length: 2845
Content-Type: text/plain

1990 July 12 at 11:17 EDT
To:     David Walker
FROM:   Jeff Sharpe
...

Additionally, as you can see, I pass 4096 bytes as the size of the received message, but the received file can be as large as 100,000 bytes. How can I handle this efficiently? If I pass a relatively large number, such as 500,000 bytes, wouldn't that create a problem for small files of only 2,000 bytes?

Please use an existing HTTP library such as requests. If you really want to do this yourself, please study the actual HTTP standard, which clearly defines how the message body is sent. In your specific case, after the header you might need to read based on the Content-Length, read separate chunks that each carry a length prefix when Transfer-Encoding: chunked is given, or read until EOF. As it currently stands, I consider the question too broad. - Steffen Ullrich, Nov 10 '21 at 11:35
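The chunked case mentioned in the comment above can be illustrated with a minimal sketch. It assumes the complete raw body (everything after the blank line that ends the headers) is already in memory as bytes, and it ignores trailers. The function name `decode_chunked` is hypothetical and is not part of any library.

```python
def decode_chunked(raw):
    """Decode a complete 'Transfer-Encoding: chunked' body held in memory.

    Each chunk is: <hex size>[;extensions]\r\n<size bytes of data>\r\n
    A chunk of size 0 marks the end of the body.
    """
    body = b""
    pos = 0
    while True:
        # Find the end of the size line for the next chunk.
        line_end = raw.index(b"\r\n", pos)
        # The size line may carry ";extension" parameters; ignore them.
        size_text = raw[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_text.decode("ascii"), 16)
        if size == 0:
            break  # last chunk reached
        start = line_end + 2
        body += raw[start:start + size]
        # Skip the chunk data and the CRLF that follows it.
        pos = start + size + 2
    return body
```

When the data arrives from a socket, the same loop would have to keep calling `recv()` whenever the buffer does not yet hold a full size line or a full chunk. That extra bookkeeping is one reason the comment recommends a library such as requests.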
You need to first receive just the HTTP response headers and parse them (don't write them to your file) to know where the remaining body content begins (this varies with header length), in what format it is being transmitted (so you know how to receive it), and how to determine when it ends. I've posted earlier replies on this exact topic, such as [this answer](https://stackoverflow.com/a/19211701/65863), [this answer](https://stackoverflow.com/a/16247097/65863), [this answer](https://stackoverflow.com/a/14421507/65863), and [this answer](https://stackoverflow.com/a/7234357/65863). - Remy Lebeau, Nov 11 '21 at 01:11
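The approach described in these comments could be sketched roughly as follows. This is an illustration under stated assumptions, not a complete HTTP client. It handles only bodies delimited by Content-Length or by the server closing the connection, and it raises an error for chunked responses. The names `read_headers`, `parse_headers`, `read_body` and `fetch_to_file` are hypothetical helpers. Note that `recv(4096)` returns at most 4096 bytes per call, so the buffer size does not limit the file size as long as `recv()` is called in a loop.

```python
import socket


def read_headers(sock, bufsize=4096):
    """Receive until the blank line that ends the headers.

    Returns (header_bytes, leftover); leftover is any body data that
    happened to arrive in the same recv() calls as the headers.
    """
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = sock.recv(bufsize)
        if not chunk:  # peer closed before the headers were complete
            raise ConnectionError("connection closed while reading headers")
        data += chunk
    head, _, leftover = data.partition(b"\r\n\r\n")
    return head, leftover


def parse_headers(head):
    """Return (status_code, {lower-cased header name: value})."""
    lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers


def read_body(sock, headers, leftover, bufsize=4096):
    """Read the body using Content-Length if present, else until EOF."""
    if "chunked" in headers.get("transfer-encoding", "").lower():
        raise NotImplementedError("chunked bodies are not handled in this sketch")
    body = leftover
    if "content-length" in headers:
        length = int(headers["content-length"])
        while len(body) < length:
            chunk = sock.recv(min(bufsize, length - len(body)))
            if not chunk:  # connection closed early; return what we have
                break
            body += chunk
        return body[:length]
    while True:  # no length given: the body ends when the server closes
        chunk = sock.recv(bufsize)
        if not chunk:
            return body
        body += chunk


def fetch_to_file(host, path, out_name, port=80):
    """Fetch http://host/path and write only the body to out_name."""
    # "Connection: close" asks the server to close after the response,
    # so the read-until-EOF fallback does not wait on a kept-alive socket.
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request.encode())
        head, leftover = read_headers(sock)
        status, headers = parse_headers(head)
        if status != 200:
            return status
        body = read_body(sock, headers, leftover)
    with open(out_name, "wb") as out_file:  # binary mode: write bytes as received
        out_file.write(body)
    return status
```

With the values from the question, a call might look like `fetch_to_file("www.textfiles.com", "/100/balls.txt", "balls.txt")`. Checking the parsed status code instead of `startswith("HTTP/1.1 404 Not Found")` also covers other error codes and HTTP/1.0 replies.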
