I am taking a self-learning class, and I am using Python 3.8 to reproduce an in-class exercise regarding connecting to websites and extracting text. The code I am running looks like this:
import socket
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect( ('data.pr4e.org', 80) )
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode()
mysock.send(cmd)
while True:
    data = mysock.recv(512)
    if (len(data) < 1):
        break
    print(data.decode())
mysock.close()
All it is supposed to do is retrieve a text document containing a Shakespeare quote and print the text. It is successful, but I get an unexpected newline near the end. My output looks like this:
HTTP/1.1 200 OK
Date: Sat, 09 May 2020 23:40:39 GMT
Server: Apache/2.4.18 (Ubuntu)
Last-Modified: Sat, 13 May 2017 11:22:22 GMT
ETag: "a7-54f6609245537"
Accept-Ranges: bytes
Content-Length: 167
Cache-Control: max-age=0, no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Connection: close
Content-Type: text/plain
But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already s
ick and pale with grief
Notice the last two lines of the output. In the worked example, my instructor does not get a newline in the middle of the word 'sick', and I can confirm that the source (which is his own website) is unchanged from when he recorded the example. I tried using rstrip to no avail. Thoughts?
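
A network-free sketch of the same loop may help narrow this down. It assumes that the stray break falls exactly where one 512-byte recv chunk ends and the next begins, and that print() adds its own newline after each chunk. The names chunks, render, and fake_response are hypothetical stand-ins, not part of the class exercise, and a real recv(512) may return fewer than 512 bytes per call.

```python
import io
from contextlib import redirect_stdout


def chunks(payload: bytes, size: int = 512):
    """Yield successive pieces of payload, mimicking repeated recv(size) calls."""
    for start in range(0, len(payload), size):
        yield payload[start:start + size]


def render(payload: bytes, size: int = 512, end: str = "\n") -> str:
    """Print each chunk the way the loop above does; return what was printed."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        for piece in chunks(payload, size):
            print(piece.decode(), end=end)
    return buffer.getvalue()


# Hypothetical stand-in for the response: 496 filler bytes followed by the
# last line, so that the first 512-byte chunk ends right after "Who is already s".
fake_response = b"x" * 496 + b"Who is already sick and pale with grief\n"

with_default_end = render(fake_response)            # print() appends "\n" per chunk
without_extra_end = render(fake_response, end="")   # nothing appended per chunk

print(with_default_end[496:])
print(without_extra_end[496:])
```

If that assumption holds, it would also explain why rstrip makes no difference: the extra newline is not in the received data but is added by print() after each chunk, so the first call shows 'sick' split across two lines and the second does not.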