Unexpected newline in my retrieved web data

Question

I am taking a self-learning class, and I am using Python 3.8 to reproduce an in-class exercise regarding connecting to websites and extracting text. The code I am running looks like this:

import socket
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect( ('data.pr4e.org', 80) )
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode()
mysock.send(cmd)

while True:
    data = mysock.recv(512)
    if (len(data) < 1):
        break
    print(data.decode())
mysock.close()

All it is supposed to do is retrieve a text document containing a Shakespeare quote and print the text. It is successful, but I get an unexpected newline near the end. My output looks like this:

HTTP/1.1 200 OK
Date: Sat, 09 May 2020 23:40:39 GMT
Server: Apache/2.4.18 (Ubuntu)
Last-Modified: Sat, 13 May 2017 11:22:22 GMT
ETag: "a7-54f6609245537"
Accept-Ranges: bytes
Content-Length: 167
Cache-Control: max-age=0, no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Connection: close
Content-Type: text/plain

But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already s
ick and pale with grief

Notice those last two lines. In the worked example, my instructor does not get that newline in the middle of the word 'sick', and I can confirm that the source (which is his own website) is unchanged from when he recorded the example. I tried using rstrip to no avail. Thoughts?

Your `print` is adding that newline after each piece of decoded data. Try `print(data.decode(),end='')` — Nick, May 09 '20 at 23:50
@Nick forgive my ignorance, but does that have anything to do with anything like the robustness of the connection, or the timing of the information being sent? Idle curiosity is all. — The Count, May 09 '20 at 23:53
No, it's just that you filled a buffer and so printed its decoded output, and `print` defaults to adding a newline to the end of output unless you tell it not to using `end` — Nick, May 09 '20 at 23:58
Does this answer your question? [How to print without newline or space?](https://stackoverflow.com/questions/493386/how-to-print-without-newline-or-space) — Nick, May 10 '20 at 00:01
@Nick Not really, but the posted answer and your comment did. — The Count, May 10 '20 at 00:01

Jackson · Accepted Answer · 2020-05-10T00:06:07.253


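Each call to recv(512) returns at most 512 bytes. The response (headers plus text) is longer than that, so the loop ran more than once, and every print call ended its chunk with a newline. That is why the word 'sick' is split.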
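The effect can be reproduced without a network connection. Here is a minimal sketch, assuming an in-memory byte stream stands in for the socket; the helper name print_chunks and the 16-byte buffer are made up for illustration:

```python
import io

def print_chunks(payload: bytes, bufsize: int, end: str = "\n") -> str:
    """Mimic the recv/print loop on an in-memory payload.

    Returns everything that was printed, so the result can be inspected.
    """
    printed = io.StringIO()
    stream = io.BytesIO(payload)  # stand-in for the socket (assumption)
    while True:
        data = stream.read(bufsize)  # like recv: returns at most bufsize bytes
        if len(data) < 1:
            break
        print(data.decode(), end=end, file=printed)
    return printed.getvalue()

payload = b"Who is already sick and pale with grief\n"

# Default print: a newline is appended after every 16-byte chunk.
broken = print_chunks(payload, bufsize=16)

# end='' suppresses the extra newline, so the chunks join up seamlessly.
fixed = print_chunks(payload, bufsize=16, end="")

print(repr(broken))
print(repr(fixed))
```

The first result has a newline right after 'Who is already s', just like your output; the second is identical to the original text.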

Change this line of your code

data = mysock.recv(512)

to

data = mysock.recv(1024)

Your overall code will now be

import socket
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect( ('data.pr4e.org', 80) )
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode()
mysock.send(cmd)

while True:
    data = mysock.recv(1024)
    if (len(data) < 1):
        break
    print(data.decode())
mysock.close()

Nick brought up a good point: a larger hardcoded buffer only helps as long as the whole response fits in it, and the response length can vary. To handle any length, collect the decoded chunks and print once at the end:

import socket
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect( ('data.pr4e.org', 80) )
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode()
mysock.send(cmd)

collectedString = ""

while True:
    data = mysock.recv(512)
    if (len(data) < 1):
        break
    collectedString += data.decode()

mysock.close()
print(collectedString)
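One caveat with decoding chunk by chunk: if the text contains multi-byte UTF-8 characters, a chunk boundary can fall in the middle of one, and decode() will then raise UnicodeDecodeError. A possible variation, sketched below, collects the raw bytes and decodes once at the end. The helper name recv_all is made up, and the local socket pair only stands in for the real server so the sketch runs offline:

```python
import socket

def recv_all(sock: socket.socket, bufsize: int = 512) -> bytes:
    """Read from sock until the peer closes the connection; return raw bytes."""
    chunks = []
    while True:
        data = sock.recv(bufsize)
        if not data:  # empty bytes means the other side closed
            break
        chunks.append(data)
    return b"".join(chunks)

# Local demo (assumption: a connected socket pair replaces data.pr4e.org).
sender, receiver = socket.socketpair()
message = "Juliet is the sun \u2600 ".encode() * 100  # contains a 3-byte character
sender.sendall(message)
sender.close()

# A tiny buffer forces many chunks, some of which split the 3-byte character.
text = recv_all(receiver, bufsize=7).decode()
receiver.close()

print(text == message.decode())
```

With the real server, you would call recv_all(mysock) after mysock.send(cmd) and print the decoded result once.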

edited May 10 '20 at 00:06

answered May 09 '20 at 23:56

Jackson


How very odd that the instructor, who specified the 512, wouldn't mention or acknowledge that in the demo. In any case, thank you. This also works. – The Count May 09 '20 at 23:58

And what if the received string is 2000 words long? – Nick May 09 '20 at 23:58
@TheCount just changing the buffer size will only work as long as the string fits in it. The correct solution is to prevent the output of the newline as I described in my comment. – Nick May 09 '20 at 23:59
@Nick I like both answers, because yours solves the issue indefinitely and this one tells me what actually happened in this particular case. I am very new, so I appreciate both responses. Please feel free to post yours so that I can upvote it as well. – The Count May 10 '20 at 00:01
@Nick Great catch. I tested on the URL and saw it worked so i put it up without thinking too far. I have modified the answer to put in your idea – Jackson May 10 '20 at 00:08
