Following is my code:

import socket
import time
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('www.py4inf.com', 80))
mysock.send(b'GET /code/romeo.txt HTTP/1.1\n')
mysock.send(b'Host: www.py4inf.com\n\n')
all = b""

while True:
    data = mysock.recv(512)
    all = all + data
    if len(data) < 1:
        break

mysock.close()

stuff = all.decode()
position = stuff.find('\r\n\r\n')
print(stuff[position+4:])

There must be something wrong, because it takes almost 30 seconds for the while loop to reach break. However, if I change if len(data) < 1: to if len(data) < 100:, it takes just 0.5 seconds.

Please help. This has haunted me for a while. The sample website: http://www.py4inf.com/code/romeo.txt

reflective_mind
JianWei
  • This makes perfect sense: of course it takes longer for something extremely unlikely to occur than for something much more likely. Instead, ask yourself what you're trying to do with the break. When do you want to stop listening on the socket? – Danielle M. Oct 05 '16 at 15:40
  • This question has been discussed lots of times here on SO (for example here: http://stackoverflow.com/questions/17667903/python-socket-receive-large-amount-of-data). There is nothing wrong with your code. This is just how sockets work. mysock.recv(512) waits for 512 Bytes. After some time, the connection is simply dropped. Have a look at the Python docs for non-blocking sockets: https://docs.python.org/3/howto/sockets.html#non-blocking-sockets – Sven Rusch Oct 05 '16 at 15:42
  • I get it! I changed the code above to if len(data) < 100, and that works only because the second chunk returned by mysock.recv(512) happens to be under 100 bytes. That's why it cut the running time so much. I also really appreciate your comments; they helped me understand sockets a little better. Hope I can be as good as you guys some day. – JianWei Oct 06 '16 at 01:44

1 Answer

Web servers don't have to close connections immediately. In fact, they may be waiting for another HTTP request on the same connection. Just add print(data) after the recv and you'll see that you get the data, then a pause, then b'', meaning the server finally closed the socket.
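To make that pause visible, here is a minimal sketch that timestamps every recv call instead of just printing the data. The helper name trace_recv is made up for illustration; it only assumes it is handed a connected socket on which the request has already been sent.

```python
import socket
import time

def trace_recv(sock, bufsize=512):
    """Read until the peer closes; return a list of (seconds_elapsed, byte_count)."""
    start = time.monotonic()
    events = []
    while True:
        data = sock.recv(bufsize)  # blocks until some data arrives or the peer closes
        events.append((time.monotonic() - start, len(data)))
        if not data:  # b'' means the peer closed the connection
            break
    return events
```

Called in place of the while loop from the question, for example `for elapsed, n in trace_recv(mysock): print(elapsed, n)`, it should show the data chunks arriving early and the final 0-byte entry arriving only once the server gives up on the idle connection.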

You'll also notice that the server sends a header that includes "Content-Length: 167\r\n". Once the header has finished, the server will send exactly 167 bytes of data. You could parse out the header yourself but that's why we have client libraries like urllib and requests.
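If you did want to parse the header yourself, a rough sketch might look like the following. The function name read_response is hypothetical, and the sketch assumes the server always sends a Content-Length header; it does not handle chunked transfer encoding or other cases that real client libraries cover.

```python
import socket

def read_response(sock, bufsize=512):
    """Return (header_text, body_bytes), stopping once Content-Length bytes have arrived."""
    buf = b""
    # Read until the blank line that ends the header block.
    while b"\r\n\r\n" not in buf:
        chunk = sock.recv(bufsize)
        if not chunk:
            raise ConnectionError("connection closed before the header was complete")
        buf += chunk
    head, _, body = buf.partition(b"\r\n\r\n")
    header_text = head.decode("iso-8859-1")

    # Look for Content-Length; header names are case-insensitive.
    length = None
    for line in header_text.split("\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("no Content-Length header; this sketch cannot handle that")

    # Keep reading only until the whole body is in hand, then stop without
    # waiting for the server to close the socket.
    while len(body) < length:
        chunk = sock.recv(bufsize)
        if not chunk:
            break  # peer closed early; return what was received
        body += chunk
    return header_text, body[:length]
```

Because the loop stops as soon as the announced number of bytes has been read, it does not depend on when the server decides to close the connection.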

I was curious about how much would need to be added to the request header to get the connection to close immediately, and Connection: close seemed to do it. This returns right away:

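import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('www.py4inf.com', 80))
# HTTP header lines end with CRLF; sendall() keeps sending until every byte is written.
mysock.sendall(b'GET /code/romeo.txt HTTP/1.1\r\n')
mysock.sendall(b'Connection: close\r\n')  # ask the server to close after this response
mysock.sendall(b'Host: www.py4inf.com\r\n\r\n')
response = b""  # avoids shadowing the built-in all()

while True:
    data = mysock.recv(512)  # returns up to 512 bytes, or b'' once the server closes
    response = response + data
    if len(data) < 1:
        break

mysock.close()

# The body starts after the blank line that ends the header.
stuff = response.decode()
position = stuff.find('\r\n\r\n')
print(stuff[position+4:])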
tdelaney