import socket
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('www.py4e.com', 80))
mysock.send('GET http://www.py4e.com/code3/mbox-short.txt HTTP/1.0\n\n')
while True:
    data = mysock.recv(512)
    if(len(data) < 1):
        break
    print (data)
mysock.close()
Quite simple: don't use http:// in your host declaration on .connect().
http:// names a protocol, and www.py4e.com is a host (or an A record in a DNS server). The standard socket library doesn't know anything about protocols and therefore requires only a host and a port number.
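If you start from a full URL, one way to pull the host out is the standard library's urllib.parse.urlsplit. This is only a sketch: the variable names are made up for illustration, and the fallback to port 80 assumes plain HTTP.

```python
from urllib.parse import urlsplit

url = 'http://www.py4e.com/code3/mbox-short.txt'
parts = urlsplit(url)

scheme = parts.scheme      # the protocol part, without '://'
host = parts.hostname      # what the socket needs to connect
path = parts.path          # what goes into the GET line later on
port = parts.port or 80    # no port in this URL, so assume HTTP's default

address = (host, port)     # the tuple that .connect() expects
print(address)
```

The resulting tuple can be passed straight to mysock.connect(address).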
If you want something that automates this, check out urllib.request, or @Mego's answer using Requests, which handles the connection and the HTTP parsing for you.
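As a rough idea of what the urllib.request route looks like, here is a minimal sketch. The helper name fetch_text is hypothetical, and falling back to UTF-8 when the server doesn't announce a charset is an assumption, not something the library decides for you.

```python
from urllib.request import urlopen

def fetch_text(url):
    """Fetch a URL and return the response body decoded to str."""
    with urlopen(url) as response:
        # Use the charset from the Content-Type header if there is one,
        # otherwise assume UTF-8 (an assumption made for this sketch).
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset)

# Example call (needs network access, so it is left commented out):
# print(fetch_text('http://www.py4e.com/code3/mbox-short.txt'))
```

Here you pass the full URL, http:// included, because the library splits it into host and path itself.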
Also, if you're using Python 3, which you probably should, you need to send bytes data when calling .send().
There are two ways of converting your string to bytes data:
mysock.send(b'GET http://www.py4e.com/code3/mbox-short.txt HTTP/1.0\n\n')
mysock.send(bytes('GET http://www.py4e.com/code3/mbox-short.txt HTTP/1.0\n\n', 'UTF-8'))
Both do basically the same thing.
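A quick way to convince yourself, as a small sketch (the variable names are just for illustration; str.encode() is a third spelling that gives the same result):

```python
request_text = 'GET http://www.py4e.com/code3/mbox-short.txt HTTP/1.0\n\n'

as_literal = b'GET http://www.py4e.com/code3/mbox-short.txt HTTP/1.0\n\n'
as_bytes_call = bytes(request_text, 'UTF-8')
as_encode = request_text.encode('UTF-8')   # equivalent to the bytes() call

# All three produce the same bytes object
same = as_literal == as_bytes_call == as_encode
print(same)  # True
```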
Finally, in a GET request you don't request http:// either.
Instead, you just send the path to the file you want to retrieve:
mysock.send(b'GET /code3/mbox-short.txt HTTP/1.0\n\n')
The reason is (again) that http:// is a protocol descriptor and not part of the actual protocol data being sent. You also don't need the host declaration in your GET request line, because the server you connected to already knows which host you're on, since you're... connected to it.
Instead, the server expects you to supply a Host: <hostname>\r\n header if it is serving multiple virtual hosts.
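Putting the pieces together, a request with a Host header could be built roughly like this. It is a sketch, not the only way to do it: build_get_request and fetch are hypothetical helper names, and the lines are joined with \r\n because that is the line ending HTTP specifies (many servers also tolerate a bare \n).

```python
import socket

def build_get_request(host, path):
    """Return a raw HTTP/1.0 GET request as bytes, including a Host header."""
    lines = [
        'GET {} HTTP/1.0'.format(path),   # path only: no http://, no host
        'Host: {}'.format(host),          # tells a virtual-hosting server which site you want
        '',                               # the empty line ends the header section
        '',
    ]
    return '\r\n'.join(lines).encode('ascii')

def fetch(host, path, port=80):
    """Send the request and return the raw response (headers and body) as bytes."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(512)
            if not data:          # empty bytes means the server closed the connection
                break
            chunks.append(data)
    return b''.join(chunks)

print(build_get_request('www.py4e.com', '/code3/mbox-short.txt'))
```

Calling fetch('www.py4e.com', '/code3/mbox-short.txt') would perform the actual request. It is not called here because it needs network access.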
You might need a few other headers, though, to be able to request actual content from certain web servers.
But this is the basic gist of things.
Continue reading
Here's a good start:
It shows you what a raw GET request looks like.
And in the future, I recommend using your browser's built-in Network Debugger, which can show raw headers, raw responses, and a whole bunch of other things.