I have already checked answers regarding my problem, but I couldn't find what's wrong. I am new to Python and that might be a problem. I have written this simple code to connect to a site, but I get this error:

socket.gaierror: [Errno 11004] getaddrinfo failed

This is my code:

import socket
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('http://www.py4e.com', 80))
mysock.send('GET http://www.py4e.com/code3/mbox-short.txt HTTP/1.0\n\n')
while True:
    data = mysock.recv(512)        
    if(len(data) < 1):
        break
    print (data)
mysock.close()
kfb
A. Drosi
  • Possible duplicate of [What does this socket.gaierror mean?](http://stackoverflow.com/questions/15246088/what-does-this-socket-gaierror-mean) – Michael Foukarakis Nov 03 '16 at 10:48
  • Hostnames do *not* include the scheme (`xxx://`). – Michael Foukarakis Nov 03 '16 at 10:49
  • @MichaelFoukarakis The duplicate is about binding a socket, which in turn works completely differently from a hostname lookup. But it's worth noting, though. – Torxed Nov 03 '16 at 10:49
  • There are dozens of duplicate questions with the same error too, e.g. [this](http://stackoverflow.com/questions/37469680/gaierror-errno-11004-getaddrinfo-failed). Even if the exact MVCE isn't the same, looking at any one of them for this error code or description would resolve this problem. – Michael Foukarakis Nov 03 '16 at 10:56
  • Thank you for your precious suggestions guys! – A. Drosi Nov 03 '16 at 13:12

1 Answer

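import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Pass the bare hostname here, without the http:// scheme.
mysock.connect(('www.py4e.com', 80))
# On Python 3 this string must be bytes; see the note on .send() further down.
mysock.send('GET http://www.py4e.com/code3/mbox-short.txt HTTP/1.0\n\n')
while True:
    data = mysock.recv(512)
    if len(data) < 1:  # an empty result means the server closed the connection
        break
    print(data)
mysock.close()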

Quite simple: don't use http:// in the host you pass to .connect().
http:// is a URL scheme (it names the protocol), while www.py4e.com is the host (an A record in DNS). The standard socket library doesn't know anything about URL schemes, so it tries to look up the whole string 'http://www.py4e.com' as a hostname, and that lookup is what fails with "getaddrinfo failed". .connect() therefore needs only a host and a port number.
If you want the connection and HTTP parsing handled for you, check out urllib.request or @Mego's answer using Requests.
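To make the scheme/host split concrete, here is a minimal sketch that uses the standard library's urllib.parse.urlsplit to take a URL apart. The helper name host_and_port is made up for this illustration; it is not part of any library.

```python
from urllib.parse import urlsplit


def host_and_port(url):
    """Return the (hostname, port) pair that socket.connect() expects.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # parts.port is None when the URL has no explicit port,
    # so fall back to the usual default for the scheme.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


url = "http://www.py4e.com/code3/mbox-short.txt"
parts = urlsplit(url)
print(parts.scheme)        # http
print(parts.hostname)      # www.py4e.com
print(parts.path)          # /code3/mbox-short.txt
print(host_and_port(url))  # ('www.py4e.com', 80)
```

Only the hostname and port go to .connect(); the path is what ends up in the GET line.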

Also, if you're using Python 3 (which you probably should be), you need to pass bytes, not str, to .send().

There are two ways of converting your string to bytes:

mysock.send(b'GET http://www.py4e.com/code3/mbox-short.txt HTTP/1.0\n\n')
mysock.send(bytes('GET http://www.py4e.com/code3/mbox-short.txt HTTP/1.0\n\n', 'UTF-8'))

Both do essentially the same thing.
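As a quick check (a small sketch, not part of the original code), the two forms above and str.encode() all produce the same bytes object. The same applies in reverse to what .recv() returns: it is bytes, so decode it if you want text.

```python
request = 'GET http://www.py4e.com/code3/mbox-short.txt HTTP/1.0\n\n'

as_literal = b'GET http://www.py4e.com/code3/mbox-short.txt HTTP/1.0\n\n'
as_bytes_call = bytes(request, 'UTF-8')
as_encode = request.encode('utf-8')  # a third, equivalent spelling

print(as_literal == as_bytes_call == as_encode)  # True
print(type(as_literal))                          # <class 'bytes'>

# Going the other way: turn received bytes back into a str.
print(as_literal.decode('utf-8') == request)     # True
```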

Finally, the request line of a GET request normally doesn't include http:// or the hostname either.
Instead, you just send the path of the file you want to retrieve:

mysock.send(b'GET /code3/mbox-short.txt HTTP/1.0\n\n')

The reason is (again) that http:// is the URL scheme and not part of the data sent to the server. You also don't need the hostname in the request line, because you are already connected to that host.
Instead, the server expects a Host: <hostname>\r\n header if it serves multiple virtual hosts.
You might need a few other headers, though, to get actual content from certain web servers.
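Putting these pieces together, a request with a Host header could be built roughly like this. This is a sketch under a few assumptions: build_get_request and fetch are hypothetical helper names, and the lines end in \r\n, which is what the HTTP specification asks for (many servers also tolerate the bare \n used above).

```python
import socket


def build_get_request(host, path):
    """Build a raw HTTP/1.0 GET request as bytes (hypothetical helper)."""
    lines = [
        "GET {} HTTP/1.0".format(path),  # path only, no scheme or hostname
        "Host: {}".format(host),         # tells a virtual-hosting server which site
        "",                              # the two empty strings yield the blank
        "",                              # line that terminates the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host, path, port=80):
    """Send the request and return the raw response (headers and body) as bytes."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(512)
            if not data:  # empty bytes: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


print(build_get_request("www.py4e.com", "/code3/mbox-short.txt"))
# prints b'GET /code3/mbox-short.txt HTTP/1.0\r\nHost: www.py4e.com\r\n\r\n'
```

Calling fetch("www.py4e.com", "/code3/mbox-short.txt") would then perform the actual network request; sendall() is used instead of send() so the whole request is written even if the socket accepts it in pieces.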

But this is the basic gist of things.

Continue reading

Here's a good start:

It shows you what a raw GET request looks like.
And in the future, I recommend using your browser's built-in network debugger, which can show raw headers, raw responses and a whole bunch of other things.

Torxed
  • Thank you very much for the help! I will check the link you sent as well. – A. Drosi Nov 03 '16 at 13:14
  • @A.Drosi You're welcome, and welcome to SO! If you feel that mine or any other answer solved your question, feel free to mark whichever answer you deem fit as the accepted solution. Best of luck. – Torxed Nov 03 '16 at 13:46