
I am downloading data from a server using urllib2, but I need to determine the IP address of the server to which I am connected.

import urllib2

STD_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
    'Accept-Language': 'en-us,en;q=0.5',
    'User-Agent': 'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.12) '
                  'Gecko/20101028 Firefox/3.6.12'}
request = urllib2.Request(url, None, STD_HEADERS)
data = urllib2.urlopen(request)

Please don't suggest resolving the IP address from the URL: that doesn't guarantee the data was actually downloaded from that address, since an HTTP redirect or a load-balancing server may send the request somewhere else.

Parikshit

4 Answers


Here's what works for me on Python 2.7:

>>> from urllib2 import urlopen
>>> from socket import fromfd
>>> from socket import AF_INET
>>> from socket import SOCK_STREAM
>>> r = urlopen('http://stackoverflow.com/')
>>> mysockno = r.fileno()
>>> mysock = fromfd(mysockno, AF_INET, SOCK_STREAM)
>>> (ip, port) = mysock.getpeername()
>>> print "got IP %s port %d" % (ip, port)
got IP 198.252.206.140 port 80
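The same `fromfd` approach can be exercised without network access against a local listener (a sketch of my own; the listener merely stands in for the remote server). One caveat worth knowing: `socket.fromfd` duplicates the file descriptor, so the duplicate needs closing along with the original socket:

```python
import socket

# Local stand-in for the server that urlopen would normally connect to.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
listener.listen(1)
client = socket.create_connection(listener.getsockname())

# fromfd duplicates the descriptor, yielding a second, independent
# socket object for the same underlying connection.
dup = socket.fromfd(client.fileno(), socket.AF_INET, socket.SOCK_STREAM)
ip, port = dup.getpeername()
print("got IP %s port %d" % (ip, port))

dup.close()      # close the duplicate as well as the original,
client.close()   # or the extra descriptor leaks
listener.close()
```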
rogger
  • Also somewhat obvious: if the request fails to connect, you'll never get past the `urlopen` and won't know which IP was being tried. – ThorSummoner Jul 27 '16 at 21:31

I know that this is an old question, but I've found that the response object returned by urllib2 contains the IP. It looks a bit like a hack, but it works.

import urllib2

STD_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
    'Accept-Language': 'en-us,en;q=0.5',
    'User-Agent': 'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.12) '
                  'Gecko/20101028 Firefox/3.6.12'}
request = urllib2.Request(url, None, STD_HEADERS)
data = urllib2.urlopen(request)

data.fp._sock.fp._sock.getpeername()
gawry
import urllib2, socket, urlparse

# set up your request as before, then:
data = urllib2.urlopen(request)
addr = socket.gethostbyname(urlparse.urlparse(data.geturl()).hostname)

data.geturl() returns the URL that was used to actually retrieve the resource, after any redirects. The hostname is then fished out with urlparse and handed off to socket.gethostbyname to get the IP address.

Some hosts may have more than one IP address for a given hostname, so it's still possible that the request was fulfilled by a different server, but this is as close as you're gonna get. A gethostbyname right after the URL request is going to use your DNS cache anyway and unless you're dealing with a time-to-live of, like, 1 second, you're going to be getting the same server you just used.
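The multiple-address point is easy to check: `socket.getaddrinfo` returns every address the resolver reports for a name, so you can see how ambiguous `gethostbyname`'s single answer is. A minimal sketch, using `localhost` only so it resolves without network access:

```python
import socket

# Every address the resolver reports for a name; a hostname behind a
# load balancer will often return several.
infos = socket.getaddrinfo("localhost", 80, proto=socket.IPPROTO_TCP)
addrs = sorted({sockaddr[0] for (_, _, _, _, sockaddr) in infos})
print(addrs)  # typically ['127.0.0.1', '::1']
```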

If this is insufficient, you could spin off a thread and run lsof while still connected to the remote server. I'm sure you could convince urllib2 to leave the connection open for a while so this would succeed. That seems like rather more work than it's worth, though.

kindall
    The `netloc` part may contain port number, if this is specified, so `socket.gethostbyname(urlparse.urlparse('http://www.google.com:80').netloc)` will fail with message `socket.gaierror: [Errno 11004] getaddrinfo failed`. – augustomen Aug 06 '13 at 15:16
  • Changed to `hostname` – kindall Aug 19 '13 at 18:57

Kudos should go to gawry for his answer. However, I didn't want to mutilate his answer with my additions, which seem to be somewhat longer than his full answer. So please see this answer as an addition to his answer.

Caveat emptor

This will only work on Python 2.x with urllib2. The structure of the classes has changed in Python 3.x, so even the casual compatibility trick:

try:
    import urllib.request as urllib2
except ImportError:
    import urllib2

won't save you. That's arguably the reason why you shouldn't rely on the internals of classes, especially when the attributes start with an underscore and are therefore, by convention, not part of the public interface, even though they're accessible.

Conclusion: the trick below doesn't work on Python 3.x.
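For Python 3, one alternative sketch (my own assumption, not part of gawry's answer) is to skip urllib entirely and use http.client, whose connection object exposes the underlying socket as conn.sock once connected. Demonstrated against a throwaway local server so it runs without network access:

```python
import http.client
import http.server
import threading

# Throwaway local server standing in for the remote host.
server = http.server.HTTPServer(("127.0.0.1", 0),
                                http.server.BaseHTTPRequestHandler)
threading.Thread(target=server.handle_request, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
conn.connect()                      # connect() populates conn.sock
ip, port = conn.sock.getpeername()  # peer address of the live connection
print("connected to %s:%d" % (ip, port))
conn.close()
server.server_close()
```

With a real host you would construct the HTTPConnection with that hostname instead, send the request with conn.request(), and read conn.sock.getpeername() while the connection is still open.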

Extracting IP:port from an HTTPResponse

Here's a condensed version of his answer:

import urllib2
r = urllib2.urlopen("http://google.com")
peer = r.fp._sock.fp._sock.getpeername()
print("%s connected\n\tIP and port: %s:%d\n\tpeer = %r" % (r.geturl(), peer[0], peer[1], peer))

Output will be something like this (trimmed ei parameter for privacy reasons):

http://www.google.co.jp/?gfe_rd=cr&ei=_... connected
        IP and port: 173.194.120.95:80
        peer = ('173.194.120.95', 80)

Assuming r above is the response object returned by urllib2.urlopen, we make the following additional assumptions:

  • its attribute fp (r.fp) is an instance of class socket._fileobject
  • attribute _sock (r.fp._sock) is the "socket" instance passed to the socket._fileobject ctor; it will be of type httplib.HTTPResponse
  • attribute fp (r.fp._sock.fp) is another socket._fileobject, created via sock.makefile() in the ctor of httplib.HTTPResponse, which wraps the real socket
  • attribute _sock (r.fp._sock.fp._sock) is the real socket object

Roughly: r.fp is a socket._fileobject, while r.fp._sock.fp._sock is the actual socket instance (type _socket.socket), wrapped by a socket._fileobject which is in turn wrapped by another socket._fileobject (two levels deep). This is why the somewhat unusual .fp._sock.fp._sock chain appears in the middle.

The value returned by getpeername() above is a 2-tuple for IPv4: element 0 is the IP address as a string, and element 1 is the port to which the connection was made on that IP. Note: the documentation states that this format depends on the socket's address family.

Extracting this information from HTTPError

On another note, since urllib2.HTTPError derives from URLError as well as addinfourl and stores the fp in an attribute of the same name, we can even extract that information from an HTTPError exception (not from URLError, though) by adding another fp to the mix like this:

import urllib2
try:
    r = urllib2.urlopen("https://stackoverflow.com/doesnotexist/url")
    peer = r.fp._sock.fp._sock.getpeername()
    print("%s connected\n\tIP and port: %s:%d\n\tpeer = %r" % (r.geturl(), peer[0], peer[1], peer))
except urllib2.HTTPError, e:
    if e.fp is not None:
        peer = e.fp.fp._sock.fp._sock.getpeername()
        print("%s: %s\n\tIP and port: %s:%d\n\tpeer = %r" % (str(e), e.geturl(), peer[0], peer[1], peer))
    else:
        print("%s: %s\n\tIP and port: <could not be retrieved>" % (str(e), e.geturl()))

Output will be something like this (unless someone at StackOverflow adds that URL ;)):

HTTP Error 404: Not Found: https://stackoverflow.com/doesnotexist/url
        IP and port: 198.252.206.16:80
        peer = ('198.252.206.16', 80)
0xC0000022L