
I am downloading data from a server using urllib2, but I need to determine the IP address of the server to which I am connected.

import urllib2

STD_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
    'Accept-Language': 'en-us,en;q=0.5',
    'User-Agent': 'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.12) '
                  'Gecko/20101028 Firefox/3.6.12'}
request = urllib2.Request(url, None, STD_HEADERS)
data = urllib2.urlopen(request)

Please don't suggest resolving the IP address from the URL: that doesn't guarantee the data was actually downloaded from that address, since an HTTP redirect or a load-balancing server may send the request somewhere else.

Parikshit

4 Answers


Here's what works for me on Python 2.7:

>>> from urllib2 import urlopen
>>> from socket import fromfd
>>> from socket import AF_INET
>>> from socket import SOCK_STREAM
>>> r = urlopen('http://stackoverflow.com/')
>>> mysockno = r.fileno()
>>> mysock = fromfd(mysockno, AF_INET, SOCK_STREAM)
>>> (ip, port) = mysock.getpeername()
>>> print "got IP %s port %d" % (ip, port)
got IP 198.252.206.140 port 80
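The same `fromfd` approach can be exercised without network access against a local listener (a sketch of my own; the listener merely stands in for the remote server). One caveat worth knowing: `socket.fromfd` duplicates the file descriptor, so the duplicate needs closing along with the original socket:

```python
import socket

# Local stand-in for the server that urlopen would normally connect to.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
listener.listen(1)
client = socket.create_connection(listener.getsockname())

# fromfd duplicates the descriptor, yielding a second, independent
# socket object for the same underlying connection.
dup = socket.fromfd(client.fileno(), socket.AF_INET, socket.SOCK_STREAM)
ip, port = dup.getpeername()
print("got IP %s port %d" % (ip, port))

dup.close()      # close the duplicate as well as the original,
client.close()   # or the extra descriptor leaks
listener.close()
```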
rogger
  • Also somewhat obvious: if the request fails to connect, you'll never get past the `urlopen` and won't know which IP was being tried. – ThorSummoner Jul 27 '16 at 21:31

I know that this is an old question, but I've found that the response object returned by urllib2 contains the IP. It looks a bit like a hack, but it works.

import urllib2

STD_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
    'Accept-Language': 'en-us,en;q=0.5',
    'User-Agent': 'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.12) '
                  'Gecko/20101028 Firefox/3.6.12'}
request = urllib2.Request(url, None, STD_HEADERS)
data = urllib2.urlopen(request)

data.fp._sock.fp._sock.getpeername()
gawry
import urllib2, socket, urlparse

# set up your request as before, then:
data = urllib2.urlopen(request)
addr = socket.gethostbyname(urlparse.urlparse(data.geturl()).hostname)

data.geturl() returns the URL that was used to actually retrieve the resource, after any redirects. The hostname is then fished out with urlparse and handed off to socket.gethostbyname to get the IP address.

Some hosts may have more than one IP address for a given hostname, so it's still possible that the request was fulfilled by a different server, but this is as close as you're gonna get. A gethostbyname right after the URL request is going to use your DNS cache anyway and unless you're dealing with a time-to-live of, like, 1 second, you're going to be getting the same server you just used.
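The multiple-address point is easy to check: `socket.getaddrinfo` returns every address the resolver reports for a name, so you can see how ambiguous `gethostbyname`'s single answer is. A minimal sketch, using `localhost` only so it resolves without network access:

```python
import socket

# Every address the resolver reports for a name; a hostname behind a
# load balancer will often return several.
infos = socket.getaddrinfo("localhost", 80, proto=socket.IPPROTO_TCP)
addrs = sorted({sockaddr[0] for (_, _, _, _, sockaddr) in infos})
print(addrs)  # typically ['127.0.0.1', '::1']
```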

If this is insufficient, you could spin off a thread and run lsof while still connected to the remote server. I'm sure you could convince urllib2 to leave the connection open for a while so this would succeed. That seems like rather more work than it's worth, though.

kindall
    The `netloc` part may contain port number, if this is specified, so `socket.gethostbyname(urlparse.urlparse('http://www.google.com:80').netloc)` will fail with message `socket.gaierror: [Errno 11004] getaddrinfo failed`. – augustomen Aug 06 '13 at 15:16
  • Changed to `hostname` – kindall Aug 19 '13 at 18:57

Kudos should go to gawry for his answer. However, I didn't want to mutilate his answer with my additions, which seem to be somewhat longer than his full answer. So please see this answer as an addition to his answer.

Caveat emptor

This will only work on Python 2.x with urllib2. The structure of the classes has changed in Python 3.x, so even the casual compatibility trick:

try:
    import urllib.request as urllib2
except ImportError:
    import urllib2

won't save you. That's arguably the reason why you shouldn't rely on the internals of classes, especially when the attributes start with an underscore and are therefore, by convention, not part of the public interface, even though they're accessible.

Conclusion: the trick below doesn't work on Python 3.x.
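For Python 3, one alternative sketch (my own assumption, not part of gawry's answer) is to skip urllib entirely and use http.client, whose connection object exposes the underlying socket as conn.sock once connected. Demonstrated against a throwaway local server so it runs without network access:

```python
import http.client
import http.server
import threading

# Throwaway local server standing in for the remote host.
server = http.server.HTTPServer(("127.0.0.1", 0),
                                http.server.BaseHTTPRequestHandler)
threading.Thread(target=server.handle_request, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
conn.connect()                      # connect() populates conn.sock
ip, port = conn.sock.getpeername()  # peer address of the live connection
print("connected to %s:%d" % (ip, port))
conn.close()
server.server_close()
```

With a real host you would construct the HTTPConnection with that hostname instead, send the request with conn.request(), and read conn.sock.getpeername() while the connection is still open.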

Extracting IP:port from an HTTPResponse

Here's a condensed version of his answer:

import urllib2
r = urllib2.urlopen("http://google.com")
peer = r.fp._sock.fp._sock.getpeername()
print("%s connected\n\tIP and port: %s:%d\n\tpeer = %r" % (r.geturl(), peer[0], peer[1], peer))

Output will be something like this (trimmed ei parameter for privacy reasons):

http://www.google.co.jp/?gfe_rd=cr&ei=_... connected
        IP and port: 173.194.120.95:80
        peer = ('173.194.120.95', 80)

Assuming r above is the response object returned by urllib2.urlopen, we make the following additional assumptions:

  • its attribute fp (r.fp) is an instance of class socket._fileobject
  • attribute _sock (r.fp._sock) is the "socket" instance passed to the socket._fileobject ctor; it will be of type httplib.HTTPResponse
  • attribute fp (r.fp._sock.fp) is another socket._fileobject, created via sock.makefile() in the ctor of httplib.HTTPResponse, which wraps the real socket
  • attribute _sock (r.fp._sock.fp._sock) is the real socket object

Roughly: r.fp is a socket._fileobject, while r.fp._sock.fp._sock is the actual socket instance (type _socket.socket), wrapped by a socket._fileobject which is in turn wrapped by another socket._fileobject (two levels deep). This is why the somewhat unusual .fp._sock.fp._sock chain appears in the middle.

The value returned by getpeername() above is a 2-tuple for IPv4: element 0 is the IP address as a string, and element 1 is the port to which the connection was made on that IP. Note: the documentation states that this format depends on the socket's address family.

Extracting this information from HTTPError

On another note, since urllib2.HTTPError derives from URLError as well as addinfourl and stores the fp in an attribute of the same name, we can even extract that information from an HTTPError exception (not from URLError, though) by adding another fp to the mix like this:

import urllib2
try:
    r = urllib2.urlopen("https://stackoverflow.com/doesnotexist/url")
    peer = r.fp._sock.fp._sock.getpeername()
    print("%s connected\n\tIP and port: %s:%d\n\tpeer = %r" % (r.geturl(), peer[0], peer[1], peer))
except urllib2.HTTPError, e:
    if e.fp is not None:
        peer = e.fp.fp._sock.fp._sock.getpeername()
        print("%s: %s\n\tIP and port: %s:%d\n\tpeer = %r" % (str(e), e.geturl(), peer[0], peer[1], peer))
    else:
        print("%s: %s\n\tIP and port: <could not be retrieved>" % (str(e), e.geturl()))

Output will be something like this (unless someone at StackOverflow adds that URL ;)):

HTTP Error 404: Not Found: https://stackoverflow.com/doesnotexist/url
        IP and port: 198.252.206.16:80
        peer = ('198.252.206.16', 80)
0xC0000022L