The problem

I need to check that the domain in a URL does not point to a private IP before making the request, and I also need to return the IP that was actually used for the HTTP connection.

This is my test script:

import ipaddress
import requests
import socket
import sys

from urllib.parse import urlparse


def get_ip(url):
    hostname = socket.gethostbyname(urlparse(url).hostname)
    print('IP: {}'.format(hostname))
    if hostname:
        return ipaddress.IPv4Address(hostname).is_private

def get_req(url):
    private_ip = get_ip(url)
    if not private_ip:
        try:
            with requests.Session() as s:
                s.max_redirects = 5
                r = s.get(url, timeout=5, stream=True)
            return {'url': url, 'staus_code': r.status_code}
        except requests.exceptions.RequestException:
            return 'ERROR'
    return 'Private IP'

if __name__ == '__main__':
    print(get_req(sys.argv[1]))

This won't work if the domain resolves to multiple IPs, for instance if the website is hosted behind Cloudflare:

# python test.py http://example.com
IP: 104.31.65.106
{'staus_code': 200, 'url': 'http://example.com'}

A snippet from tcpdump:

22:21:51.833221 IP 1.2.3.4.54786 > 104.31.64.106.80: Flags [S], seq 902413592, win 29200, options [mss 1460,sackOK,TS val 252001723 ecr 0,nop,wscale 7], length 0
22:21:51.835313 IP 104.31.64.106.80 > 1.2.3.4.54786: Flags [S.], seq 2314392251, ack 902413593, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 10], length 0
22:21:51.835373 IP 1.2.3.4.54786 > 104.31.64.106.80: Flags [.], ack 1, win 229, length 0

The script checked 104.31.65.106, but the HTTP connection was made to 104.31.64.106.
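For illustration, here is a minimal sketch of why the check and the connection can disagree: `socket.gethostbyname()` returns only a single IPv4 address, while `socket.getaddrinfo()` returns every address the resolver reports. The helper name `resolve_all` is hypothetical and not part of the original script.

```python
import socket


def resolve_all(hostname, port=80):
    """Return every distinct IP address the resolver reports for hostname.

    Hypothetical helper: it only lists candidates, it does not tell you
    which address a later HTTP connection will actually use.
    """
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses = []
    for *_, sockaddr in infos:
        # sockaddr is (ip, port) for IPv4 and (ip, port, flowinfo, scope_id)
        # for IPv6, so the IP is always the first element.
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


if __name__ == '__main__':
    # An IP literal resolves to itself, so this runs without DNS access.
    print(resolve_all('127.0.0.1'))
```

Even if every address in the list is checked up front, a later lookup may return a different set, so a check made before the request is not guaranteed to describe the address the connection ends up using.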

I saw this thread, but I won't be consuming the response body, so the connection won't be released, and in any case my version of the requests module doesn't have these attributes.

Is there a way to achieve this with the requests module, or do I have to use another library like urllib or urllib3?

To clarify: I only need to prevent the request if an attempt would be made to connect to a private network address. If there are multiple options and a public address is picked, it's fine.

Martijn Pieters
HTF
  • Why exactly doesn't `rsp=requests.get(..., stream=True);rsp.raw._connection.sock.getpeername()` work for you? – Flurin Jun 15 '17 at 22:14
  • OK, so I just tested it and I guess I could close connection in try/except block but it looks like stream works only if server has keep-alive enabled, otherwise connection is closed immediately and I get `AttributeError: 'NoneType' object has no attribute 'getpeername'`. I would like to also check IP before request is made. – HTF Jun 16 '17 at 07:26
  • Why all the shenanigans with `with requests.Session() as s` then `s = requests.Session()`? That just replaced your configured session, drop the `s = ...` line. – Martijn Pieters Jun 16 '17 at 09:14
  • @MartijnPieters I believe that's what left from testing, I just removed it. – HTF Jun 16 '17 at 11:15
  • So if a hostname can resolve to either a public address or a private one, you want to block that request? Or only if the *currently picked IP address* is not a public one? – Martijn Pieters Jun 16 '17 at 11:17
  • Say `foo.com` is registered as being available on `192.168.2.42` and `104.31.64.106`, should the request be blocked even though `requests` was going to use `104.31.64.106`? – Martijn Pieters Jun 16 '17 at 11:19
  • In other words, does the *possibility* of using a non-public address matter here, or only the *actual use* of such an IP address? – Martijn Pieters Jun 16 '17 at 11:19
  • I want to prevent requests to private IPs only, so that it won't hit the web server itself. I would like to implement this in a Django app. – HTF Jun 17 '17 at 11:41
  • @HTF: that doesn't answer my question, not really. What should happen when a hostname resolves to multiple IP addresses. Should it a) prevent the request altogether, b) pick one of the non-private IP addresses, or c) only prevent the request if a private IP address is being picked? The fact that this is for a Django app doesn't actually matter. – Martijn Pieters Jun 17 '17 at 12:11
  • `c) only prevent the request if a private IP address is being picked` – HTF Jun 17 '17 at 13:54
  • @HTF: but what happens then is that a `OSError: [Errno 65] No route to host` exception is raised, and urllib3 just moves on to the next address in the options. So for *multiple addresses*, this is just fine, `urllib3` just will skip over that address and move on to the next. – Martijn Pieters Jun 17 '17 at 15:48
  • I'm still not sure why you need this. Non-routable IP addresses are automatically skipped; each DNS result is tried in turn until a successful connection is made. Private network addresses are thus *skipped without your intervention*. Besides, including a private-network IP address in public DNS information is a huge misconfiguration, and doesn't happen that often. Why is this a problem you need to solve? – Martijn Pieters Jun 17 '17 at 16:04
  • This is just a safety check for corner cases where a user tries to submit a URL that points to a private IP, loopback or 0.0.0.0, because in that case the request will go to the web server hosting this app. – HTF Jun 17 '17 at 21:05

1 Answer

urllib3 will automatically skip unroutable addresses for a given DNS name. This is not something that needs preventing.

What happens internally when creating a connection is this:

  • DNS information is requested; if your system supports IPv6 (binding to ::1 succeeds), that includes IPv6 addresses.
  • The addresses are tried one by one, in the order in which they are listed:
    • For each address, a suitable socket is configured.
    • The socket is told to connect to the IP address.
    • If connecting fails, the next IP address is tried; otherwise the connected socket is returned.

See the urllib3.util.connection.create_connection() function. Private networks are usually not routable and are thus skipped automatically.
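The steps listed above can be sketched with the standard library alone. This is a simplified illustration, not urllib3's actual code (which also handles socket options and a source address); the function name `connect_first_reachable` is hypothetical.

```python
import socket


def connect_first_reachable(host, port, timeout=5.0):
    """Try each resolved address in order; return the first connected socket."""
    last_err = None
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        sock = None
        try:
            # Configure a socket that matches this address family.
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            # Connecting failed: remember why and move on to the next address.
            last_err = exc
            if sock is not None:
                sock.close()
    if last_err is not None:
        raise last_err
    raise OSError('getaddrinfo returned no addresses')
```

An address that cannot be reached simply produces an `OSError` in the loop, which is why unroutable entries are passed over without any extra handling.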

However, if you are on a private network yourself, then it is possible that an attempt is made to connect to that IP address anyway, which can take some time to resolve.

The solution is to adapt a previous answer of mine that lets you resolve the hostname at the point where the socket connection is created; this should let you skip private use addresses. Create your own loop over socket.getaddrinfo() and raise an exception at that point if a private network address would be attempted:

import socket
from ipaddress import ip_address
from urllib3.util import connection


class PrivateNetworkException(Exception):
    pass


_orig_create_connection = connection.create_connection

def patched_create_connection(address, *args, **kwargs):
    """Wrap urllib3's create_connection to resolve the name elsewhere"""
    # resolve hostname to an ip address; use your own
    # resolver here, as otherwise the system resolver will be used.
    family = connection.allowed_gai_family()

    host, port = address
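    last_err = None
    for *_, sa in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
        # sa is (ip, port) for IPv4 and (ip, port, flowinfo, scope_id) for IPv6
        ip, port = sa[:2]
        if ip_address(ip).is_private:
            # Private network address, raise an exception to prevent
            # connecting
            raise PrivateNetworkException(ip)
        try:
            # try to create connection for this one address
            return _orig_create_connection((ip, port), *args, **kwargs)
        except socket.error as err:
            # remember the failure and move on to the next address
            last_err = err

    # every address failed; re-raise the last connection error
    if last_err is not None:
        raise last_err
    raise socket.error('getaddrinfo returned no addresses')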

connection.create_connection = patched_create_connection

This code resolves the host's IP addresses just before connecting and raises a custom exception if a private address would be attempted. Catch that exception:

import requests

def get_req(url):
    with requests.Session() as s:
        # Session() takes no arguments; max_redirects is set as an attribute
        s.max_redirects = 5
        try:
            r = s.get(url, timeout=5, stream=True)
            return {'url': url, 'staus_code': r.status_code}
        except PrivateNetworkException:
            return 'Private IP'
        except requests.exceptions.RequestException:
            return 'ERROR'
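The comments also mention loopback and 0.0.0.0 as addresses to block. As a hedged sketch, the `is_private` test could be widened into an explicit helper built on the standard `ipaddress` properties. The name `is_internal` is hypothetical, and which categories to reject is an assumption you should adjust to your own policy.

```python
from ipaddress import ip_address


def is_internal(ip):
    """Return True for addresses that should not be contacted.

    Hypothetical helper: the set of rejected categories is an assumption.
    """
    addr = ip_address(ip)
    return (
        addr.is_private          # e.g. 10.0.0.0/8, 192.168.0.0/16
        or addr.is_loopback      # 127.0.0.0/8 and ::1
        or addr.is_link_local    # 169.254.0.0/16 and fe80::/10
        or addr.is_unspecified   # 0.0.0.0 and ::
        or addr.is_reserved
        or addr.is_multicast
    )
```

Swapping `ip_address(ip).is_private` for a call like `is_internal(ip)` in the patched function would keep the same control flow while making the blocked categories explicit.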
Martijn Pieters
  • Thanks, any suggestion where I could pass IP that the connection was actually made on to `requests.raw._original_response`? – HTF Jun 17 '17 at 21:24
  • @HTF: I'm going to assume you are using Python 3 and therefore other answers you found on SO that apply to Python 2 no longer work. That's because the socket file is a little more complex now. `requests.raw._original_response` is a `http.client.HTTPResponse` instance, `.fp` is the socket file, which here consists of a buffer wrapping a `SocketIO` object with the actual socket in the `_sock` attribute. So the original socket is available as `requests.raw._original_response.fp.raw._sock`. Call `.getpeername()` on that. – Martijn Pieters Jun 18 '17 at 08:36
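The buffer, `SocketIO` and `_sock` chain described in the last comment can be reproduced with the standard library alone, since `http.client` builds its `.fp` with `socket.makefile("rb")`. The sketch below uses a throwaway loopback listener instead of a real HTTP response; `peer_of` is a hypothetical helper, and `_sock` is a private attribute that may change between Python versions.

```python
import socket


def peer_of(fp):
    """Return the remote (ip, port) behind a buffered socket file.

    Relies on the private `_sock` attribute of socket.SocketIO, which is
    an implementation detail rather than a documented API.
    """
    return fp.raw._sock.getpeername()


# Throwaway listener on the loopback interface, only for this demo.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
server.listen(1)
server_addr = server.getsockname()

client = socket.create_connection(server_addr)
fp = client.makefile("rb")      # same kind of object as HTTPResponse.fp
peer = peer_of(fp)
print(peer)

fp.close()
client.close()
server.close()
```

With a real response the same call only works while the underlying file object is still open. Once the connection has been closed or released, the attribute chain can hit `None`, which matches the `AttributeError` reported in the comments under the question.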