
This is about un-shortening URLs from URL shorteners to reveal the final destination URL.

I've been using pycurl, and looking up (for example) ~80 URLs takes ~24s of real time.

Using the simple requests code below, looking up the same ~80 URLs takes ~55s.

(I've done multiple runs on different sets of URLs, and requests always takes about double the time to do the lookups.)

My pycurl code looks like this:

import pycurl
import certifi

conn = pycurl.Curl()
conn.setopt(pycurl.CAINFO, certifi.where())
conn.setopt(pycurl.FOLLOWLOCATION, True)
conn.setopt(pycurl.MAXREDIRS, 10)
conn.setopt(pycurl.CUSTOMREQUEST, "HEAD")
conn.setopt(pycurl.TIMEOUT, 3)
conn.setopt(pycurl.NOBODY, True)
conn.setopt(pycurl.SSL_VERIFYHOST, 0)

def pycurl_lookup(url):
    try:
        conn.setopt(pycurl.URL, url)
        conn.perform()
        # EFFECTIVE_URL is the final URL after all redirects have been followed
        real_url = conn.getinfo(pycurl.EFFECTIVE_URL)
        print(real_url)
    except pycurl.error as pce:
        print(pce, url, conn.getinfo(pycurl.HTTP_CODE))

   

My requests code is very simple:

import requests

def requests_lookup(url):

    session = requests.Session()
    session.max_redirects = 10

    try:
        # with allow_redirects=True, reply.url is the final URL after the redirect chain
        reply = session.head(url, allow_redirects=True, timeout=3)
        real_url = reply.url
        print(real_url)
    except (requests.Timeout, requests.TooManyRedirects) as rte:
        print(rte)
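
In both cases I just loop over the list of short URLs and time the whole run, along these lines (the URL list and timing code here are only an illustrative sketch, not my actual harness):

import time

short_urls = ["https://bit.ly/example1", "https://tinyurl.com/example2"]  # illustrative; the real list has ~80 entries

start = time.time()
for url in short_urls:
    requests_lookup(url)  # or pycurl_lookup(url)
print(f"elapsed: {time.time() - start:.1f}s")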

Is there any way I can speed up the requests version, or is pycurl simply quicker for some reason?

For completeness, I'll note that moving the session setup outside the requests function definition (my bad coding, as always) cut the delay significantly, but it's still about 40% slower than pycurl:

session = requests.Session()
session.max_redirects = 10

def requests_lookup(url):
    try:
        # the session is now created once and reused, so connections can be pooled
        reply = session.head(url, allow_redirects=True, timeout=3)
        real_url = reply.url
        print(real_url)
    except (requests.Timeout, requests.TooManyRedirects) as rte:
        print(rte)
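
Beyond that, the only other idea I can think of is running the lookups concurrently rather than one after another; below is a rough sketch of what I mean (the worker count and URL list are purely illustrative, and I haven't measured this):

from concurrent.futures import ThreadPoolExecutor

short_urls = ["https://bit.ly/example1", "https://tinyurl.com/example2"]  # illustrative stand-ins

# reuse the requests_lookup defined above; note that sharing one Session across
# threads isn't formally guaranteed to be thread-safe, so treat this as a sketch
with ThreadPoolExecutor(max_workers=10) as pool:
    list(pool.map(requests_lookup, short_urls))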