
This is about un-shortening URLs from URL shorteners to reveal the final destination URL.

I've been using pycurl, and looking up (for example) ~80 URLs takes ~24s of real time.

Using the simple requests code below, looking up the same ~80 URLs takes ~55s.

(I've done multiple runs on different sets of URLs, and requests always takes about double the time to do the lookups.)

My pycurl code looks like this:

import pycurl
import certifi

conn = pycurl.Curl()
conn.setopt(pycurl.CAINFO, certifi.where())
conn.setopt(pycurl.FOLLOWLOCATION, True)
conn.setopt(pycurl.MAXREDIRS, 10)
conn.setopt(pycurl.CUSTOMREQUEST, "HEAD")
conn.setopt(pycurl.TIMEOUT, 3)
conn.setopt(pycurl.NOBODY, True)
conn.setopt(pycurl.SSL_VERIFYHOST, 0)

def pycurl_lookup(url):
    try:
        conn.setopt(pycurl.URL, url)
        conn.perform()
        # EFFECTIVE_URL is the final URL after all redirects have been followed
        real_url = conn.getinfo(pycurl.EFFECTIVE_URL)
        print(real_url)
    except pycurl.error as pce:
        print(pce, url, conn.getinfo(pycurl.HTTP_CODE))

   

My requests code is very simple:

import requests

def requests_lookup(url):

    session = requests.Session()
    session.max_redirects = 10

    try:
        # with allow_redirects=True, reply.url is the final URL after the redirect chain
        reply = session.head(url, allow_redirects=True, timeout=3)
        real_url = reply.url
        print(real_url)
    except (requests.Timeout, requests.TooManyRedirects) as rte:
        print(rte)
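
In both cases I just loop over the list of short URLs and time the whole run, along these lines (the URL list and timing code here are only an illustrative sketch, not my actual harness):

import time

short_urls = ["https://bit.ly/example1", "https://tinyurl.com/example2"]  # illustrative; the real list has ~80 entries

start = time.time()
for url in short_urls:
    requests_lookup(url)  # or pycurl_lookup(url)
print(f"elapsed: {time.time() - start:.1f}s")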

Is there any way I can speed up the requests version, or is pycurl simply quicker for some reason?

For completeness, I'll note that moving the session setup outside the requests function definition (my bad coding, as always) cut the delay significantly, but it's still about 40% slower than pycurl:

session = requests.Session()
session.max_redirects = 10

def requests_lookup(url):
    try:
        # the session is now created once and reused, so connections can be pooled
        reply = session.head(url, allow_redirects=True, timeout=3)
        real_url = reply.url
        print(real_url)
    except (requests.Timeout, requests.TooManyRedirects) as rte:
        print(rte)
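
Beyond that, the only other idea I can think of is running the lookups concurrently rather than one after another; below is a rough sketch of what I mean (the worker count and URL list are purely illustrative, and I haven't measured this):

from concurrent.futures import ThreadPoolExecutor

short_urls = ["https://bit.ly/example1", "https://tinyurl.com/example2"]  # illustrative stand-ins

# reuse the requests_lookup defined above; note that sharing one Session across
# threads isn't formally guaranteed to be thread-safe, so treat this as a sketch
with ThreadPoolExecutor(max_workers=10) as pool:
    list(pool.map(requests_lookup, short_urls))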