Asynchronous URL un-shortening in Python

Question

I am trying to add a feature to my program that detects and un-shortens links from any URL shortener, including bit.ly and old goo.gl links (a service that no longer exists). I have found a few articles, so I will describe my experiments and findings so far and then ask: is there even a way to do it?

I started by reading up on existing material. I found a Stack Overflow question on how to un-shorten URLs using Python. The answer pointed to the requests library: call requests.head with allow_redirects set to True. However, requests is blocking and does not work with asyncio, which led me to another question on async requests with Python requests.
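For reference, the synchronous approach described above can be sketched roughly as follows. This is a minimal sketch rather than the exact code from that answer; the function name unshorten_sync and the timeout value are assumptions made for illustration.

```python
import requests


def unshorten_sync(url: str, timeout: float = 10.0) -> str:
    """Blocking lookup: follow redirects and return the final URL."""
    # requests.head() does not follow redirects unless asked to,
    # so allow_redirects=True is required here.
    response = requests.head(url, allow_redirects=True, timeout=timeout)
    # response.url is the URL of the last response in the redirect chain.
    return response.url
```

This works, but the call blocks until every redirect has been followed, which is the problem when the rest of the program runs on asyncio.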

That question pointed to grequests, a gevent-based asynchronous wrapper around requests. However, when I tried the code from the first question with requests replaced by grequests, it did not show the link location after redirects. I then changed .head to .get; the call ran, but it still gave me the bit.ly URL I was using rather than the un-shortened URL.

I am unsure what I could use to find the final URL after un-shortening while keeping the code asynchronous rather than synchronous. If anyone can help, that would be really useful!

If your goal is to *"not block the main thread"* (as opposed to *"must be done with asyncio"*), you could use multithreading with the `requests` library. — Tomalak, Feb 26 '20 at 18:34
I need to do it with asyncio, due to my other imports. It cannot be done with multithreading (or multiprocessing) — Generic Nerd, Feb 26 '20 at 18:37

score 1 · Accepted Answer · answered Mar 21 '20 at 23:11

I would recommend aiohttp, a library for making asynchronous HTTP requests that works with asyncio.
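As a rough illustration of how this could look, here is a minimal sketch that follows redirects with aiohttp and returns the final URL. The function names unshorten and unshorten_many and the 10-second timeout are assumptions made for this example, not part of aiohttp.

```python
import asyncio

import aiohttp


async def unshorten(session: aiohttp.ClientSession, url: str) -> str:
    """Follow redirects from `url` and return the final URL as a string."""
    # HEAD avoids downloading the response body. allow_redirects is passed
    # explicitly so that the redirect chain is followed.
    async with session.head(url, allow_redirects=True) as response:
        # response.url is a yarl.URL; convert it to a plain string.
        return str(response.url)


async def unshorten_many(urls):
    """Resolve several URLs concurrently over one shared session."""
    timeout = aiohttp.ClientTimeout(total=10)  # assumed value, tune as needed
    async with aiohttp.ClientSession(timeout=timeout) as session:
        # return_exceptions=True puts the exception object in the result
        # list for any URL that fails, instead of aborting the whole batch.
        return await asyncio.gather(
            *(unshorten(session, u) for u in urls),
            return_exceptions=True,
        )
```

From synchronous code this could be run with asyncio.run(unshorten_many([...])); inside an existing event loop, await unshorten_many([...]) directly. Some servers do not respond well to HEAD requests, in which case swapping session.head for session.get is one option, at the cost of fetching the body.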

damaredayo


score -3 · Answer 2 · answered Sep 10 '21 at 04:47

Try the function below, then apply it to each row of your data frame column using .apply(lambda ...). Note that it uses requests, so each call is synchronous:

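import requests

def unshortenurlx(url):
    """Return the final URL after redirects, or an error message string."""
    try:
        # requests.get follows redirects by default; the timeout keeps a
        # dead link from hanging the call indefinitely.
        response = requests.get(url, timeout=10)
        # response.url is the URL of the final response in the chain.
        return response.url
    except Exception as e:
        return 'Bad url {url}. {e}'.format(url=url, e=e)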
