1

I am trying to add a feature to my program that detects and expands shortened URLs, including bit.ly links and old goo.gl links (a service that no longer exists). I have found a few articles, so I will describe my experiments and findings so far and then ask: is there even a way to do this?

I started by reading what had already been written on the topic. I found a Stack Overflow question on how to un-shorten URLs using Python. The answer pointed to the requests library, using requests.head with allow_redirects set to True. However, requests is synchronous and does not work with asyncio, which led me to a question about async requests with Python requests (found here).
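For reference, the synchronous approach described in that answer can be sketched roughly as below. This is a minimal sketch, not the answer's exact code: the function name `unshorten` and the `timeout` value are assumptions added for illustration. It relies on the fact that `requests.head` does not follow redirects unless `allow_redirects=True` is passed.

```python
import requests


def unshorten(url: str, timeout: float = 10.0) -> str:
    """Return the URL reached after following all redirects (blocking call)."""
    # HEAD fetches only the headers, so the target page body is not downloaded.
    # Unlike requests.get, requests.head does not follow redirects by default.
    response = requests.head(url, allow_redirects=True, timeout=timeout)
    # response.url holds the final URL; response.history holds the redirect hops.
    return response.url
```

Because the call blocks until the last redirect has been resolved, it would stall an asyncio event loop if called directly from a coroutine.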

That question pointed to grequests, an asynchronous version of requests. However, when I tried the code from the first question with requests replaced by grequests, it did not show the final location after the redirects. I then changed .head to .get; the call worked, but it still returned the bit.ly URL I had passed in rather than the un-shortened URL.

I am unsure what I could use to find the final URL after un-shortening while keeping the code asynchronous rather than synchronous. If anyone can help, that would be really useful!

Generic Nerd

2 Answers

1

I would recommend aiohttp, a library for making asynchronous web requests with asyncio.
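As a rough illustration of how this could look for un-shortening, here is a minimal sketch. The function names `unshorten` and `unshorten_all` and the 10-second timeout are assumptions for the example, not part of aiohttp. It uses a HEAD request with `allow_redirects=True` (aiohttp's `head()` does not follow redirects by default) and reads the final address from `response.url`.

```python
import asyncio

import aiohttp


async def unshorten(session: aiohttp.ClientSession, url: str) -> str:
    """Follow redirects for one URL and return the final URL as a string."""
    # HEAD avoids downloading the page body; redirects must be enabled explicitly.
    async with session.head(url, allow_redirects=True) as response:
        # response.url is a yarl.URL object, so convert it to a plain string.
        return str(response.url)


async def unshorten_all(urls: list[str]) -> list[str]:
    """Resolve several short URLs concurrently, preserving input order."""
    timeout = aiohttp.ClientTimeout(total=10)  # assumed value, tune as needed
    async with aiohttp.ClientSession(timeout=timeout) as session:
        return await asyncio.gather(*(unshorten(session, u) for u in urls))
```

From synchronous code this could be called as `asyncio.run(unshorten_all(["https://bit.ly/example"]))`, where the bit.ly link is a hypothetical placeholder. Some servers reject HEAD requests, in which case swapping `session.head` for `session.get` may be worth trying.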

damaredayo
-3

Try this function, then run it over your data frame using .apply(lambda):

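import requests

def unshortenurlx(url):
    """Return the final URL after redirects, or an error string on failure."""
    try:
        # requests.get follows redirects by default; the timeout avoids hanging forever.
        response = requests.get(url, timeout=10)
        return response.url
    except requests.RequestException as e:
        return 'Bad url {url}. {e}'.format(url=url, e=e)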
Mtrinidad