Shortened link not working with BeautifulSoup Python

Question

This code gets the information from the site perfectly fine:

import requests
from bs4 import BeautifulSoup

url = 'https://www.vogue.com/article/mamma-mia-2-here-we-go-again-review?mbid=social_twitter'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'}
response = requests.get(url, headers=headers)

soup = BeautifulSoup(response.text, "lxml")

title = soup.find("meta",  {"name": "twitter:title"})
title2 = soup.find("meta",  property="og:title")
title3 = soup.find("meta",  property="og:description")

print("TITLE: "+str(title['content']))
print("TITLE2: "+str(title2['content']))
print("TITLE3: "+str(title3['content']))

However, when I replace the URL with this shortened link, it fails with:

print("TITLE: "+str(title['content']))
TypeError: 'NoneType' object has no attribute '__getitem__'

Possible duplicate of [How can I unshorten a URL?](https://stackoverflow.com/questions/4201062/how-can-i-unshorten-a-url) — Mike Scotty, Aug 05 '18 at 07:26

score 1 · Answer 1 · answered Aug 05 '18 at 07:37

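The URL shortener returns an HTML page with a meta refresh tag that redirects to the desired page. requests follows HTTP redirects but not meta refresh tags, so the soup is built from that intermediate page, find() returns None for the missing meta tags, and indexing None raises the TypeError. This code follows the refresh until the final page is reached: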

from bs4 import BeautifulSoup
import requests
import re

shortened_url = '<YOUR SHORTENED URL>'

headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'}

response = requests.get(shortened_url, headers=headers)
soup = BeautifulSoup(response.text, "lxml")

# Follow meta refresh tags until a page without one is reached.
while True:
    meta = soup.select_one('meta[http-equiv=refresh]')
    if meta is None:
        break
    # The content value looks like "0; url=https://...".
    # .group(1) is used instead of [1] because match indexing needs Python 3.6+.
    refresh_url = re.search(r'url=(.*)', meta['content'], flags=re.I).group(1)
    response = requests.get(refresh_url, headers=headers)
    soup = BeautifulSoup(response.text, "lxml")

title = soup.find("meta",  {"name": "twitter:title"})
title2 = soup.find("meta",  property="og:title")
title3 = soup.find("meta",  property="og:description")

print("TITLE: "+str(title['content']))
print("TITLE2: "+str(title2['content']))
print("TITLE3: "+str(title3['content']))

Prints:

TITLE: Mamma Mia! Here We Go Again Is the Only Good Thing About This Summer - Vogue
TITLE2: Mamma Mia! Here We Go Again Is the Only Good Thing About This Summer
TITLE3: Is it possible to change your country of origin to a movie sequel?
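
The loop above can also be wrapped in a reusable function. What follows is only a sketch under a few assumptions, not a definitive implementation: it assumes Python 3, the names refresh_target, follow_meta_refresh and max_hops are made up for illustration, and the hop limit and 10 second timeout are arbitrary choices. It differs from the loop above in three ways: it stops after a fixed number of hops instead of looping forever, it strips quotes around the target, and it resolves relative targets against the current page.

```python
import re
from urllib.parse import urljoin

# Matches the target in a content value such as "0; url=https://example.com/".
_REFRESH_RE = re.compile(r"""url\s*=\s*['"]?([^'"\s]+)""", re.I)


def refresh_target(content, base_url):
    """Return the absolute URL named in a meta refresh content value, or None."""
    match = _REFRESH_RE.search(content or "")
    if match is None:
        return None
    # The target may be relative, so resolve it against the current page.
    return urljoin(base_url, match.group(1))


def follow_meta_refresh(url, headers=None, max_hops=5):
    """Fetch url and follow meta refresh tags, giving up after max_hops."""
    # Imported here so refresh_target stays usable without these packages.
    import requests
    from bs4 import BeautifulSoup

    for _ in range(max_hops + 1):
        response = requests.get(url, headers=headers, timeout=10)
        soup = BeautifulSoup(response.text, "lxml")
        meta = soup.select_one("meta[http-equiv=refresh]")
        target = refresh_target(meta.get("content"), response.url) if meta else None
        if target is None:
            # No refresh tag (or no URL in it): this is the final page.
            return soup
        url = target
    raise RuntimeError("too many meta refresh hops")
```

It would be called as soup = follow_meta_refresh(shortened_url, headers=headers). Since soup.find() returns None when a tag is missing, checking the result before reading ['content'] avoids the TypeError from the question.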
