
Given three old, raw URLs:

raw_url = 'https://digi.kansalliskirjasto.fi/aikakausi/binding/498491?term=1864&term=SUOMI&language=fi'
raw_url = 'https://twitter.com/i/user/2274951674'
raw_url = 'https://youtu.be/dQw4w9WgXcQ'

I am using the following snippet on my Linux machine (taken from a different question than this post), which contains both the GET and HEAD methods of the requests library, to obtain the updated URLs:

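import requests

# Alternative: GET follows redirects by default but downloads the body.
#r = requests.get(raw_url)
# HEAD does not follow redirects unless allow_redirects=True is passed.
r = requests.head(raw_url, allow_redirects=True)
r.raise_for_status()  # raise on 4xx/5xx responses

print(f"HTTP status: {r.status_code}\tExists: {r.ok}\thistory: {r.history}")

updated_url = r.url  # final URL after any HTTP redirects
print(f"Updated URL: {updated_url}") # works only for 3rd raw_url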

It seems that requests only follows a redirect, and therefore only updates the URL, when the server answers with <Response [3XX]> (my third raw_url), not for the other two.
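To check which of the URLs actually went through an HTTP redirect, the redirect chain that requests records in `r.history` can be printed hop by hop. Below is a minimal sketch of that idea. The helper names `describe_redirects` and `was_redirected` are hypothetical, and the objects at the bottom are stand-ins with made-up example.org URLs, not real server responses; in practice the argument would be the `r` returned by `requests.head(raw_url, allow_redirects=True)`.

```python
from types import SimpleNamespace


def describe_redirects(response):
    """Return one 'status url' line per hop, ending with the final response."""
    # response.history holds the intermediate 3xx responses, oldest first.
    hops = list(response.history) + [response]
    return [f"{hop.status_code} {hop.url}" for hop in hops]


def was_redirected(response):
    """True only if at least one HTTP (3xx) redirect was followed."""
    return len(response.history) > 0


# Stand-in objects that mimic the attributes used above (assumed values).
first_hop = SimpleNamespace(status_code=301, url="https://example.org/old", history=[])
final = SimpleNamespace(status_code=200, url="https://example.org/new", history=[first_hop])

for line in describe_redirects(final):
    print(line)
print("redirected:", was_redirected(final))
```

If `was_redirected` is false for a URL that a browser nevertheless rewrites, the change is probably happening client-side rather than through an HTTP redirect, which requests does not execute.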

The updated URLs I expect, as shown in a web browser, are:

https://digi.kansalliskirjasto.fi/aikakausi/binding/498491?term=1864&term=SUOMI&page=1
https://twitter.com/ozanbayram01
https://www.youtube.com/watch?v=dQw4w9WgXcQ # still different from requests updated url

How can I get the updated URLs in Python in such scenarios?

Cheers,

Farid Alijani
  • I guess that Twitter does not redirect using an HTTP redirect (3xx) but with JavaScript. If you want to follow JS redirection, you will need a headless browser with JS support. That is not something requests can do. – luxcem Dec 07 '22 at 14:21
  • I also tried `webbrowser.open(raw_url, new=2)`, which opens the URL in a web browser and shows the updated URL there, but I can't print it in Python. – Farid Alijani Dec 07 '22 at 14:31
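
The headless-browser idea from the comments could be sketched roughly as follows. This is only an illustration under assumptions: it uses Selenium with Chrome (neither is mentioned in the post), the helper names `resolve_with_browser` and `make_headless_chrome` are hypothetical, and whether Twitter exposes the profile URL to a browser that is not logged in has not been verified here. The helper simply loads the page and polls `driver.current_url` until it changes or a timeout expires.

```python
import time


def resolve_with_browser(driver, raw_url, timeout=10.0, poll=0.25):
    """Load raw_url in a browser and return the URL it ends up on.

    `driver` is any object with a get(url) method and a current_url
    attribute, such as a Selenium WebDriver.
    """
    driver.get(raw_url)
    deadline = time.monotonic() + timeout
    current = driver.current_url
    # Client-side redirects may fire after the page has loaded, so keep
    # polling until the URL changes or the timeout is reached.
    while current == raw_url and time.monotonic() < deadline:
        time.sleep(poll)
        current = driver.current_url
    return current


def make_headless_chrome():
    """Create a headless Chrome driver (assumes Selenium and Chrome are installed)."""
    from selenium import webdriver  # imported lazily; third-party dependency

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


# Possible usage (not executed here):
#   driver = make_headless_chrome()
#   try:
#       print(resolve_with_browser(driver, raw_url))
#   finally:
#       driver.quit()
```

Unlike `webbrowser.open`, which only hands the URL to the system browser, the driver object stays under the script's control, so the final URL can be read back and printed. A URL that never changes costs the full timeout, so a shorter `timeout` may be preferable when checking many links.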

0 Answers