How do I obtain redirected URLs?

Question

I am trying to get the redirected URL that https://trade.ec.europa.eu/doclib/html/153814.htm leads to (a pdf file).

So far I've tried:

import requests

r = requests.get('https://trade.ec.europa.eu/doclib/html/153814.htm', allow_redirects=True)
print(r.url)

but it prints the original URL unchanged. I need the redirected URL, which is https://trade.ec.europa.eu/doclib/docs/2015/september/tradoc_153814.pdf

You can use CURL to follow redirects. https://davidwalsh.name/curl-follow-redirects — Anonymous, Jun 11 '21 at 04:45
Hope you have installed the package `requests` . For example, on MS-DOS, use the command prompt command `\python.exe -m pip install requests` to install the package `requests`. — Raky, Jun 11 '21 at 04:46
This has been answered at this link https://stackoverflow.com/questions/23146961/working-with-a-pdf-from-the-web-directly-in-python — Raky, Jun 11 '21 at 06:40

score 1 · Answer 1 · answered Jun 11 '21 at 07:00

Please try this code to see if it works for you

import io
import re

import requests
from PyPDF2 import PdfReader  # PyPDF2 >= 2.0; older releases call this PdfFileReader
from requests_html import HTMLSession
from urllib.parse import urlparse

# Get the scheme and host with urlparse
url = "https://trade.ec.europa.eu/doclib/html/153814.htm"
parsed_url = urlparse(url)
domain = parsed_url.scheme + "://" + parsed_url.netloc

# Fetch the landing page
session = HTMLSession()
r = session.get(url)

# Extract every link on the page
jlinks = r.html.xpath('//a/@href')

# Drop mail/javascript/tel links and turn relative paths into absolute URLs
updated_links = []

for link in jlinks:
    if re.search(r"@|javascript:|tel:", link):
        continue
    if not link.startswith("http"):
        link = domain + link
    updated_links.append(link)

# Download the first remaining link (the PDF here) and read its first page
r = requests.get(updated_links[0])
f = io.BytesIO(r.content)
reader = PdfReader(f)
contents = reader.pages[0].extract_text()
print(contents)
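
The code above takes the first link on the page and assumes it is the PDF. If the page ever carries more than one link, a slightly more defensive variant might resolve every href against the page URL and keep only those whose path ends in `.pdf`. This is only a sketch: `pdf_links` is a made-up helper name, and filtering on the `.pdf` suffix is an assumption about how the target is named.

```python
from urllib.parse import urljoin, urlparse


def pdf_links(base_url, hrefs):
    """Resolve hrefs against base_url and keep those whose path ends in .pdf."""
    resolved = (urljoin(base_url, href) for href in hrefs)
    return [u for u in resolved if urlparse(u).path.lower().endswith(".pdf")]


if __name__ == "__main__":
    page = "https://trade.ec.europa.eu/doclib/html/153814.htm"
    # Example hrefs for illustration only, not scraped from the real page
    hrefs = ["mailto:someone@example.com",
             "/doclib/docs/2015/september/tradoc_153814.pdf"]
    print(pdf_links(page, hrefs))
```

`urljoin` also handles relative links without a leading slash, which plain string concatenation with the domain does not.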

This was going to be my next query. Thank you for helping me obtain the doc contents. — piñatabreaker, Jun 12 '21 at 01:49
But you have clicked the Tick Mark under the Vote button for another response ;) — Raky, Jun 12 '21 at 05:02

score 0 · Accepted Answer · answered Jun 11 '21 at 06:54

I think you need to extract the redirect link yourself (I didn't find a way to get it through the library's redirect handling). When you request https://trade.ec.europa.eu/doclib/html/153814.htm, it returns an HTML page that contains the redirect link, which you can extract like this, for example:

import requests
from lxml import html

tree = html.fromstring(requests.get('https://trade.ec.europa.eu/doclib/html/153814.htm').text)
print(tree.xpath('.//a/@href')[0])

Output will be

https://trade.ec.europa.eu/doclib/docs/2015/september/tradoc_153814.pdf
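
Some background on why `r.url` does not change: `requests` only follows HTTP-level (3xx) redirects, so if the server answers 200 with an HTML page that merely points at the PDF, `r.history` stays empty and `r.url` is the address you asked for. Below is a rough sketch of pulling the target out of such a page using only the standard library. `find_redirect_target` and `_TargetFinder` are names made up for illustration, and the `<meta http-equiv="refresh">` branch is an assumption about how pages like this are sometimes built; the page in the question may only contain a plain link.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class _TargetFinder(HTMLParser):
    """Collects meta-refresh targets and anchor hrefs while parsing."""

    def __init__(self):
        super().__init__()
        self.meta_targets = []
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            # content usually looks like "0; url=/path/to/file.pdf"
            match = re.search(r"url\s*=\s*['\"]?([^'\";]+)",
                              attrs.get("content") or "", re.IGNORECASE)
            if match:
                self.meta_targets.append(match.group(1).strip())
        elif tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def find_redirect_target(page_url, page_html):
    """Return the absolute URL the page seems to point at, or None."""
    finder = _TargetFinder()
    finder.feed(page_html)
    # Prefer a meta refresh target; otherwise fall back to the first link
    candidates = finder.meta_targets or finder.hrefs
    return urljoin(page_url, candidates[0]) if candidates else None
```

With a response `r` from `requests.get(...)`, you could check `r.history` first and, if it is empty, call `find_redirect_target(r.url, r.text)`.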
