In Python, I'm parsing the documents at various URLs to find some elements in the body of each returned document. I'm using lxml for this, like so:

import lxml.html as html

url = 'http://www.linktowebsite.com'
data = html.parse(url)

for d in data.xpath('body'):
    pass  # process each matched element here

Some URLs, however, redirect to a different page, and I want to know the final URL after the redirect. I haven't found anything about this in the lxml documentation.

How can I find the current URL of the parsed/redirected page?

Roland

1 Answer

Use data.docinfo.URL (see the lxml documentation).

Example:

In [22]: data = html.parse('http://httpbin.org/redirect/2')

In [23]: data.docinfo.URL
Out[23]: u'http://httpbin.org/get'
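
If lxml's built-in fetching is not an option (for example, for HTTPS URLs), one possible workaround is to fetch the page yourself and hand the final URL to lxml. The following is only a sketch, assuming Python 3 and urllib.request; the helper names parse_stream and parse_following_redirects are made up for illustration and are not part of lxml. The base_url argument is what makes data.docinfo.URL report the post-redirect address.

```python
import io
import urllib.request

import lxml.html as html


def parse_stream(stream, final_url):
    """Parse an open HTML stream, recording final_url as the document URL.

    Hypothetical helper: 'stream' is any file-like object with read().
    """
    return html.parse(stream, base_url=final_url)


def parse_following_redirects(url):
    """Fetch url with urllib (which follows redirects) and parse the body.

    Hypothetical helper: requires network access when called.
    """
    with urllib.request.urlopen(url) as response:
        final_url = response.geturl()  # URL after any redirects
        return parse_stream(response, final_url)


# Offline demonstration with an in-memory document (no network needed).
tree = parse_stream(
    io.BytesIO(b"<html><body><p>hi</p></body></html>"),
    "http://example.com/final",
)
print(tree.docinfo.URL)
```

With network access, data = parse_following_redirects(url) could then stand in for html.parse(url), and data.docinfo.URL should hold the URL that urllib ended up at.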
reclosedev