How to get current url of a parsed HTML page in Python with lxml?

Question

In Python I'm parsing various URLs to find some elements in the body of the returned document. I'm using lxml for this, like so:

import lxml.html as html

url = 'http://www.linktowebsite.com'
data = html.parse(url)

for d in data.xpath('body'):
    ...  # process each matched element here

Some URLs, however, redirect to a different page, and I want to know the final URL after the redirect. I haven't found anything about this in the lxml documentation.

How can I find the current URL of the parsed/redirected page?

Take a look here: http://stackoverflow.com/q/4902523/776084 – RanRag, Dec 30 '11 at 15:57

score 4 · Accepted Answer · answered Dec 30 '11 at 16:31

Use data.docinfo.URL (see the lxml documentation).

Example:

In [22]: data = html.parse('http://httpbin.org/redirect/2')

In [23]: data.docinfo.URL
Out[23]: u'http://httpbin.org/get'
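
If you would rather do the download yourself (for example, because lxml's built-in URL loading does not handle HTTPS), a possible alternative is to fetch the page with the standard library and hand the body to lxml. This is only a sketch, not part of the original answer: the helper names fetch_with_final_url and parse_with_final_url are made up for illustration, and it swaps in urllib for lxml's own fetching.

```python
import urllib.request


def fetch_with_final_url(url, timeout=10):
    """Fetch url, following redirects; return (final_url, body_bytes).

    Hypothetical helper, not part of lxml.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # geturl() reports the URL that was actually retrieved,
        # i.e. the location reached after any redirects.
        return response.geturl(), response.read()


def parse_with_final_url(url):
    """Return (final_url, root_element) for url. Hypothetical helper."""
    # lxml is third-party; it is imported here so that the fetch helper
    # above can be used without it.
    import lxml.html

    final_url, body = fetch_with_final_url(url)
    # Passing base_url lets lxml resolve relative links against the
    # redirected location rather than the original one.
    return final_url, lxml.html.fromstring(body, base_url=final_url)
```

Calling parse_with_final_url('http://httpbin.org/redirect/2') should then give you the post-redirect URL together with the parsed root element, assuming the site is reachable.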

reclosedev