For example, these 2 links point to the same location:
http://www.independent.co.uk/life-style/gadgets-and-tech/news/2292113.html
How do I check this in Python?
Call geturl() on the result of urllib2.urlopen(). geturl() "returns the URL of the resource retrieved, commonly used to determine if a redirect was followed."
For example:
#!/usr/bin/env python
# coding: utf-8
import urllib2
url1 = 'http://www.independent.co.uk/life-style/gadgets-and-tech/news/chinese-blamed-for-gmail-hacking-2292113.html'
url2 = 'http://www.independent.co.uk/life-style/gadgets-and-tech/news/2292113.html'
for url in [url1, url2]:
    result = urllib2.urlopen(url)
    print result.geturl()
The output is:
http://www.independent.co.uk/life-style/gadgets-and-tech/news/chinese-blamed-for-gmail-hacking-2292113.html
http://www.independent.co.uk/life-style/gadgets-and-tech/news/chinese-blamed-for-gmail-hacking-2292113.html
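If you want the comparison itself as a function, something like the following might do. This is only a sketch, and it assumes Python 3, where urllib2.urlopen() became urllib.request.urlopen(). The names final_url and same_location, the timeout value, and the resolve parameter are my own, not part of any library; resolve is there so the comparison can be tried without touching the network.

```python
from urllib.request import urlopen  # Python 3 counterpart of urllib2.urlopen


def final_url(url, timeout=10):
    """Return the URL actually retrieved, after any redirects were followed."""
    # The timeout of 10 seconds is an arbitrary choice for this sketch.
    with urlopen(url, timeout=timeout) as response:
        return response.geturl()


def same_location(url_a, url_b, resolve=final_url):
    """Return True if both URLs end up at the same final URL.

    `resolve` is a hypothetical hook: any callable mapping a URL to its
    final URL. It defaults to a real network request via final_url().
    """
    return resolve(url_a) == resolve(url_b)
```

Calling same_location(url1, url2) with the two URLs above performs two real requests. Keep in mind that this only detects redirects: two different final URLs can still serve the same article.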
It's impossible to discern that merely from the URLs, obviously.
You could fetch the content and compare it, but then I imagine you'd need a smart criterion to decide when two pages are the same. For example, both may point to the same article while a randomly chosen advertisement differs, or the list of related articles changes depending on other factors.
Design your program in such a way that the criterion for matching pages is easily replaced, even dynamically, and try until you find one that doesn't fail -- for example, for a newspaper page, you could try finding headlines.
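A rough sketch of what a replaceable criterion could look like, assuming Python 3 and only the standard library. Every name here (title_of, same_title, same_text, pages_match) is made up for illustration, and comparing the <title> element is just one guess at a criterion that might work for newspaper pages; it is not guaranteed to hold for any particular site.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def title_of(html):
    """Return the page title with whitespace normalized ('' if none)."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


def same_title(html_a, html_b):
    """Criterion: both pages have the same non-empty title."""
    title = title_of(html_a)
    return bool(title) and title == title_of(html_b)


def same_text(html_a, html_b):
    """Criterion: the raw page text is identical (usually too strict)."""
    return html_a == html_b


def pages_match(html_a, html_b, criterion=same_title):
    """Decide whether two pages match; the criterion is swappable."""
    return criterion(html_a, html_b)
```

Because the criterion is just a function argument, you can swap in a stricter or looser one (headline text, article body, a similarity threshold) without touching the code that fetches the pages.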