Printing the HTML of a page with Python and urllib2 doesn't show the whole page source

Question

I'm trying to read one specific page from Amazon.

```python
import urllib2

req = urllib2.Request('http://www.amazon.com/Upright-Citizens-Brigade-Comedy-Improvisation/dp/0989387801/ref=lp_1_1_6/175-0367440-7496156?ie=UTF8&qid=1376827779&sr=1-6%20buybox._V181901516_.png)%20center%20top%20no-repeat;')
# Adjacent string literals are joined, so no stray indentation ends up in the header
req.add_header('User-agent', 'Mozilla/5.0 (Windows NT 6.2; WOW64) '
               'AppleWebKit/537.11 (KHTML, like Gecko) '
               'Chrome/23.0.1271.97 Safari/537.11')
response = urllib2.urlopen(req)
html = response.read()
print html
```

I'm trying to read the price of a new item, "$25.00", which appears in the page's source code, but that part doesn't show up when I print the HTML. What am I doing wrong?
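For reference, a rough Python 3 equivalent of the request above might look like the sketch below (in Python 3, `urllib2` was folded into `urllib.request`). It does not by itself explain the missing price; it only builds the same kind of `Request`. The User-Agent is written with implicit string concatenation, which avoids accidentally embedding indentation in the header value (as happens when a string literal is continued with a trailing backslash). `build_request` is a hypothetical helper, the URL is the shortened one suggested in the comment below, and the actual network call is left commented out.

```python
import urllib.request

# Same User-Agent as in the question, built from adjacent string literals
# so the header value contains single spaces only.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.11 "
    "(KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11"
)


def build_request(url):
    """Return a Request for `url` carrying the custom User-Agent header."""
    req = urllib.request.Request(url)
    # add_header() stores the name as "User-agent" (str.capitalize()).
    req.add_header("User-Agent", USER_AGENT)
    return req


if __name__ == "__main__":
    req = build_request(
        "http://www.amazon.com/Upright-Citizens-Brigade-Comedy-Improvisation/dp/0989387801/"
    )
    # Uncomment to actually fetch the page (requires network access):
    # with urllib.request.urlopen(req) as response:
    #     html = response.read().decode("utf-8", errors="replace")
    #     print("$25.00" in html)
    print(req.get_header("User-agent"))
```

Checking `"$25.00" in html` on the decoded response is a quick way to tell whether the price is in what the server actually returned, as opposed to what a browser shows.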

You should be able to replace the current request URL with this: `http://www.amazon.com/Upright-Citizens-Brigade-Comedy-Improvisation/dp/0989387801/`, then parse the HTML to find the price. There are a number of helpful answers here: [Parsing HTML Python](http://stackoverflow.com/questions/11709079/parsing-html-python). — Joe, Aug 18 '13 at 13:48
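Trimming the URL down as the comment suggests can be automated. The sketch below assumes product URLs follow the `/<title-slug>/dp/<product-id>/...` layout seen in this question; `canonical_product_url` is a hypothetical helper, not part of any Amazon API. It keeps the path up to the segment after `dp` and drops the query string, which here also contains stray CSS text.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_product_url(url):
    """Cut an Amazon-style product URL down to .../dp/<product-id>/.

    Returns the URL unchanged if it has no "dp" path segment.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    try:
        i = segments.index("dp")
    except ValueError:
        return url
    # Keep everything up to and including the product ID after "dp".
    path = "/".join(segments[: i + 2]) + "/"
    # Drop the query string and fragment entirely.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```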

score 2 · Answer 1 · edited May 23 '17 at 12:28

You should use an HTML parser, like lxml or BeautifulSoup. Here's an example using lxml:

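```python
from lxml import etree

# `html` is the page source read by the code in the question
parser = etree.HTMLParser()
root = etree.fromstring(html, parser=parser)

# Text of the <span> inside the price cell for new items
print root.xpath('//td[@class="a-text-right dp-new-col"]/a/span/text()')[0]
```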

prints:

$25.00
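If installing lxml is not an option, Python 3's standard-library `html.parser` can do a similar lookup, though with more bookkeeping than one XPath expression. This is a sketch under assumptions: the nesting (`td` with class `a-text-right dp-new-col`, then `a`, then `span`) is taken from the XPath above, the sample HTML is made up for illustration, and Amazon's real markup may differ or change. `PriceParser` and `find_new_price` are hypothetical names.

```python
from html.parser import HTMLParser

TARGET_CLASS = "a-text-right dp-new-col"
# Elements that never get an end tag, so they must not be pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class PriceParser(HTMLParser):
    """Collect text matching td[@class=TARGET_CLASS]/a/span/text()."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # (tag, class attribute) for each open element
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        self.open_tags.append((tag, dict(attrs).get("class")))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                del self.open_tags[i:]
                break

    def handle_data(self, data):
        # Direct text of a span whose parent is an <a> inside the target td.
        if len(self.open_tags) >= 3 and data.strip():
            (t1, c1), (t2, _), (t3, _) = self.open_tags[-3:]
            if t1 == "td" and c1 == TARGET_CLASS and t2 == "a" and t3 == "span":
                self.prices.append(data.strip())


def find_new_price(html):
    """Return the first matching price string, or None if absent."""
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return parser.prices[0] if parser.prices else None


SAMPLE = """
<table>
  <tr>
    <td class="a-text-left">New</td>
    <td class="a-text-right dp-new-col"><a href="#"><span>$25.00</span></a></td>
  </tr>
</table>
"""
print(find_new_price(SAMPLE))  # → $25.00
```

Returning `None` when nothing matches is a deliberate choice: if the price is missing from the downloaded HTML, as in the question, the caller gets a clear signal instead of an `IndexError`.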

Note that the required tag and its value were found using an XPath expression:

XPath, the XML Path Language, is a query language for selecting nodes from an XML document.
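Python's standard library also understands a subset of XPath through `xml.etree.ElementTree`, which is enough to see how the expression above walks the tree. This is only a sketch: ElementTree needs well-formed XML (real pages usually are not, which is why the answer uses lxml's `HTMLParser`), it has no `text()` step, so the text is read from `.text` instead, and the snippet is made up.

```python
import xml.etree.ElementTree as ET

SNIPPET = """
<table>
  <tr>
    <td class="a-text-right dp-new-col"><a href="#"><span>$25.00</span></a></td>
  </tr>
</table>
"""

root = ET.fromstring(SNIPPET.strip())
# ".//td" searches all descendants, [@class='...'] requires that exact
# attribute value, and "/a/span" then steps through direct children.
spans = root.findall(".//td[@class='a-text-right dp-new-col']/a/span")
print(spans[0].text)  # → $25.00
```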


Hope that helps.

