
I'm trying to read a specific page from Amazon.

import urllib2

req = urllib2.Request('http://www.amazon.com/Upright-Citizens-Brigade-Comedy-Improvisation/dp/0989387801/ref=lp_1_1_6/175-0367440-7496156?ie=UTF8&qid=1376827779&sr=1-6%20buybox._V181901516_.png)%20center%20top%20no-repeat;')
# Browser-like User-Agent; adjacent string literals avoid embedding the
# line indentation in the header value.
req.add_header('User-agent', 'Mozilla/5.0 '
               '(Windows NT 6.2; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) '
               'Chrome/23.0.1271.97 Safari/537.11')
response = urllib2.urlopen(req)
html = response.read()
print html
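For readers on Python 3, where `urllib2` was merged into `urllib.request`, a minimal sketch of the same request might look like this. The helper names `build_request` and `fetch_html` are hypothetical, and decoding with the server-declared charset (falling back to UTF-8) is an assumption, not part of the original code.

```python
import urllib.request

# Same browser-like User-Agent string as in the question, written as
# adjacent string literals so no stray whitespace ends up in the header.
USER_AGENT = ('Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.11 '
              '(KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11')


def build_request(url, user_agent=USER_AGENT):
    """Build a GET request carrying a custom User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch_html(url):
    """Download a page and decode it (hypothetical helper)."""
    with urllib.request.urlopen(build_request(url)) as response:
        # Assumption: fall back to UTF-8 if the server declares no charset.
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset, errors='replace')
```

`fetch_html(url)` returns the page as a `str`, which can then be handed to an HTML parser.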

I'm trying to read the price of a new item ("$25.00"), which appears in the page's source code, but that part doesn't show up in the printed HTML. What am I doing wrong?

alecxe
Findios
You should be able to replace the current request URL with this: `http://www.amazon.com/Upright-Citizens-Brigade-Comedy-Improvisation/dp/0989387801/`, then just parse the HTML to find the price. There are a number of helpful answers here: [Parsing HTML Python](http://stackoverflow.com/questions/11709079/parsing-html-python). – Joe Aug 18 '13 at 13:48
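A sketch of the URL cleanup suggested in the comment, assuming product links always contain `/dp/` followed by a 10-character ASIN made of digits and uppercase letters. `canonical_product_url` is a hypothetical helper, not an Amazon API.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: a product path contains "/dp/" followed by a 10-character
# ASIN (digits and uppercase letters, e.g. an ISBN-10 for books).
_DP_PATH = re.compile(r'^(.*?/dp/[0-9A-Z]{10})(?:/|$)')


def canonical_product_url(url):
    """Drop tracking path segments and the query string after /dp/<ASIN>/.

    Returns None if the URL does not look like a /dp/ product link.
    """
    parts = urlsplit(url)
    match = _DP_PATH.match(parts.path)
    if match is None:
        return None
    # Keep scheme, host and the path up to the ASIN; drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, match.group(1) + '/', '', ''))
```

Applied to the URL from the question, this yields the shorter URL given in the comment.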

1 Answer


You should use an HTML parser, such as lxml or BeautifulSoup. Here's an example using lxml:

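from lxml import etree

# lxml's HTML parser recovers from the malformed markup common on real pages.
parser = etree.HTMLParser()
root = etree.fromstring(html, parser=parser)

# Text of the <span> inside the link in the "new" price column.
print root.xpath('//td[@class="a-text-right dp-new-col"]/a/span/text()')[0]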

prints:

$25.00
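If lxml isn't installed, a rough equivalent of the same lookup can be written with Python 3's standard `html.parser` module. This swaps in a different technique: a hand-written parser that keeps a stack of open tags and collects text whose three innermost open elements are a `td` with exactly the class `a-text-right dp-new-col`, an `a`, and a `span`, mirroring the XPath in the answer. `NewPriceParser` and `extract_new_prices` are hypothetical names, and the markup shape is only assumed from that XPath.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they are not pushed on the stack.
VOID_ELEMENTS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
                 'link', 'meta', 'param', 'source', 'track', 'wbr'}


class NewPriceParser(HTMLParser):
    """Collect text matching //td[@class="a-text-right dp-new-col"]/a/span."""

    TARGET_CLASS = 'a-text-right dp-new-col'

    def __init__(self):
        super().__init__()
        self.stack = []    # open elements as (tag, class attribute) pairs
        self.matches = []  # stripped text found directly inside matching spans

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, dict(attrs).get('class')))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates omitted end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def _in_target(self):
        if len(self.stack) < 3:
            return False
        (td, td_class), (a, _), (span, _) = self.stack[-3:]
        return (td, a, span) == ('td', 'a', 'span') and td_class == self.TARGET_CLASS

    def handle_data(self, data):
        if self._in_target() and data.strip():
            self.matches.append(data.strip())


def extract_new_prices(html):
    parser = NewPriceParser()
    parser.feed(html)
    parser.close()
    return parser.matches
```

`extract_new_prices(html)` returns a list, so check that it is non-empty before taking the first item; `[0]` on an empty lxml result raises `IndexError` in the same way.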

Note that the required tag and its value were found using an XPath expression:

XPath, the XML Path Language, is a query language for selecting nodes from an XML document.
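To see how each step of the expression in the answer selects nodes, here is a sketch using `xml.etree.ElementTree` from the standard library, which supports only a limited subset of XPath and requires well-formed XML, so it cannot parse a real Amazon page. The fragment is a hypothetical stand-in shaped like the markup the expression targets.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment shaped like the targeted markup.
FRAGMENT = """
<html><body><table><tr>
  <td class="a-text-right dp-new-col"><a href="#"><span>$25.00</span></a></td>
</tr></table></body></html>
"""

root = ET.fromstring(FRAGMENT.strip())

# ".//td[@class='...']" selects any <td> descendant whose class equals the value;
# "/a/span" then steps to a direct <a> child and its direct <span> child.
spans = root.findall(".//td[@class='a-text-right dp-new-col']/a/span")
prices = [span.text for span in spans]
print(prices)  # ['$25.00']
```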


Hope that helps.

Community
alecxe