Scraping: How to fetch an attribute in a tag

Question

I am using lxml and Python to scrape a page. The link to the page is HERE. The problem I face right now is how to fetch an attribute from a tag. For example, the 3 gold stars at the top of the page have this HTML:

<abbr title="3" class="average rating large star3">★★★☆☆</abbr>

Here I want to fetch the title attribute so that I know how many stars this location got.

I have tried doing a couple of things including this:

response = urllib.urlopen('http://www.insiderpages.com/b/3721895833/central-kia-of-irving-irving').read()
mo = re.search(r'<div class="rating_box">.*?</div>', response)
div = html.fromstring(mo.group(0))
title = div.find("abbr").attrib["title"]
print title

But this does not work for me. Any help would be appreciated.

score 3 · Accepted Answer · edited May 23 '17 at 12:20


Don't use regex to extract data from HTML. You have lxml, so use its power (XPath).

>>> import lxml.html as html
>>> page = html.parse("http://www.insiderpages.com/b/3721895833/central-kia-of-irving-irving")
>>> print page.xpath("//div[@class='rating_box']/abbr/@title")
['3']
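The same XPath can be tried offline against the snippet from the question. This is only a minimal sketch: the `SAMPLE` markup and the `star_rating` helper are hypothetical names made up for illustration, and the surrounding `rating_box` div is assumed from the question's own regex, so the real page may be structured differently.

```python
import lxml.html

# Hypothetical sample markup modelled on the question; the live page may differ.
SAMPLE = """
<div class="rating_box">
  <abbr title="3" class="average rating large star3">★★★☆☆</abbr>
</div>
"""


def star_rating(markup):
    """Return the rating as an int, or None if no matching abbr is found."""
    root = lxml.html.fromstring(markup)
    # "/@title" selects the attribute value itself, so xpath() returns strings.
    titles = root.xpath("//div[@class='rating_box']/abbr/@title")
    return int(titles[0]) if titles else None


print(star_rating(SAMPLE))
```

Because `xpath()` returns a list, checking for an empty result before indexing avoids an IndexError when the page layout changes.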


answered Apr 13 '12 at 06:50

Avaris


Yours is better. I didn't know lxml can fetch pages itself. – WooParadog Apr 13 '12 at 06:53

score 1 · Answer 2 · answered Apr 13 '12 at 06:52

Have you tried xpath?

In [38]: from lxml import etree

In [39]: import urllib2

In [40]: html = etree.fromstring(urllib2.urlopen('http://www.insiderpages.com/b/3721895833/central-kia-of-irving-irving').read(), etree.HTMLParser())

In [41]: html.xpath('//abbr')[0].xpath('./@title')
Out[41]: ['3']
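If you prefer element methods over `@title` in the XPath, something like the following should also work. It is a sketch on made-up sample markup (the `SAMPLE` bytes are an assumption, not the real page). Note that `find("abbr")`, as used in the question, only looks at direct children, while `find(".//abbr")` searches all descendants.

```python
from lxml import etree

# Hypothetical sample document; the real page's structure is an assumption.
SAMPLE = b"""<html><body>
<div class="rating_box">
  <abbr title="3" class="average rating large star3">stars</abbr>
</div>
</body></html>"""

# HTMLParser tolerates real-world HTML that a strict XML parser would reject.
root = etree.fromstring(SAMPLE, etree.HTMLParser())

# ".//abbr" searches the whole subtree; a bare "abbr" would match nothing here
# because the abbr element is not a direct child of <html>.
abbr = root.find(".//abbr")

# .get() returns None for a missing attribute instead of raising KeyError.
rating = abbr.get("title") if abbr is not None else None
print(rating)
```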
