I've got HTML that contains entries like this:
<div class="entry">
<h3 class="foo">
<a href="http://www.example.com/blog-entry-slug"
rel="bookmark">Blog Entry</a>
</h3>
...
</div>
and I would like to extract the text "Blog Entry" (and a number of other attributes, so I'm looking for a generic answer).
In jQuery, I would do:
$('.entry a[rel=bookmark]').text()
The closest I've been able to get in Python is:
from BeautifulSoup import BeautifulSoup
import soupselect as soup
rawsoup = BeautifulSoup(open('fname.html').read())
for entry in rawsoup.findAll('div', 'entry'):
    print soup.select(entry, 'a[rel=bookmark]')[0].string.strip()
soupselect is from http://code.google.com/p/soupselect/.
However, soupselect doesn't understand the full CSS3 selector syntax the way jQuery does. Is there such a beast in Python?
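To make the result I'm after concrete, here is a self-contained sketch that uses only the standard library. It is a stand-in, not a solution: it assumes the markup is well-formed XML (real-world HTML often isn't), it uses ElementTree's limited XPath subset rather than CSS selectors, and the wrapping html/body elements and the extract_bookmarks name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, with the elided "..." content left out
# and a hypothetical <html><body> wrapper added so the div is not the root.
HTML = """
<html><body>
<div class="entry">
<h3 class="foo">
<a href="http://www.example.com/blog-entry-slug"
rel="bookmark">Blog Entry</a>
</h3>
</div>
</body></html>
"""


def extract_bookmarks(markup):
    """Return (text, href) for each a[rel=bookmark] inside a div.entry."""
    root = ET.fromstring(markup)
    results = []
    # Rough XPath counterpart of the CSS selector '.entry a[rel=bookmark]'.
    # Caveat: [@class='entry'] compares the whole attribute value, so
    # class="entry featured" would be missed, whereas CSS .entry matches it.
    for link in root.findall(".//div[@class='entry']//a[@rel='bookmark']"):
        results.append(((link.text or "").strip(), link.get("href")))
    return results


for text, href in extract_bookmarks(HTML):
    print(text, href)
```

The caveat in the comment is exactly why I'd prefer real CSS selector support: matching on class names, pseudo-classes, and combinators gets awkward quickly in this XPath subset.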