I am currently practicing the basics of accessing the web with Python. I am following a tutorial on YouTube, which guided me as far as the following code:
from urllib2 import urlopen, HTTPError
from BeautifulSoup import BeautifulSoup
import re
url = "http://getbusinessreviews.org/"
try:
    webpage = urlopen(url).read()
except HTTPError, e:
    if e.code == 404:
        e.msg = 'data not found on remote: %s' % e.msg
    raise
pathFinderTitle = re.compile('<h2 class="entry-title"><a href.* rel="bookmark">(.*)</a></h2>')
if webpage:
    if pathFinderTitle:
        findPathTitle = re.findall(pathFinderTitle, webpage)
    else:
        print "unable to get path finder title"
else:
    print "unable to url open "
listIterator = []
listIterator[:] = range(2, 10)
for i in listIterator:
    print findPathTitle[i]
I want to extract "Nutracoster" from the following HTML:
<h2 class="entry-title">
<a href="http://getbusinessreviews.org/nutracoster/" rel="bookmark">Nutracoster</a>
</h2>
I have two questions:
1. I am getting no results at the moment. Can anyone tell me what I am doing wrong? (I suspect my regular expression is not well defined.)
2. How can I pass this regular expression to BeautifulSoup?
Thanks in advance, and sorry for any silly mistakes; I am still at the learning stage :D
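
For reference, here is a minimal sketch of one likely reason the pattern above finds nothing. In the sample HTML the <h2> and <a> tags sit on separate lines, while the pattern expects them to be directly adjacent. The sketch uses only the standard-library re module and is written for Python 3 (the post's code is Python 2). The names html, original_pattern, title_pattern and titles are illustrative, and the live page may be laid out differently from the three-line sample used here.

```python
import re

# Sample markup copied from the post; note the line breaks between the tags.
html = '''<h2 class="entry-title">
<a href="http://getbusinessreviews.org/nutracoster/" rel="bookmark">Nutracoster</a>
</h2>'''

# The pattern from the post: it requires <a ...> to follow <h2 ...> with
# nothing in between, so the newline in the sample prevents any match.
original_pattern = re.compile(
    '<h2 class="entry-title"><a href.* rel="bookmark">(.*)</a></h2>'
)

# A more tolerant variant (an assumption about what the page looks like):
# \s* allows whitespace or newlines between tags, [^>]* stays inside the
# opening <a ...> tag, and the non-greedy (.*?) stops at the first </a>.
title_pattern = re.compile(
    r'<h2 class="entry-title">\s*<a href[^>]*rel="bookmark">(.*?)</a>\s*</h2>'
)

no_titles = original_pattern.findall(html)
titles = title_pattern.findall(html)

print(no_titles)
print(titles)
```

With the tolerant pattern, findall returns a list of the captured link texts, so it is safer to loop over that list directly than to index positions 2 to 9, which may not exist.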