I am trying to scrape the list of people born on a given date from this Wikipedia page.
Here is the existing code:
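import urllib2
from bs4 import BeautifulSoup  # assumes Beautiful Soup 4; the original import was not shown

hdr = {'User-Agent': 'Mozilla/5.0'}  # send a browser-like User-Agent header
site = "http://en.wikipedia.org/wiki/" + "january" + "_" + "1"
req = urllib2.Request(site, headers=hdr)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)  # parse the fetched HTML
print soup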
This all works fine and I get the entire HTML page, but I want specific data, and I don't know how to access it with Beautiful Soup without an id to use. The <ul> tag does not have an id and neither do the <li> tags. Plus, I can't just ask for every <li> tag because there are other lists on the page. Is there a specific way to select a given list? (I can't just use a fix for this one page because I plan on iterating through all the dates and getting every page's birthdays, and I can't guarantee that every page has exactly the same layout as this one.)
`, each `- ` corresponds to one person (until the next heading).
– Dr. Jan-Philip Gehrcke Jul 16 '13 at 17:48
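Following the hint in the comment above (each list item up to the next heading is one person), here is a minimal sketch of one way to select a list that has no id of its own: locate the section heading, then collect every <li> that follows it until the next <h2>. It assumes Beautiful Soup 4 and assumes the page contains an element whose id is "Births"; the function name list_items_under_heading and the sample HTML are made up for illustration, and the real page markup may differ.

```python
from bs4 import BeautifulSoup


def list_items_under_heading(html, heading_id="Births"):
    """Return the text of every <li> between one heading and the next <h2>.

    heading_id is an assumption about the page: it must match the id of
    the heading element (or of an element inside the heading).
    """
    soup = BeautifulSoup(html, "html.parser")
    anchor = soup.find(id=heading_id)
    if anchor is None:
        return []  # this page has no such section

    items = []
    # find_all_next walks the rest of the document in order, so nesting
    # depth does not matter; stop at the first <h2> that follows.
    for tag in anchor.find_all_next(["h2", "li"]):
        if tag.name == "h2":
            break
        items.append(tag.get_text(" ", strip=True))
    return items


# Placeholder HTML standing in for the downloaded page.
sample = """
<h2><span id="Events">Events</span></h2>
<ul><li>1801 - Some event</li></ul>
<h2><span id="Births">Births</span></h2>
<ul><li>1449 - Person A</li><li>1879 - Person B</li></ul>
<h2><span id="Deaths">Deaths</span></h2>
<ul><li>1515 - Person C</li></ul>
"""

print(list_items_under_heading(sample))
```

With the code from the question, the same function could be called on the downloaded page, e.g. list_items_under_heading(page.read()). Because it keys on the heading rather than on the position of the list, it should tolerate pages whose other lists differ, though it returns an empty list on any page that lacks the assumed heading id.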