so I'm trying to parse public facebook pages using BeautifulSoup. I've managed to successfully scrape LinkedIn, but I've spent hours trying to get it to work on facebook with no luck. The code I'm trying to use looks like this:
for urls in my_urls:
try:
page = urllib2.urlopen(urls)
soup = BeautifulSoup(page)
info = soup.find_all("div", class_="fsl fwb fcb")
info2 = info.findall('a')
The part that's frustrating me is that I can get the title element out, and I can even get pretty far down the document, but I can't get to the part where I need to get.
This line successfuly grabs the pageTitle:
info = soup.find_all("title", attrs={"id": "pageTitle"})
This line can get pretty far down the list of elements, but can't go any farther.
info = soup.find_all(id="pagelet_timeline_main_column")
Here's a sample page that I'm trying to parse, I want current city from it:
https://www.facebook.com/100004210542493
and heres a quick screenshot of what the part I want looks like:
I feel like I'm really close, but I just can't figure it out. Thanks in advance for any help!
EDIT 2: I should also mention that I can successfully print the whole soup and visually find the part I need, but for whatever reason the parsing just won't work the way it should.