My question is similar to the one asked here: https://stackoverflow.com/questions/14599485/news-website-comment-analysis
I am trying to extract the comments from any news article. For example, I have a news URL here: http://www.cnn.com/2013/09/24/politics/un-obama-foreign-policy/
I am trying to use BeautifulSoup in Python to extract the comments. However, it seems the comment section is either embedded within an iframe or loaded through JavaScript. Inspecting the page in Firebug does not reveal the source of the comment section, but explicitly viewing the source of the comments through the browser's view-source feature does. How do I go about extracting the comments, especially when they come from a different URL embedded within the news web page?
This is what I have done so far, although it is not much:
import urllib2
from bs4 import BeautifulSoup

# Fetch the raw HTML of the article page
opener = urllib2.build_opener()
url = 'http://www.cnn.com/2013/08/28/health/stem-cell-brain/index.html'
urlContent = opener.open(url).read()

# Parse the page and print its title
soup = BeautifulSoup(urlContent)
title = soup.title.text
print title

# Write the text of the <body> element to a file, dropping non-ASCII characters
body = soup.findAll('body')
with open("brain.txt", "w") as outfile:
    for i in body:
        text = i.text.encode('ascii', 'ignore')
        outfile.write(text + '\n')
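A possible next step, as a sketch only and not verified against the CNN page, would be to list the `src` attributes of the `<iframe>` and `<script>` tags in the fetched HTML, to see which other URLs the comment section might be loaded from. The sketch below is written for Python 3 and uses the standard library's `HTMLParser` instead of BeautifulSoup so that it runs without extra packages. The class name `EmbeddedSourceFinder`, the sample markup, and the example.com URLs are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class EmbeddedSourceFinder(HTMLParser):
    """Collect the src URLs of <iframe> and <script> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []  # list of (tag, absolute_url) pairs

    def handle_starttag(self, tag, attrs):
        if tag in ("iframe", "script"):
            src = dict(attrs).get("src")
            if src:
                # Resolve relative and protocol-relative URLs against the page URL.
                self.sources.append((tag, urljoin(self.base_url, src)))


# Made-up markup for illustration only; this is NOT CNN's real HTML.
SAMPLE_HTML = """
<html><body>
  <div id="article">Article text</div>
  <iframe src="//comments.example.com/embed?thread=123"></iframe>
  <script src="/js/comments-loader.js"></script>
  <script>var inline = true;</script>
</body></html>
"""

finder = EmbeddedSourceFinder("http://news.example.com/2013/story/index.html")
finder.feed(SAMPLE_HTML)
for tag, src in finder.sources:
    print(tag, src)
```

If one of the listed URLs turns out to serve the comments, it could be fetched and parsed in the same way as the article page. Comments that a script fetches at runtime will not appear in the static HTML, so only the script's own URL would be listed here.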
Any help with what I need to do, or how to go about it, would be much appreciated.