A friend and I are trying to calculate some comment metrics (how many users comment on certain posts, who comments, how many comments each user adds, etc.) at a baseball blog that we frequent.
I know little to nothing about web programming or scraping, but I know a bit of Python, so I volunteered to help (she had been copying and pasting comments into .txt files and using Cmd+F to tally them).
My initial approach uses urllib2 and BeautifulSoup (Python 2.7):

import urllib2
from collections import Counter
from bs4 import BeautifulSoup

url = "http://www.royalsreview.com/2016/6/8/11881484/an-analysis-of-rr-game-threads#comments"
f = urllib2.urlopen(url).read()
soup = BeautifulSoup(f, "html.parser")  # name the parser explicitly to avoid bs4's warning
userlist = soup.find_all("div", class_="comment")
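For context, once the comment divs are actually in the HTML, the tallying itself is straightforward with collections.Counter. This is only a sketch against made-up markup: the class names (`comment`, `user`) and structure here are assumptions, not the blog's real HTML, so the selectors would need adjusting after inspecting the live page.

```python
from collections import Counter
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real comment section.
sample = """
<div class="comment"><span class="user">alice</span>Nice win!</div>
<div class="comment"><span class="user">bob</span>Bullpen worries me.</div>
<div class="comment"><span class="user">alice</span>Agreed.</div>
"""

def tally_commenters(html):
    """Return a Counter mapping username -> number of comments."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for div in soup.find_all("div", class_="comment"):
        user = div.find("span", class_="user")  # assumed structure; adjust for the real page
        if user:
            names.append(user.get_text())
    return Counter(names)

counts = tally_commenters(sample)
# e.g. counts.most_common() lists each commenter with their comment total
```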
I sort of know what I'm looking for: opening the URL in Chrome and clicking "Inspect" on a comment shows me the HTML I would need in order to tally comments.
However, when I use urllib2 to read the URL, the HTML it returns does not include the comments on that page.
From my research, I think this is because urllib2 only fetches the page source that the server sends, not the content that JavaScript generates afterward (I'm venturing out of my comfort zone here), which is where the comments come from.
How can I get the page as it appears AFTER the comments have been loaded?
Thanks for the help