I've been doing some web scraping and now have a list of 911 links, saved as follows (I've included 5 to show how they're stored):
every_link = ['http://www.millercenter.org/president/obama/speeches/speech-4427', 'http://www.millercenter.org/president/obama/speeches/speech-4425', 'http://www.millercenter.org/president/obama/speeches/speech-4424', 'http://www.millercenter.org/president/obama/speeches/speech-4423', 'http://www.millercenter.org/president/obama/speeches/speech-4453']
These URLs link to presidential speeches over time. I want to store each individual speech (so, 911 unique speeches) in a separate text file, or be able to group them by president. I'm trying to apply the following function to each of these links:
def processURL(l):
    open_url = urllib2.urlopen(l).read()
    item_soup = BeautifulSoup(open_url)
    item_div = item_soup.find('div', {'id': 'transcript'}, {'class': 'displaytext'})
    item_str = item_div.text.lower()
    item_str_processed = punctuation.sub('', item_str)
    item_str_processed_final = item_str_processed.replace('—', ' ')

for l in every_link:
    processURL(l)
So, I want to save the words from each processed speech to its own text file. That might look like the following, with obama_44xx representing individual text files:
obama_4427 = "blah blah blah"
obama_4425 = "blah blah blah"
obama_4424 = "blah blah blah"
...
I'm trying the following:
for l in every_link:
    processURL(l)
    obama.write(processURL(l))
But that's not working... Is there another way I should go about this?
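One likely issue: processURL builds item_str_processed_final but never returns it, so obama.write(processURL(l)) writes None. Below is a minimal sketch of one approach, under some assumptions: it targets Python 3 (urllib2.urlopen becomes urllib.request.urlopen), keeps only the transcript lookup from the original parsing, and introduces a hypothetical helper filename_for that derives a per-speech filename from the speech id in the URL:

```python
import re

def filename_for(url, president='obama'):
    # Hypothetical helper: pull the numeric speech id out of the URL,
    # e.g. '.../speech-4427' -> 'obama_4427.txt'
    speech_id = re.search(r'speech-(\d+)', url).group(1)
    return '{}_{}.txt'.format(president, speech_id)

def processURL(url):
    from urllib.request import urlopen   # urllib2.urlopen in Python 2
    from bs4 import BeautifulSoup        # pip install beautifulsoup4
    html = urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    transcript = soup.find('div', {'id': 'transcript'})
    return transcript.text.lower()       # return the text instead of discarding it

def save_speeches(links):
    # Write each speech to its own file, e.g. obama_4427.txt
    for link in links:
        with open(filename_for(link), 'w') as out:
            out.write(processURL(link))
```

Grouping by president could follow the same pattern: parse the president's name out of the URL path and use it as a directory or filename prefix instead of the hardcoded 'obama' default.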