I'm trying to write multiple rows to a CSV file using Python, and I've been piecing this code together for a while. My goal is simply to scrape the Oxford English Dictionary (OED) website for the words created in a given year and write them to a CSV file. I want each row to start with the year I'm searching for, followed by all the words for that year listed horizontally across the row. Then I want to be able to repeat this for multiple years.
Here's my code so far:
import requests
import re
import urllib2
import os
import csv
year_search = 1550
subject_search = ['Law']
path = '/Applications/Python 3.5/Economic'
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
urllib2.install_opener(opener)
user_agent = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
header = {'User-Agent':user_agent}
request = urllib2.Request('http://www.oed.com/', None, header)
f = opener.open(request)
data = f.read()
f.close()
print 'database first access was successful'
resultPath = os.path.join(path, 'OED_table.csv')
htmlPath = os.path.join(path, 'OED.html')
outputw = open(resultPath, 'w')
outputh = open(htmlPath, 'w')
request = urllib2.Request('http://www.oed.com/search?browseType=sortAlpha&case-insensitive=true&dateFilter='+str(year_search)+'&nearDistance=1&ordered=false&page=1&pageSize=100&scope=ENTRY&sort=entry&subjectClass='+str(subject_search)+'&type=dictionarysearch', None, header)
page = opener.open(request)
urlpage = page.read()
outputh.write(urlpage)
new_word = re.findall(r'<span class=\"hwSect\"><span class=\"hw\">(.*?)</span>', urlpage)
print str(new_word)
outputw.write(str(new_word))
page.close()
outputw.close()
This prints the list of words that were identified for the year 1550. I then tried to make the code write to a CSV file on my computer, which it does, but there are two things I can't get right:
- I want to be able to insert multiple rows into the file, and
- I want the year to show up in the first column of each row.
Next part of my code:
with open('OED_table.csv', 'w') as csvfile:
    fieldnames = ['year_search']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({'year_search': new_word})
I was using the csv module's online documentation as a reference for the second part of the code.
To clarify, I included the first part of the code only to give context.
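To make the target layout concrete, here is a minimal sketch of one row per year, with the year in the first column and the words following it. It is written for Python 3 and uses `csv.writer` rather than `csv.DictWriter`, since the rows have a variable number of columns. The `write_year_rows` helper and the sample words are hypothetical placeholders, not real OED results, and the scraping step is left out entirely.

```python
import csv
import io


def write_year_rows(csvfile, words_by_year):
    """Write one CSV row per year: the year first, then that year's words."""
    writer = csv.writer(csvfile)
    for year, words in words_by_year:
        # Prepend the year so it lands in the first column of the row.
        writer.writerow([year] + list(words))


# Hypothetical sample data standing in for the scraped word lists.
sample = [
    (1550, ['abatement', 'bailment']),
    (1551, ['caveat']),
]

# An in-memory buffer keeps the sketch self-contained; a real file would work the same way.
buffer = io.StringIO(newline='')
write_year_rows(buffer, sample)
print(buffer.getvalue())
```

To write to disk instead, the file would be opened once, before looping over the years, for example with `open('OED_table.csv', 'w', newline='')` in Python 3 (or mode `'wb'` in Python 2). Reopening the file in `'w'` mode for each year truncates it, which leaves only the last row.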