I got advice from Jamie Bull and PM 2Ring to use the CSV module for the output of my web scraper . I'm nearly done but have an issue with some parsed items that are separated by a colon or hyphen. I'm wanting those items split into two items in the current list.
Current output:
GB,16,19,255,1,26:40,19,13,4,2,6-12,0-1,255,57,4.5,80,21,3.8,175,23-33,4.9,3,14,1,4,38.3,8,65,1,0 Sea,36,25,398,1,33:20,25,8,13,4,4-11,1-1,398,66,6.0,207,37,5.6,191,19-28,6.6,1,0,0,2,33.0,4,69,2,1
Desired output:(The issues/differences are in bold)
GB,16,19,255,1,26,40,19,13,4,2,6,12,0,1,255,57,4.5,80,21,3.8,175,23,33,4.9,3,14,1,4,38.3,8,65,1,0 Sea,36,25,398,1,33,20,25,8,13,4,4,11,1,1,398,66,6,207,37,5.6,191,19,28,6.6,1,0,0,2,33,4,69,2,1
I am unsure where or how to make these changes. I also don't know if regex is needed. Obviously I could handle this in notepad or Excel but my goal is to handle all this in Python.
If you run the program, the above results are from the 2014 season, week 1.
import requests
import re
from bs4 import BeautifulSoup
import csv
year_entry = raw_input("Enter year: ")
week_entry = raw_input("Enter week number: ")
week_link = requests.get("http://sports.yahoo.com/nfl/scoreboard/?week=" + week_entry + "&phase=2&season=" + year_entry)
page_content = BeautifulSoup(week_link.content)
a_links = page_content.find_all('tr', {'class': 'game link'})
csvfile = open('NFL_2014.csv', 'a')
writer = csv.writer(csvfile)
for link in a_links:
r = 'http://www.sports.yahoo.com' + str(link.attrs['data-url'])
r_get = requests.get(r)
soup = BeautifulSoup(r_get.content)
stats = soup.find_all("td", {'class':'stat-value'})
teams = soup.find_all("th", {'class':'stat-value'})
scores = soup.find_all('dd', {"class": 'score'})
try:
away_game_stats = []
home_game_stats = []
statistic = []
game_score = scores[-1]
game_score = game_score.text
x = game_score.split(" ")
away_score = x[1]
home_score = x[4]
home_team = teams[1]
away_team = teams[0]
away_team_stats = stats[0::2]
home_team_stats = stats[1::2]
away_game_stats.append(away_team.text)
away_game_stats.append(away_score)
home_game_stats.append(home_team.text)
home_game_stats.append(home_score)
for stats in away_team_stats:
text = stats.text.strip("").encode('utf-8')
away_game_stats.append(text)
writer.writerow(away_game_stats)
for stats in home_team_stats:
text = stats.text.strip("").encode('utf-8')
home_game_stats.append(text)
writer.writerow(home_game_stats)
except:
pass
csvfile.close()
Any help is greatly appreciated. This is my first program and searching this board has been a great resource.
Thanks,
JT