BeautifulSoup web scraper does not correctly scrape the data from one column of a table.
The script scrapes all the data in the table EXCEPT the data in the 'Player' column; the output shows every player name as 'None'.
The only difference between the td element for the 'Player' column and all the other td elements in the tr is that there is an href in the player's td element, as displayed in the images below.
How would I go about changing my code to get the players' names? Is it the href in the 'Player' data that is breaking my script? If so, how do I account for this?
#HOME_SKATERS
#FIRST_TWO_GAMES
import requests
from bs4 import BeautifulSoup
import pandas as pd
import csv

table = []
df = pd.DataFrame()
for i in range(400959564, 400959565):
    url = requests.get("http://www.espn.com/nhl/boxscore?gameId={}".format(i))
    if not url.ok:
        continue
    data = url.text
    soup = BeautifulSoup(data, 'lxml')
    #Add the game ID to the list of soups to keep track of multiple players with same game ID
    table.append((i, soup.find_all('table', {'class': 'mod-data'})[5].find_all('tr')[2:20]))

data = []
soups = []
game_id = []
for i, t in table:
    #Use .contents method to turn the soup into list of items
    soups = [j.contents for j in t]
    for s in soups:
        #Use .string method to parse the values of different columns
        data.append([a.string for a in s])
        #Append the Game ID
        game_id.append(i)

#Create a DataFrame from the data extracted
df = pd.DataFrame(data)
df.columns = ['Player', 'G', 'A', 'Plus_Minus', 'SOG', 'MS', 'BS', 'PN', 'PIM', 'HT', 'TK', 'GV', 'SHF', 'TOT', 'PP', 'SH', 'EV', 'FW', 'FL', 'Faceoff_Pct']
df['Game ID'] = game_id
#df.to_csv('HOME_SKATERS.csv')
df
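
One possible explanation, assuming the 'Player' cell holds more than one child node (for example a link followed by extra text such as a position): BeautifulSoup's `.string` returns `None` when a tag has more than one child, whereas `get_text()` joins all the nested text. The sketch below shows the difference on a made-up row. The HTML fragment, the player name, and the variable names `with_string` and `with_get_text` are hypothetical and are not taken from the actual ESPN page.

```python
from bs4 import BeautifulSoup

# Hypothetical row (an assumption, not ESPN's real markup): the first cell
# contains a link plus trailing text, the other cells contain plain text.
html = """
<table><tr><td><a href="/player/1">J. Doe</a>, D</td><td>1</td><td>0</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
cells = soup.find("tr").find_all("td")

# .string is None for a tag with more than one child (here: <a> plus ", D")
with_string = [td.string for td in cells]

# get_text() collects the text of every descendant, so the name survives
with_get_text = [td.get_text(strip=True) for td in cells]

print(with_string)
print(with_get_text)
```

If the real cells look like this, replacing `a.string` with `a.get_text(strip=True)` in the list comprehension may be enough, though that depends on the actual page structure.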