
Through some searching I was able to figure out that what I was trying to scrape was inside an iframe, which was the main reason I always received None back as my results. I was able to start pulling in some data, like the headers, but when it comes to the data within the table I can only get the first result, which is the number 1. Here is the code:

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('http://www.nhl.com/stats/player?aggregate=1&reportType=game&dateFrom=2017-10-20&dateTo=2017-10-31&filter=gamesPlayed,gte,1&sort=shots')
html = driver.page_source
driver.quit()
soup = BeautifulSoup(html, "html.parser")

# find() returns only the FIRST element with a matching class
stat_cat = soup.find('div', attrs={'class': 'rt-tr'})
header = stat_cat.text.strip()

stats = soup.find('div', attrs={'class': 'rt-td'})
player_stats = stats.text.strip()

print(header, player_stats)

What I am trying to figure out is how to get each player and his stats from the second soup.find, since it only returns the first rt-td result. Once I have all of the data, I would then like to not just print it but save it to a csv. Thanks for taking a look!
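For reference, a minimal sketch of the usual fix on the BeautifulSoup side: find() stops at the first match, while find_all() collects every match, so iterating over the rt-tr rows and the rt-td cells inside each one should yield the whole table, assuming those class names are stable in the rendered page:

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('http://www.nhl.com/stats/player?aggregate=1&reportType=game&dateFrom=2017-10-20&dateTo=2017-10-31&filter=gamesPlayed,gte,1&sort=shots')
html = driver.page_source
driver.quit()

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list of every matching element, not just the first
for row in soup.find_all('div', attrs={'class': 'rt-tr'}):
    # collect the text of every cell in this row
    cells = [td.text.strip() for td in row.find_all('div', attrs={'class': 'rt-td'})]
    print(cells)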

1 Answer


Give this a try. If you want to get all of the data from that table, you can have it by running the script below, which requests the JSON feed behind the page directly:

import csv
import requests

# "w" mode starts a fresh file, so the header row is written exactly once per run
outfile = open("table_data.csv", "w", newline='')
writer = csv.writer(outfile)
writer.writerow(["Player", "Pos", "GP", "G", "A", "P", "+/-", "PIM", "PPG", "PPP",
                 "SHG", "SHP", "GWG", "OTG", "S", "S%", "TOI/GP", "Shifts/GP", "FOW%"])

# hidden JSON feed behind the stats page (found with the browser's dev tools)
req = requests.get('http://www.nhl.com/stats/rest/skaters?isAggregate=true&reportType=basic&isGame=true&reportName=skatersummary&sort=[{%22property%22:%22shots%22,%22direction%22:%22DESC%22}]&cayenneExp=gameDate%3E=%222017-10-20%22%20and%20gameDate%3C=%222017-10-31%22%20and%20gameTypeId=2')
data = req.json()['data']
for item in data:
    # .get(key, default) falls back to a default when a field is missing from a record
    Player = item.get('playerName', '')
    Pos = item.get('playerPositionCode', '')
    GP = item.get('gamesPlayed', 0)
    G = item.get('goals', 0)
    A = item.get('assists', 0)
    P = item.get('points', 0)
    Plus_Minus = item.get('plusMinus', 0)
    PIM = item.get('penaltyMinutes', 0)
    PPG = item.get('ppGoals', 0)
    PPP = item.get('ppPoints', 0)
    SHG = item.get('shGoals', 0)
    SHP = item.get('shPoints', 0)
    GWG = item.get('gameWinningGoals', 0)
    OTG = item.get('otGoals', 0)
    S_down = item.get('shots', 0)
    S_per = item.get('shootingPctg', 0)
    TOI = item.get('timeOnIcePerGame', 0)
    Shifts = item.get('shiftsPerGame', 0)
    FOW = item.get('faceoffWinPctg', 0)
    print(Player, Pos, GP, G, A, P, Plus_Minus, PIM, PPG, PPP, SHG, SHP, GWG, OTG, S_down, S_per, TOI, Shifts, FOW)

    # writerow() takes the whole record as a single list, one field per column
    writer.writerow([Player, Pos, GP, G, A, P, Plus_Minus, PIM, PPG, PPP, SHG, SHP, GWG, OTG, S_down, S_per, TOI, Shifts, FOW])
outfile.close()
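Requesting this JSON feed directly sidesteps both the iframe and the JavaScript-rendered table, and it returns the full result set in one response rather than the 50 rows shown per page, which is why no Selenium or pagination is needed here.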

Partial results:

Brent Burns D 6 0 5 5 -3 4 0 3 0 0 0 0 31 0.0 1458.8333 29.0 0.0
Max Pacioretty L 5 3 0 3 0 4 0 0 1 1 0 0 29 0.1034 1240.8 26.2 0.0
Phil Kessel R 6 2 4 6 -1 4 0 4 0 0 2 2 27 0.074 1044.3333 21.5 0.3333
Jakub Voracek R 5 2 4 6 2 8 0 0 0 0 0 0 26 0.0769 1191.2 25.4 1.0
John Carlson D 5 0 3 3 -3 2 0 1 0 0 0 0 25 0.0 1686.2 29.4 0.0
Evgeny Kuznetsov C 5 3 1 4 -1 6 0 1 0 0 1 0 24 0.125 1138.4 20.2 0.3703
  • Thanks Shahin. It seems to get most of the data, but if we look closer, for example, Brent Burns has 5 assists and 5 points but only one of the 5's shows up. Also, upon export the player's name gets split up and his last name becomes the Pos (position). Is there a solution for that, and could we also add a 0 or nil in columns without data? – Michael T Johnson Nov 02 '17 at 19:03
  • Also, this grabs the 50 on the first page and there are 650 within this table. The link does not change between the 1st and the 7th page, otherwise I would just change the link. How would we go about getting the other 600 players from this table? – Michael T Johnson Nov 02 '17 at 19:05
  • So you wanna get all the new questions answered as well in a single thread. Go through your post and think again about what your first requirement was. – SIM Nov 02 '17 at 20:05
  • Yes, you did answer my original question, thank you. Just looking for further guidance. – Michael T Johnson Nov 02 '17 at 20:27
  • Kinda busy at this moment. I'll try to fetch your 650 records as well whenever I'm free. – SIM Nov 02 '17 at 20:31
  • No rush Shahin, you've been very helpful. Would you like me to open a new thread for your answer? – Michael T Johnson Nov 02 '17 at 20:32
  • No need for that. – SIM Nov 02 '17 at 20:34
  • See the edited part. This is the easiest way. However, I'll take a look at traversing the 13 pages using Selenium by clicking the next-page links, which, I suppose, is what you expected to stick to. – SIM Nov 02 '17 at 21:41
  • No, I don't have to, that works great! Thanks! Last thing is how I would put this into a csv file? – Michael T Johnson Nov 02 '17 at 21:53
  • See the edited code. Run it and get a csv output with the data filled in. By the way, I didn't write the header at first; I just commented out that portion for you. All you need to do is write the 19 field names manually as comma-separated quoted strings, like how I started. However, if you run it now you can get all the data excluding headers. Hope you find it useful. Thanks. – SIM Nov 02 '17 at 22:33
  • How would it work if I only wanted the header one time instead of it writing every other row? Also, how did you find the JSON version of that webpage? Very curious, because that was impressive. – Michael T Johnson Nov 04 '17 at 18:59
  • Found the JSON link using Chrome dev tools, which is very necessary to make use of when it comes to seeing the different activity going on while sending or receiving HTTP requests. For writing headers: the pattern you should follow is defining everything before the loop starts and then using `writer.writerow([data])` inside the loop to append the data (see the sketch after this list), something like https://stackoverflow.com/questions/46026399/export-data-from-beautifulsoup-to-csv/46028672#46028672 – SIM Nov 04 '17 at 19:21
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/158232/discussion-between-michael-t-johnson-and-shahin). – Michael T Johnson Nov 04 '17 at 20:38
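To make that header advice concrete, here is a minimal sketch of the write-header-once pattern against the same JSON feed; the output filename and the reduced FIELDS list are illustrative choices, not part of the original answer:

import csv
import requests

URL = 'http://www.nhl.com/stats/rest/skaters?isAggregate=true&reportType=basic&isGame=true&reportName=skatersummary&sort=[{%22property%22:%22shots%22,%22direction%22:%22DESC%22}]&cayenneExp=gameDate%3E=%222017-10-20%22%20and%20gameDate%3C=%222017-10-31%22%20and%20gameTypeId=2'

# fields to pull from each JSON record, in column order (a subset, for brevity)
FIELDS = ['playerName', 'playerPositionCode', 'gamesPlayed', 'goals',
          'assists', 'points', 'shots', 'shootingPctg']

with open('skater_stats.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(FIELDS)  # header row: written once, before the loop
    for item in requests.get(URL).json()['data']:
        # one data row per record; .get() falls back to 0 for missing fields
        writer.writerow([item.get(field, 0) for field in FIELDS])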