0

I am trying to pull stat tables from NHL.com and convert them to csv for later use in excel. I am able to pull the tables but am having issues converting them to csv. I have found plenty of previous questions about converting json to csv but none of the solutions have worked for me. Some of the solutions utilized pandas, which for some reason keeps giving me a traceback error. Here is the code up until the conversion to csv.

import requests
import lxml.html
from pprint import pprint 
from sys import exit
import json
import csv
import datetime 
import dateutil.relativedelta


now = datetime.datetime.now()
one_month_ago = now + dateutil.relativedelta.relativedelta(months=-15)

today_date = now.strftime('%Y-%m-%d')
one_month_ago_date = one_month_ago.strftime('%Y-%m-%d')

url = 'http://www.nhl.com/stats/rest/individual/skaters/basic/game/skatersummary?cayenneExp=gameDate%3E=%22'+one_month_ago_date+'T04:00:00.000Z%22%20and%20gameDate%3C=%22'+today_date+'T03:59:59.999Z%22%20and%20gameLocationCode=%22H%22%20and%20gameTypeId=%222%22&factCayenneExp=shots%3E=1&sort=[{%22property%22:%22points%22,%22direction%22:%22DESC%22},{%22property%22:%22goals%22,%22direction%22:%22DESC%22},{%22property%22:%22assists%22,%22direction%22:%22DESC%22}]'
resp = requests.get(url).text
resp = json.loads(resp)

Any help would be greatly appreciated!

Edit: Some of the csv conversion methods I tried included the highest rated answers from How can I convert JSON to CSV?. I was having issues pasting and formatting them here so I just provided the link instead.

This is my output when trying to use pandas.

Traceback (most recent call last):
File "NHL Data Scrape.py", line 1, in <module>
from pandas.io.json import json_normalize

File "C:\Users\Brett\AppData\Local\Programs\Python\Python36\lib\site-
packages\pandas\__init__.py", line 13, in <module>
__import__(dependency)

File "C:\Users\Brett\AppData\Local\Programs\Python\Python36\lib\site-
packages\numpy\__init__.py", line 142, in <module>
from . import add_newdocs

File "C:\Users\Brett\AppData\Local\Programs\Python\Python36\lib\site-
packages\numpy\add_newdocs.py", line 13, in <module>
from numpy.lib import add_newdoc

File "C:\Users\Brett\AppData\Local\Programs\Python\Python36\lib\site-
packages\numpy\lib\__init__.py", line 8, in <module>
from .type_check import *

File "C:\Users\Brett\AppData\Local\Programs\Python\Python36\lib\site-
packages\numpy\lib\type_check.py", line 11, in <module>
import numpy.core.numeric as _nx

File "C:\Users\Brett\AppData\Local\Programs\Python\Python36\lib\site-
packages\numpy\core\__init__.py", line 35, in <module>
from . import _internal  # for freeze programs

File "C:\Users\Brett\AppData\Local\Programs\Python\Python36\lib\site-
packages\numpy\core\_internal.py", line 18, in <module>
from .numerictypes import object_

File "C:\Users\Brett\AppData\Local\Programs\Python\Python36\lib\site-
packages\numpy\core\numerictypes.py", line 962, in <module>
_register_types()

File "C:\Users\Brett\AppData\Local\Programs\Python\Python36\lib\site-
packages\numpy\core\numerictypes.py", line 958, in _register_types
    numbers.Integral.register(integer)

AttributeError: module 'numbers' has no attribute 'Integral'


------------------
(program exited with code: 1)

Press any key to continue . . .
Brett Ford
  • 89
  • 1
  • 10
  • 1
    What have you tried in terms of conversion to CSV? You could also provide small samples of the JSON and your expected result (the CSV), which would make answering a lot easier! – akoeltringer Aug 31 '17 at 04:54
  • what version of pandas are you using? and what happens when you simply run `import pandas`? – MattR Aug 31 '17 at 14:48
  • I am using pandas 0.20.3, which I installed via pip. I just reinstalled Python and all of the packages to see if that would help and it didn't change anything. Also, when I simply use "import pandas" I get the same error – Brett Ford Aug 31 '17 at 15:03

2 Answers2

0

You can use json_normalize() from pandas.io.json, e.g.:

In []:
from pandas.io.json import json_normalize

...
resp = requests.get(url).json()
json_normalize(resp, 'data')

Out[]:
     assists  faceoffWinPctg  gameWinningGoals  gamesPlayed  goals  otGoals   ...
0         31          0.0967                 2           41     20        1   ...
1         27          0.0000                 3           38     22        0   ...
2         35          0.5249                 4           41     14        2   ...
3         34          0.4866                 3           41     14        1   ...
...
AChampion
  • 29,683
  • 4
  • 59
  • 75
  • This formatting is exactly how I would like it to look! The only problem is when I try to run it I get a long traceback error that first references "from pandas.io.json import json_normalize" then gives the file location of _init_.py in the pandas folder then does the same for various numpy modules as well. I have them both installed and located in the Python36 folder so I'm unsure why it does that. I'm going to work on trying to resolve this issue. – Brett Ford Aug 31 '17 at 14:05
  • Thank you for this respone though! If I can get pandas to work in the future I will fool around with it because it seems to be super useful and simple. – Brett Ford Aug 31 '17 at 16:21
  • Looks like you have a bad installation of numpy. I would try reinstalling it. – AChampion Aug 31 '17 at 16:25
0

You could utilize python's built-in csv.DictWriter

resp = requests.get(url).json() # get response data in json

# resp['data'] is a list of dicts which contains players info.
# resp['data'][0].keys() is a dictionary keys. We'll use it for csv header.
with open('nhl_players.csv', 'w') as f:
    w = csv.DictWriter(f, resp['data'][0].keys())
    w.writeheader()
    w.writerows(resp['data'])

Here's the output CSV file https://www.dropbox.com/s/1mmprenx0eniflg/nhl_players.csv?dl=0

Hope this helps.

Andrey
  • 111
  • 1
  • 1
  • 3