I am trying to build a database of play-by-play data for several seasons of NBA games for my MSc dissertation in economics. Currently I extract games from the NBA's API (see example) and split each game into a separate .json file using this routine (adapted for play-by-play purposes). This yields .json files such as the following (excerpt showing the first rows):
{"headers": ["GAME_ID", "EVENTNUM", "EVENTMSGTYPE", "EVENTMSGACTIONTYPE", "PERIOD", "WCTIMESTRING", "PCTIMESTRING", "HOMEDESCRIPTION", "NEUTRALDESCRIPTION", "VISITORDESCRIPTION", "SCORE", "SCOREMARGIN"], "rowSet": [["0041400406", 0, 12, 0, 1, "9:11 PM", "12:00", null, null, null, null, null], ["0041400406", 1, 10, 0, 1, "9:11 PM", "12:00", "Jump Ball Mozgov vs. Green: Tip to Barnes", null, null, null, null]
I plan to write a loop that converts all of the generated .json files to .csv so that I can proceed to econometric analysis in Stata. At the moment I am stuck on the first step: converting a single JSON file to CSV (I will design the loop afterwards). The code I am trying is:
import csv
import json

f = open('pbp_0041400406.json')
data = json.load(f)
f.close()
with open("pbp_0041400406.csv", "w") as file:
    csv_file = csv.writer(file)
    for rowSet in data:
        csv_file.writerow(rowSet)
f.close()
However, the resulting CSV file looks wrong: it contains one line reading h,e,a,d,e,r,s and another reading r,o,w,S,e,t, so it captures neither the headers nor the rowSet entries (the plays themselves).
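A minimal sketch of what seems to be happening, assuming Python 3 and that each file is a single JSON object with the keys `headers` and `rowSet`, as in the excerpt above. Iterating over the loaded dict yields only its keys, and `writerow` treats each key string as a sequence of characters. The function name `json_to_csv` is hypothetical and used only for illustration.

```python
import csv
import json


def json_to_csv(json_path, csv_path):
    """Write the 'headers' list and every 'rowSet' entry of one game file to CSV."""
    with open(json_path) as f:
        data = json.load(f)  # assumed shape: {"headers": [...], "rowSet": [[...], ...]}

    # Iterating over `data` itself would yield only its keys ("headers", "rowSet"),
    # and writerow() would split each key string into single characters.
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(data["headers"])   # one header line
        writer.writerows(data["rowSet"])   # one line per play


if __name__ == "__main__":
    json_to_csv("pbp_0041400406.json", "pbp_0041400406.csv")
```

With this approach, `null` values in the JSON are loaded as `None` and written by `csv.writer` as empty fields.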
I have tried to solve this using the contributions in this thread, but without success. Can anybody please give me some insight into solving this problem?
[EDIT] Replacing rowSet with data in the original code yielded the same results.
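For the planned loop over all games, one possible sketch is shown here. It assumes Python 3, that every file follows the `pbp_<GAME_ID>.json` naming used above, that each file is a JSON object with `headers` and `rowSet` keys, and that each CSV should be written next to its source file. The function name `convert_all` and its parameters are hypothetical.

```python
import csv
import json
from pathlib import Path


def convert_all(src_dir, pattern="pbp_*.json"):
    """Convert every matching JSON file in src_dir to a CSV file beside it."""
    written = []
    for json_path in sorted(Path(src_dir).glob(pattern)):
        with open(json_path) as f:
            data = json.load(f)  # assumed shape: {"headers": [...], "rowSet": [...]}

        csv_path = json_path.with_suffix(".csv")  # pbp_X.json -> pbp_X.csv
        with open(csv_path, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(data["headers"])
            writer.writerows(data["rowSet"])
        written.append(csv_path)
    return written  # paths of the CSV files that were created


if __name__ == "__main__":
    for path in convert_all("."):
        print(path)
```

Because every game file shares the same headers, the resulting CSV files could later be appended to one another in Stata.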
Thanks in advance!