I am looking for a faster way to load data from my json object into a multiindex dataframe.
My JSON is like:
{
"1990-1991": {
"Cleveland": {
"salary": "$14,403,000",
"players": {
"Hot Rod Williams": "$3,785,000",
"Danny Ferry": "$2,640,000",
"Mark Price": "$1,400,000",
"Brad Daugherty": "$1,320,000",
"Larry Nance": "$1,260,000",
"Chucky Brown": "$630,000",
"Steve Kerr": "$548,000",
"Derrick Chievous": "$525,000",
"Winston Bennett": "$525,000",
"John Morton": "$350,000",
"Milos Babic": "$200,000",
"Gerald Paddio": "$120,000",
"Darnell Valentine": "$100,000",
"Henry James": "$75,000"
},
"url": "https://hoopshype.com/salaries/cleveland_cavaliers/1990-1991/"
},
I am making the dataframe like:
df = pd.DataFrame(columns=["year", "team", "player", "salary"])
for year in nbaSalaryData.keys():
for team in nbaSalaryData[year]:
for player in nbaSalaryData[year][team]['players']:
df = df.append({
"year": year,
"team": team,
"player": player,
"salary": nbaSalaryData[year][team]['players'][player]
}, ignore_index=True)
df = df.set_index(['year', 'team', 'player']).sort_index()
df
Which results in:
salary
year team player
1990-1991 Atlanta Doc Rivers $895,000
Dominique Wilkins $2,065,000
Gary Leonard $200,000
John Battle $590,000
Kevin Willis $685,000
... ... ... ...
2020-2021 Washington Robin Lopez $7,300,000
Rui Hachimura $4,692,840
Russell Westbrook $41,358,814
Thomas Bryant $8,333,333
Troy Brown $3,372,840
This is the form I want - year, team, and player as indexes and salary as a column. I know using append is slow but I cannot figure out an alternative. I tried to make it using tuples (with a slightly different configuration - no players and salary) but it ended up not working.
tuples = []
index = None
for year in nbaSalaryData.keys():
for team in nbaSalaryData[year]:
t = nbaSalaryData[year][team]
tuples.append((year, team))
index = pd.MultiIndex.from_tuples(tuples, names=["year", "team"])
df = index.to_frame()
df
Which outputs:
year team
year team
1990-1991 Cleveland 1990-1991 Cleveland
New York 1990-1991 New York
Detroit 1990-1991 Detroit
LA Lakers 1990-1991 LA Lakers
Atlanta 1990-1991 Atlanta
I'm not that familiar with pandas but realize there must be a faster way than append()
.