Nested list elements to data frame in Python

Question

Fair warning: this question requires a non-standard Python package, nba_api. I have a list of 3 elements, and each element contains another list of 2 elements: a player data frame and a team data frame. What is the recommended way to get the following result: 1 combined player data frame and 1 combined team data frame? Coming from an R background, I would tackle this problem by: 1. joining the players data frame with the team data frame into joined_list, then 2. using do.call(rbind, joined_list) to row-bind the results into one data frame. I understand this might be very elementary to experienced Python users, but I'm having a hell of a time finding the right approach after many searches on here.

import nba_api
import requests
import pandas as pd

from nba_api.stats.endpoints import boxscoreadvancedv2

# vector of game ids (test purposes)
gameids = ['0021900001','0021900002','0021900012']

headers1 = {
    'Host': 'stats.nba.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Referer': 'https://stats.nba.com/',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
}

# store player and team results for each gameids as elements of list temp
temp = list()
for i in range(len(gameids)):
    temp.append(boxscoreadvancedv2.BoxScoreAdvancedV2(game_id = gameids[i], headers=headers1))

# manually access elements of list and output to data frame
## there has to be an easier way to access list elements and rowbind the results!!!
df_out0 = temp[0].get_data_frames()
df_player0 = df_out0[0]
df_team0 = df_out0[1]

df_out1 = temp[1].get_data_frames()
df_player1 = df_out1[0]
df_team1 = df_out1[1]

Can you please provide a sample of the data? — srty, Nov 10 '19 at 03:49

score 1 · Answer 1 · answered Nov 10 '19 at 23:13

After a bit more reading (and clarity), I was able to replace the manual parts of my code with for loops that build one list of player data frames and one list of team data frames. Then, using the post "Concatenate a list of pandas dataframes together", I was able to combine the player and team lists into their respective data frames.

## output player frames
i=0
df_out=[]
df_players=[]
for i in range(len(temp)):
    df_out = temp[i].get_data_frames()
    df_players.append(df_out[0])         # index 0 will always contain player frame

df_players = pd.concat(df_players)
print(df_players)

## output team frames
i=0
df_out=[]
df_team=[]
for i in range(len(temp)):
    df_out = temp[i].get_data_frames()
    df_team.append(df_out[1])            # index 1 will always contain team frame

df_team = pd.concat(df_team)
print(df_team)

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

First of all, congratulations on sticking it through and finding a solution on your own! :D

Comments and tips

You can iterate over a list directly, no indexes needed

lst_1 = [1, 2, 3, 4]

for i in range(len(lst_1)):
    print(lst_1[i])

can be written as

lst_1 = [1, 2, 3, 4]

for item in lst_1:
    print(item)
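
If you do need the position as well as the item, the built-in enumerate gives you both without going back to range(len(...)). A minimal sketch (the list values and the pairs list are just for illustration):

```python
lst_1 = [10, 20, 30, 40]

# enumerate yields (index, item) pairs, so there is no manual indexing
pairs = []
for idx, item in enumerate(lst_1):
    pairs.append((idx, item))
    print(idx, item)
```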

List comprehensions and generator expressions are awesome

Bonus: Notice the changes I make to variable names. See PEP 8 for a general reference on Python style.

gameids = ['0021900001','0021900002','0021900012']

headers1 = {
    'Host': 'stats.nba.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Referer': 'https://stats.nba.com/',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
}

# store player and team results for each gameids as elements of list temp
temp = list()
for i in range(len(gameids)):
    temp.append(boxscoreadvancedv2.BoxScoreAdvancedV2(game_id = gameids[i], headers=headers1))

can be written as

game_ids = ['0021900001','0021900002','0021900012']

api_headers = {
    'Host': 'stats.nba.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Referer': 'https://stats.nba.com/',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
}

api_results = [boxscoreadvancedv2.BoxScoreAdvancedV2(game_id=curr_game_id, headers=api_headers) for curr_game_id in game_ids]
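
A generator expression uses the same syntax with parentheses instead of square brackets. The difference worth knowing: a list is built immediately and can be looped over as often as you like, while a generator is lazy and can only be consumed once. Here is a small sketch of that behaviour; fetch is a hypothetical stand-in for the real API call, not part of nba_api:

```python
def fetch(game_id):
    # hypothetical stand-in for an API request; just returns a label
    return f"result for {game_id}"

game_ids = ['0021900001', '0021900002', '0021900012']

results_list = [fetch(g) for g in game_ids]  # built right away, reusable
results_gen = (fetch(g) for g in game_ids)   # lazy, nothing fetched yet

first_pass = list(results_gen)   # consumes the generator
second_pass = list(results_gen)  # already exhausted, so this is empty

print(len(results_list), len(first_pass), len(second_pass))
```

So if you plan to loop over the results more than once, the list comprehension is the safer choice.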

You're iterating over the same thing twice

# output player frames
i=0
df_out=[]
df_players=[]
for i in range(len(temp)):
    df_out = temp[i].get_data_frames()
    df_players.append(df_out[0])         # index 0 will always contain player frame

df_players = pd.concat(df_players)
print(df_players)

# output team frames
i=0
df_out=[]
df_team=[]
for i in range(len(temp)):
    df_out = temp[i].get_data_frames()
    df_team.append(df_out[1])            # index 1 will always contain team frame

df_team = pd.concat(df_team)
print(df_team)

Using the first two tips, here is what we end up with:

players_lst = []
team_lst = []

for curr_res in api_results:
    curr_dfs = curr_res.get_data_frames()
    players_lst.append(curr_dfs[0])
    team_lst.append(curr_dfs[1])

players_df = pd.concat(players_lst)
team_df = pd.concat(team_lst)
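
One thing to be aware of: pd.concat keeps each frame's original index by default, so if every per-game frame is numbered from 0 (which I'd assume is the case here), the combined frame repeats index labels. Passing ignore_index=True renumbers the rows. A minimal sketch with toy frames; the column names and values are made up and are not what nba_api returns:

```python
import pandas as pd

# toy stand-ins for two per-game player frames (made-up columns)
game_a = pd.DataFrame({"GAME_ID": ["A", "A"], "PLAYER": ["p1", "p2"]})
game_b = pd.DataFrame({"GAME_ID": ["B", "B"], "PLAYER": ["p3", "p4"]})

# default: each frame keeps its own row labels
stacked = pd.concat([game_a, game_b])

# ignore_index=True: rows are renumbered from 0 in the result
renumbered = pd.concat([game_a, game_b], ignore_index=True)

print(list(stacked.index))
print(list(renumbered.index))
```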

My solution

Here it is, broken down slightly for the sake of clarity.

import pandas as pd
from nba_api.stats.endpoints.boxscoreadvancedv2 import BoxScoreAdvancedV2

game_ids = ['0021900001', '0021900002', '0021900012']

api_headers = {
    'Host': 'stats.nba.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Referer': 'https://stats.nba.com/',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
}

# generator of results from the API
api_results = (BoxScoreAdvancedV2(game_id=curr_game_id, headers=api_headers) for curr_game_id in game_ids)

# generator of lists of DataFrames from the API results
# think of it like: [[Player DF, Team DF], [Player DF, Team DF], ...]
api_res_dfs = (curr_res.get_data_frames() for curr_res in api_results)

# unpacking the size 2 lists of DataFrames into 2 flat lists
# [[Player DF, Team DF], [Player DF, Team DF], ...] -> [Player DF, Player DF, ...], [Team DF, Team DF, ...]
# see https://stackoverflow.com/q/2921847/11301900 for more on the use of the asterisk (*)
players_tuple, team_tuple = zip(*api_res_dfs)

# concatenating the various DataFrames, exactly the same as in your original code
players_df = pd.concat(players_tuple)
team_df = pd.concat(team_tuple)

print(players_df)
print(team_df)

It hinges on two facts: as you pointed out, the player DataFrame is always first in the list and the team DataFrame is always second, and those two are the only items in the list of results.
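
To see the zip(*...) step in isolation, and why the "only two items" part matters, here is a small sketch that uses plain strings in place of DataFrames (all names are illustrative):

```python
# each inner list plays the role of [Player DF, Team DF] for one game
pairs = [["player_1", "team_1"], ["player_2", "team_2"], ["player_3", "team_3"]]

# zip(*pairs) regroups by position: all first items, then all second items
players, teams = zip(*pairs)
print(players)
print(teams)

# with a third item per game, zip produces three groups,
# so unpacking into two names raises ValueError
triples = [["p1", "t1", "extra_1"], ["p2", "t2", "extra_2"]]
unpack_failed = False
try:
    only_players, only_teams = zip(*triples)
except ValueError:
    unpack_failed = True
print(unpack_failed)
```

If an endpoint ever returned more than two frames, picking them out by index (as in the explicit loop version) would be the more forgiving option.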

Let me know if you have any questions :)

This is a really good, quality answer! I really appreciate the feedback about my code as this is the first Python code I've written. I'm accepting your answer as it's a much better response. Thank you! — On_an_island, Nov 12 '19 at 00:17
For your first Python program, that is quite strong! Surely you have experience in other languages, no? Also, do you have any questions? I may not have explained some things well enough. — AMC, Nov 12 '19 at 00:59
My background is in R so I definitely had that experience to lean back on. I don't have any questions about your answer. It worked really well for what I was trying to accomplish. I will definitely have more questions as I continue to learn Python so I'm sure you'll see some questions from me pop up on SO every now and then. Thanks again! — On_an_island, Nov 14 '19 at 14:35