First of all, congratulations on sticking with it and finding a solution on your own! :D
Comments and tips
You can iterate over a list directly, no indexes needed
lst_1 = [1, 2, 3, 4]
for i in range(len(lst_1)):
    print(lst_1[i])
can be written as
lst_1 = [1, 2, 3, 4]
for item in lst_1:
    print(item)
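If you do still need the position of each item, the built-in `enumerate` gives you both at once. A minimal sketch, reusing `lst_1` from above (the `pairs` list is only there to make the result easy to inspect):

```python
lst_1 = [1, 2, 3, 4]

# enumerate yields (index, item) pairs, so no manual indexing is needed
pairs = []
for idx, item in enumerate(lst_1):
    pairs.append((idx, item))
    print(idx, item)
```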
Bonus: Notice the changes I make to variable names. See PEP 8 for a general reference on Python style.
gameids = ['0021900001','0021900002','0021900012']
headers1 = {
    'Host': 'stats.nba.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Referer': 'https://stats.nba.com/',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
}
# store player and team results for each gameids as elements of list temp
temp = list()
for i in range(len(gameids)):
    temp.append(boxscoreadvancedv2.BoxScoreAdvancedV2(game_id = gameids[i], headers=headers1))
can be written as
game_ids = ['0021900001', '0021900002', '0021900012']
api_headers = {
    'Host': 'stats.nba.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Referer': 'https://stats.nba.com/',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
}
api_results = [boxscoreadvancedv2.BoxScoreAdvancedV2(game_id=curr_game_id, headers=api_headers) for curr_game_id in game_ids]
You're iterating over the same thing twice
# output player frames
i=0
df_out=[]
df_players=[]
for i in range(len(temp)):
    df_out = temp[i].get_data_frames()
    df_players.append(df_out[0]) # index 0 will always contain player frame
df_players = pd.concat(df_players)
print(df_players)
# output team frames
i=0
df_out=[]
df_team=[]
for i in range(len(temp)):
    df_out = temp[i].get_data_frames()
    df_team.append(df_out[1]) # index 1 will always contain team frame
df_team = pd.concat(df_team)
print(df_team)
Using the first two tips, here is what we end up with:
players_lst = []
team_lst = []
for curr_res in api_results:
    curr_dfs = curr_res.get_data_frames()
    players_lst.append(curr_dfs[0])
    team_lst.append(curr_dfs[1])
players_df = pd.concat(players_lst)
team_df = pd.concat(team_lst)
My solution
Here it is, broken down slightly for the sake of clarity.
import pandas as pd
from nba_api.stats.endpoints.boxscoreadvancedv2 import BoxScoreAdvancedV2
game_ids = ['0021900001', '0021900002', '0021900012']
api_headers = {
    'Host': 'stats.nba.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Referer': 'https://stats.nba.com/',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
}
# generator of results from the API
api_results = (BoxScoreAdvancedV2(game_id=curr_game_id, headers=api_headers) for curr_game_id in game_ids)
# generator of lists of DataFrames from the API results
# think of it like: [[Player DF, Team DF], [Player DF, Team DF], ...]
api_res_dfs = (curr_res.get_data_frames() for curr_res in api_results)
# unpacking the size 2 lists of DataFrames into 2 flat lists
# [[Player DF, Team DF], [Player DF, Team DF], ...] -> [Player DF, Player DF, ...], [Team DF, Team DF, ...]
# see https://stackoverflow.com/q/2921847/11301900 for more on the use of the asterisk (*)
players_tuple, team_tuple = zip(*api_res_dfs)
# concatenating the various DataFrames, exactly the same as in your original code
players_df = pd.concat(players_tuple)
team_df = pd.concat(team_tuple)
print(players_df)
print(team_df)
This solution hinges on two facts. First, as you pointed out, the player DataFrame is always first in the list and the team DataFrame is always second. Second, those are the only two items in each list of results. If a result ever contained a third DataFrame, the two-name unpacking would raise a ValueError.
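To see the unpacking step in isolation, here is a small sketch that uses plain strings in place of the DataFrames. The placeholder values are made up for illustration and are not API output. It also shows what happens when each result holds three items instead of two.

```python
# Placeholder strings stand in for the DataFrames returned by get_data_frames()
api_res_dfs = (pair for pair in [["players_1", "teams_1"], ["players_2", "teams_2"]])

# zip(*...) transposes the pairs: one tuple of first items, one of second items
players_tuple, team_tuple = zip(*api_res_dfs)
print(players_tuple)  # ('players_1', 'players_2')
print(team_tuple)     # ('teams_1', 'teams_2')

# With three items per result, zip yields three tuples, so unpacking
# into two names raises ValueError
three_items = [["p1", "t1", "extra_1"], ["p2", "t2", "extra_2"]]
try:
    players_tuple_2, team_tuple_2 = zip(*three_items)
    unpack_failed = False
except ValueError:
    unpack_failed = True
print(unpack_failed)  # True
```

If that ever became a concern, the explicit loop with `curr_dfs[0]` and `curr_dfs[1]` from the tips section would keep working, since it ignores any extra items.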
Let me know if you have any questions :)