So this is what I currently have. This code uses the `nba_api` package to make about 5,000 calls to the NBA stats API and collects the career playoff totals for Games Played (GP) and Points Scored (PTS) of every NBA player who has ever appeared in the playoffs. Each player is stored in the `stats_dict` dictionary, with the player's name as the key and `[GP, PTS]` as the value.
My question: how could I use threading to significantly speed up this process? Right now all of these API calls take about 30 minutes to complete, and I'd love to cut that down. I've never used threads before, so any guidance would be appreciated.
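For a sense of scale, those numbers work out to roughly a third of a second per call. Most of that is probably time spent waiting on the network, which is the I/O-bound case where threads help. Assuming the server answers concurrent requests about as quickly as sequential ones (it may throttle them instead), $k$ worker threads would ideally cut the total to about $1/k$ of the original:

$$
t_{\text{call}} \approx \frac{30 \times 60\ \text{s}}{5000} = 0.36\ \text{s},
\qquad
T_k \approx \frac{5000 \times 0.36\ \text{s}}{k},
\qquad
T_8 \approx 225\ \text{s} \approx 3.75\ \text{min}
$$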
Thanks
```python
import pandas as pd
from nba_api.stats.endpoints import commonallplayers
from nba_api.stats.endpoints import playercareerstats

# One call to get every player who has ever played, indexed by PERSON_ID
player_data = commonallplayers.CommonAllPlayers(timeout=30)
player_df = player_data.common_all_players.get_data_frame().set_index('PERSON_ID')
id_list = player_df.index.tolist()

def playoff_stats(person_id):
    """Return [[GP, PTS]] career playoff totals for one player (empty list if none)."""
    player_stats = playercareerstats.PlayerCareerStats(person_id, timeout=30)
    return player_stats.career_totals_post_season.get_data_frame()[['GP', 'PTS']].values.tolist()

stats_dict = {}

def run_it():
    # One API call per player, made one after another (this is the slow part)
    for i in id_list:
        try:
            stats_call = playoff_stats(i)
            if len(stats_call) > 0:
                name = player_df.loc[i, 'DISPLAY_FIRST_LAST']
                stats_dict[name] = [stats_call[0][0], stats_call[0][1]]
        except KeyError:
            continue

run_it()
```
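A minimal sketch of the threaded version, using the standard library's `concurrent.futures.ThreadPoolExecutor`. To keep it runnable without hitting the API, `fake_playoff_stats` is a hypothetical stand-in that sleeps to mimic network latency; `collect_stats` and its `max_workers=8` default are also illustrative choices, not anything from `nba_api`. It assumes the fetch function returns the `[[GP, PTS]]` list directly (i.e. `playoff_stats` with `yield` replaced by `return`).

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_playoff_stats(person_id):
    """Hypothetical stand-in for playoff_stats(): sleeps to mimic network
    latency and returns [[GP, PTS]], or [] for players with no playoff games."""
    time.sleep(0.1)
    if person_id % 3 == 0:
        return []
    return [[person_id, person_id * 10]]

def collect_stats(id_to_name, fetch, max_workers=8):
    """Call fetch(pid) for every id concurrently and build {name: [GP, PTS]}."""
    stats_dict = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit every call up front; the pool runs them in worker threads
        futures = {executor.submit(fetch, pid): pid for pid in id_to_name}
        # Handle results in the main thread as each call finishes
        for future in as_completed(futures):
            pid = futures[future]
            try:
                rows = future.result()  # re-raises any exception from the worker
            except KeyError:
                continue  # same skip-on-KeyError behavior as run_it()
            if rows:
                stats_dict[id_to_name[pid]] = [rows[0][0], rows[0][1]]
    return stats_dict

if __name__ == "__main__":
    ids = {pid: f"Player {pid}" for pid in range(1, 41)}
    start = time.perf_counter()
    result = collect_stats(ids, fake_playoff_stats, max_workers=8)
    print(len(result), f"{time.perf_counter() - start:.2f}s")
```

With the real data, something like `collect_stats(player_df['DISPLAY_FIRST_LAST'].to_dict(), playoff_stats)` should plug in, since `to_dict()` on that column maps each `PERSON_ID` to the player's name. Only the main thread writes to the dictionary (inside the `as_completed` loop), so no lock is needed. The stats.nba.com servers may throttle or time out bursts of requests, so starting with a small `max_workers` and raising it gradually is probably safer than launching hundreds of threads at once.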