
I have a Python script that grabs NBA game data for each day of the season. The script has a for loop that starts at day 1 and runs to the current day. For each day, the script:

  • Opens the webpage with that day's schedule and extracts all game info using Beautiful Soup
  • Stores it in a DataFrame
  • Appends that data to a BigQuery table

The DataFrame never has more than 15 rows, and is usually smaller, yet the script randomly hangs while inserting into the BigQuery table with the to_gbq function. The hang might happen on the first day, the third, or some other day, but the script has never come close to finishing the full season before hanging.
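Roughly, the loop is structured like this (the parsing helper and season start date are simplified placeholders, not my actual scraping code):

import pandas as pd
from datetime import date, timedelta

def get_games_for_day(day: date) -> pd.DataFrame:
    # Placeholder for the real Beautiful Soup scraping of that day's schedule page.
    ...

day = date(2021, 10, 19)  # placeholder season start
while day <= date.today():
    game_df = get_games_for_day(day)  # never more than ~15 rows
    # append game_df to the BigQuery table here (the to_gbq call shown below)
    day += timedelta(days=1)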

This is the code that it gets stuck on:

game_df.to_gbq(
    "nba_data.games",
    if_exists="append",
    project_id="my_test_project",
    credentials=credentials,
)
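For reference, pandas-gbq delegates the actual load to the google-cloud-bigquery library, so the same append can be attempted with that client directly and an explicit timeout, which makes a stuck load job raise an exception instead of hanging silently. A rough sketch (the service-account key path is a placeholder):

from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file("key.json")
client = bigquery.Client(project="my_test_project", credentials=credentials)

# WRITE_APPEND matches the if_exists="append" behavior of to_gbq.
job_config = bigquery.LoadJobConfig(write_disposition="WRITE_APPEND")
job = client.load_table_from_dataframe(
    game_df, "my_test_project.nba_data.games", job_config=job_config
)
# Raises concurrent.futures.TimeoutError if the load job stalls,
# instead of blocking indefinitely.
job.result(timeout=120)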

Any idea why this would happen?

  • Share the code that grabs the schedule for the day; it may be hanging on that portion. I also suspect there's a more efficient approach than the one you're using. Opening a page and parsing it (sounds like Selenium) is less than ideal, considering there are a few APIs out there that provide the same data. – chitown88 Mar 17 '22 at 09:27
  • For example, this response gets the [entire season](https://cdn.nba.com/static/json/staticData/scheduleLeagueV2_1.json) (see the fetch sketch after these comments). – chitown88 Mar 17 '22 at 09:47
  • @chitown88 Thanks for the response! That was a high-level summary of the script. I'm actually going into each game and gathering data on the opening tip (who jumped, who won the tip) and the first basket scored by each team (plus all attempts until both teams have scored). – obi_wan_jabroni Mar 17 '22 at 14:49
  • @chitown88 That's why I'm looping through days. But I've confirmed the code completes that piece successfully each time; it only hangs once it reaches the code that loads the DataFrame contents into the BigQuery table via the pandas to_gbq function. – obi_wan_jabroni Mar 17 '22 at 14:51
  • Ah, OK, interesting. Yeah, it seems others have had performance issues with that too. Have you read [this one](https://stackoverflow.com/questions/48886761/efficiently-write-a-pandas-dataframe-to-google-bigquery)? – chitown88 Mar 17 '22 at 15:01
  • @chitown88 Hmm, it looks like there may be a faster (and hopefully more reliable) approach, based on some of the comments on that post. This is a personal project, so I'll try to implement it after work today. – obi_wan_jabroni Mar 17 '22 at 15:26
  • If it works, be sure to post what you did as a solution to this question. It's OK to answer your own question; it could help people who come across this post in the future. – chitown88 Mar 17 '22 at 15:43
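For reference, the season-schedule endpoint chitown88 linked above can be fetched with plain requests instead of opening and parsing a page. A rough sketch of walking the payload; the JSON field names ("leagueSchedule", "gameDates", "games") are assumptions about the endpoint's layout and should be verified against the actual response:

import requests

SCHEDULE_URL = "https://cdn.nba.com/static/json/staticData/scheduleLeagueV2_1.json"

# Explicit timeout so a bad response can't hang the script.
resp = requests.get(SCHEDULE_URL, timeout=10)
resp.raise_for_status()
schedule = resp.json()

# Field names below are assumed, not verified.
for game_date in schedule["leagueSchedule"]["gameDates"]:
    print(game_date["gameDate"], len(game_date["games"]), "games")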

0 Answers