
I am hoping someone can help me out. I have been trying for weeks and cannot figure out how to scrape archived data from www.pregame.com/game-center.

For instance, I would like to scrape the data for every date in the NBA season. An example is 2/8/2022; here is the URL: https://pregame.com/game-center/?d=1644300000000&t=2&l=3&a=1&s=StartTimeDate&m=false&b=undefined&o=Current&c=All&k=
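
For reference, the d value in that URL looks like a Unix timestamp in milliseconds: 1644300000000 decodes to 2022-02-08 06:00 UTC, which is midnight at UTC-6. A minimal sketch of converting between a calendar date and that value follows. The helper name `d_param` and the fixed UTC-6 offset are assumptions inferred from this one URL, so the offset may differ for dates under daylight saving time.

```python
from datetime import date, datetime, timedelta, timezone

def d_param(day, utc_offset_hours=-6):
    """Return midnight of `day` at the given UTC offset as epoch milliseconds.

    Hypothetical helper: the -6 default is inferred from a single sample URL.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    midnight = datetime(day.year, day.month, day.day, tzinfo=tz)
    return int(midnight.timestamp() * 1000)

# Decode the value taken from the URL in the question
decoded = datetime.fromtimestamp(1644300000000 / 1000, tz=timezone.utc)
print(decoded)                      # 2022-02-08 06:00:00+00:00

# Encode the same date back into a d value
print(d_param(date(2022, 2, 8)))    # 1644300000000
```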

If anyone could give some advice or guidance, I would appreciate it. Thank you!

  • Have you tried Selenium? Can you provide any code you have written to try this? – Joe Feb 17 '22 at 02:41
  • Hey Joe - appreciate the response. I am pretty novice at Python coding and have tried different ideas I have found via this site/Google/YouTube/etc. I've tried Selenium with no luck. Hoping someone can point me in the right direction with a code/script. – mikeysins Feb 17 '22 at 02:51
  • Does this help you? https://stackoverflow.com/questions/36705083/download-data-using-selenium If you are unsure about how to get selenium up and running, you can use ChromeDriverManager to take care of downloading the correct 'browser' for you. https://pypi.org/project/webdriver-manager/ – Joe Feb 17 '22 at 02:56
  • I have Selenium up and running and have played around with it for the last week or two. However, I cannot seem to find a way to scrape the data I am looking for. Any thoughts on scraping the info in the table in the link I provided? In short, I want to scrape the table and write the contents to Excel. – mikeysins Feb 17 '22 at 20:15

1 Answer


I don't see any NBA games coming up for Feb 8.

Anyway, you can get the data through the API by putting the date into the URL (or passing it as a payload parameter). You'll have to do some data merging and cleanup, but here is a quick example:

import requests
import pandas as pd

url = 'https://pregame.com/api/gamecenter/init?dt=1-30-2022'
# headers must be a dict of name -> value, and must be passed to requests.get
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36'}
jsonData = requests.get(url, headers=headers).json()

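# Sections of the JSON response to flatten and combine
tables = ['Consensus','Events','Odds','Scores']

for idx, k in enumerate(tables):
    table = jsonData['GameCenterData'][k]
    temp = pd.json_normalize(table)

    if idx == 0:
        # The first table becomes the base of the merged result
        results = temp
    else:
        # Some tables name the key 'Id' instead of 'EventId'
        if 'EventId' not in temp.columns:
            temp = temp.rename(columns={'Id':'EventId'})
        results = results.merge(temp, how='outer', on=['EventId'])

# Keep only the NBA rows and write them to a CSV file
resultsNBA = results[results['LeagueName'] == 'NBA']
resultsNBA.to_csv('nba_2022_01_30.csv', index=False)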

Output:

print(resultsNBA)
        AllCash  AllCashRanking  ...  AwayStatus  HomeStatus
522   213380.23             0.0  ...                   Final
523   213380.23             0.0  ...                   Final
524   213380.23             0.0  ...                   Final
525   213380.23             0.0  ...                   Final
526   213380.23             0.0  ...                   Final
        ...             ...  ...         ...         ...
9943   88802.61             0.0  ...                   Final
9944   88802.61             0.0  ...                   Final
9945   88802.61             0.0  ...                   Final
9946   88802.61             0.0  ...                   Final
9947   88802.61             0.0  ...                   Final

[2898 rows x 108 columns]
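
To cover a whole season, the same request can be repeated once per date. Here is a minimal sketch that only builds the URLs and makes no requests. It assumes the dt parameter takes month-day-year without zero padding (inferred from the single '1-30-2022' example). The helper name `api_urls` and the date range are placeholders.

```python
from datetime import date, timedelta

BASE_URL = 'https://pregame.com/api/gamecenter/init'  # endpoint from the example above

def api_urls(start, end):
    """Yield (day, url) pairs for every date from start to end, inclusive."""
    day = start
    while day <= end:
        # Assumption: dt is month-day-year with no zero padding, as in '1-30-2022'
        yield day, f'{BASE_URL}?dt={day.month}-{day.day}-{day.year}'
        day += timedelta(days=1)

# Placeholder range; substitute the first and last day of the season you want
urls = list(api_urls(date(2022, 1, 30), date(2022, 2, 1)))
for day, url in urls:
    print(day, url)
```

Each URL can then be fetched and flattened the same way as the single-date example, ideally with a short pause between requests, and the per-day frames combined with pd.concat before writing them out.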
chitown88