I'm trying to get the table from this link: https://www.nba.com/standings?GroupBy=conf&Season=2019-20&Section=overall

import requests
from bs4 import BeautifulSoup

url = 'https://www.nba.com/standings?GroupBy=conf&Season=2019-20&Section=overall'
page = requests.get(url)

soup = BeautifulSoup(page.content, 'html.parser')
soup.find_all('table')

However, I get an empty list back. I looked at the HTML for the website, and I can see the table tags. What am I missing to pull these tables?

Ethan
  • This may help: https://stackoverflow.com/questions/2935658/beautifulsoup-get-the-contents-of-a-specific-table – mehrdadep Dec 15 '20 at 05:15
  • It looks like the table is getting loaded through JavaScript, so fetching from the URL unfortunately isn't going to include it. You can check by printing out the content that `requests` loads. – Alexander Cai Dec 15 '20 at 05:20
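The check suggested in the comment above can be sketched roughly as follows: count the `<table>` tags in the HTML that was actually downloaded. This is a minimal sketch using only the standard library's `html.parser` rather than BeautifulSoup. The names `TableCounter` and `count_tables` and both sample strings are made up for illustration and are not the real nba.com markup.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts opening <table> tags in an HTML string."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method
        if tag == "table":
            self.count += 1


def count_tables(html):
    """Return the number of <table> tags found in html."""
    parser = TableCounter()
    parser.feed(html)
    return parser.count


# Hypothetical samples: an empty page shell as a plain HTTP fetch might
# return it, and the same page after JavaScript has filled in the table.
raw_html = '<html><body><div id="__next"></div><script src="app.js"></script></body></html>'
rendered_html = '<html><body><div id="__next"><table><tr><td>Team</td></tr></table></div></body></html>'

print(count_tables(raw_html))       # 0
print(count_tables(rendered_html))  # 1
```

With the real page, passing `page.text` from the `requests.get(url)` call in the question to `count_tables` should show whether the tables are present before any JavaScript runs. A count of zero would be consistent with the empty list from `soup.find_all('table')`.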

1 Answer


You need Selenium to extract the table data because the data is loaded through JavaScript. As an example, the code below extracts the first table and saves it to a CSV file.

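from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

# Widen pandas' display limits so the whole table prints
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

url = 'https://www.nba.com/standings?GroupBy=conf&Season=2019-20&Section=overall'
# The hashed class suffix is generated by the site's build and may change
container = 'div.StandingsGridRender_standingsContainer__2EwPy'

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service(r"C:\Users\Subrata\Downloads\chromedriver.exe"))
try:
    driver.get(url)
    # Wait until JavaScript has rendered the standings rows before reading the page
    WebDriverWait(driver, 15).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, container + ' tr'))
    )
    soup = BeautifulSoup(driver.page_source, 'html.parser')
finally:
    driver.quit()

tables = soup.select(container)
table1 = []
for row in tables[0].find_all('tr'):
    # One list per row: the text of each header or data cell
    cells = [cell.get_text(strip=True, separator=' ') for cell in row.find_all(['th', 'td'], recursive=False)]
    table1.append(cells)

# The first row holds the column headers
df = pd.DataFrame(table1[1:], columns=table1[0])

df.to_csv('x.csv')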
Samsul Islam
  • This works on my local machine, but when I deploy my code to Heroku, bs4 can't find the tables and I get an IndexError when trying to loop through the tables. Do you know why this could be? – Ethan Dec 22 '20 at 01:48