This is the page I want to parse: https://fbref.com/en/comps/9/gca/Premier-League-Stats

It has two tables. I am trying to get information from the second table, but the first table is displayed every time I run this code:

from bs4 import BeautifulSoup
import requests

source = requests.get('https://fbref.com/en/comps/9/gca/Premier-League-Stats').text
soup = BeautifulSoup(source, 'lxml')
stattable = soup.find('table', class_= 'min_width sortable stats_table min_width shade_zero')[1]

print(stattable)

min_width sortable stats_table min_width shade_zero is the class attribute of the 'second' table.

It does not give me an error nor does it return anything. It's null.

Mr. Polywhirl

2 Answers

The HTML you see in the browser's Inspect Element view is generated by JavaScript, so the same classes are not available in the raw HTML that your script downloads. I disabled JavaScript for this site and saw that the table was no longer visible.
You can try something like Selenium. There is good information in this question.
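
Before reaching for a browser, it may be worth checking what the downloaded source actually contains. The sketch below is only a diagnostic under an assumption that is not confirmed for this page: that a table missing from the raw HTML might be sitting inside an HTML comment that JavaScript later turns into live markup. `TableProbe` and `SAMPLE_HTML` are hypothetical names, and the sample markup is a stand-in for the text returned by `requests.get(...)`. It uses only the standard library's `html.parser`.

```python
from html.parser import HTMLParser


class TableProbe(HTMLParser):
    """Records live <table> tags and HTML comments that contain a table."""

    def __init__(self):
        super().__init__()
        self.live_table_ids = []    # id attributes of tables present as real markup
        self.commented_tables = []  # raw text of comments that mention "<table"

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.live_table_ids.append(dict(attrs).get("id"))

    def handle_comment(self, data):
        if "<table" in data:
            self.commented_tables.append(data.strip())


# Hypothetical stand-in for the page source; replace with the downloaded text.
SAMPLE_HTML = """
<div id="all_first"><table id="first"><tr><td>1</td></tr></table></div>
<div id="all_second">
<!--
<table id="second"><tr><td>2</td></tr></table>
-->
</div>
"""

probe = TableProbe()
probe.feed(SAMPLE_HTML)
probe.close()

print("live tables:", probe.live_table_ids)
print("tables hidden in comments:", len(probe.commented_tables))
```

If the second count is non-zero, the text in `probe.commented_tables` is ordinary HTML and can be handed to BeautifulSoup or pandas. If both counts miss the table, it is probably built entirely by JavaScript and a browser-driven tool such as Selenium is the more likely route.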

Meysam

Since the second table is dynamically generated, why not combine selenium, BeautifulSoup, and pandas to get what you want?

For example:

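import time
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Default options open a visible Chrome window.
options = Options()
driver = webdriver.Chrome(options=options)

driver.get("https://fbref.com/en/comps/9/gca/Premier-League-Stats")
time.sleep(2)  # crude fixed wait so the page's JavaScript can build the table

# Keep only the container that holds the second table.
soup = BeautifulSoup(driver.page_source, "html.parser").find("div", {"id": "div_stats_gca"})
driver.quit()  # end the whole browser session, not just the current window

# read_html returns a list of DataFrames; newer pandas versions expect a file-like object.
df = pd.read_html(StringIO(str(soup)), skiprows=[0, 1])
df = pd.concat(df)
df.to_csv("data.csv", index=False)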

This spits out a .csv file that, well, looks like that table you want. :)

baduker