The following is my code:

import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup

stats_page = requests.get('https://www.sports-reference.com/cbb/schools/loyola-il/2020.html')
content = stats_page.content
soup = BeautifulSoup(content, 'html.parser')
table = soup.find(name='table', attrs={'id':'per_poss'})

html_str = str(table)
df = pd.read_html(html_str)[0]
df.head()

And I get the error: ValueError: No tables found.

However, when I swap attrs={'id':'per_poss'} with a different table id like attrs={'id':'per_game'} I get an output.

I am not familiar with HTML or scraping, but I noticed that the tables that work have this HTML: <table class="sortable stats_table now_sortable is_sorted" id="per_game" data-cols-to-freeze="2">

And the tables that don't work have this HTML: <table class="sortable stats_table now_sortable sticky_table re2 le1" id="totals" data-cols-to-freeze="2">

The table classes are different, but I am not sure whether that is causing the problem or, if so, how to fix it.

Thank you!

Ben Burner

1 Answer

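This is happening because the table is inside an HTML comment (<!-- ... -->) in the page source. BeautifulSoup parses a comment as a single text node rather than as tags, so soup.find() returns None for that table, str(None) is the string "None", and pd.read_html finds no tables in it.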
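A minimal sketch of that behavior, using a made-up miniature page (SAMPLE_HTML and the TableFinder class are hypothetical, not taken from the real site). It relies only on Python's built-in html.parser module, which is also what BeautifulSoup's "html.parser" backend builds on, and shows that a table wrapped in a comment is reported as comment text and never as a tag:

```python
from html.parser import HTMLParser

# Hypothetical miniature page: one live table and one wrapped in a comment
SAMPLE_HTML = """
<div id="all_per_game"><table id="per_game"><tr><td>1</td></tr></table></div>
<div id="all_per_poss"><!--
<table id="per_poss"><tr><td>2</td></tr></table>
--></div>
"""


class TableFinder(HTMLParser):
    """Records the ids of tables seen as real tags, plus the raw comment text."""

    def __init__(self):
        super().__init__()
        self.table_ids = []
        self.comments = []

    def handle_starttag(self, tag, attrs):
        # Called only for real tags, never for markup inside a comment
        if tag == "table":
            self.table_ids.append(dict(attrs).get("id"))

    def handle_comment(self, data):
        # Called with everything between <!-- and -->
        self.comments.append(data)


finder = TableFinder()
finder.feed(SAMPLE_HTML)
finder.close()

print(finder.table_ids)  # ['per_game']
print(any('id="per_poss"' in c for c in finder.comments))  # True
```

The per_poss markup is still present in the document, but only as the text of a comment, which is why it has to be parsed a second time before it can be searched.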

You can extract the table by collecting the nodes of type Comment and parsing their contents as HTML:

import pandas as pd
import requests
from bs4 import BeautifulSoup, Comment

URL = "https://www.sports-reference.com/cbb/schools/loyola-il/2020.html"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")

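# The stats tables are wrapped in HTML comments, so collect every comment node
comments = soup.find_all(string=lambda t: isinstance(t, Comment))
# Re-parse the concatenated comment text as HTML
comment_soup = BeautifulSoup("".join(comments), "html.parser")

# Container div that holds the per_poss table
table = comment_soup.select("#div_per_poss")[0]
# Note: comment_soup (not table) is passed here, so read_html returns a list
# with every commented-out table; pass str(table) to parse only per_poss
df = pd.read_html(str(comment_soup))
print(df)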

Output:

[      Rk             Player   G    GS    MP   FG  ...  AST  STL  BLK  TOV   PF   PTS
0    1.0    Cameron Krutwig  32  32.0  1001  201  ...  133   39   20   81   45   482
1    2.0          Tate Hall  32  32.0  1052  141  ...   70   47    3   57   56   406
2    3.0   Marquise Kennedy  32   6.0   671  110  ...   43   38    9   37   72   294
3    4.0   Lucas Williamson  32  32.0   967   99  ...   53   49    9   57   64   287
4    5.0      Keith Clemons  24  24.0   758   78  ...   47   29    1   32   50   249
5    6.0         Aher Uguak  32  31.0   768   62  ...   61   15    3   59   56   181
6    7.0      Jalon Pipkins  30   1.0   392   34  ...   12   10    1   17   15   101
7    8.0      Paxson Wojcik  30   1.0   327   25  ...   18   14    0   14   23    61
...
...
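
If only one table is wanted, the selected element itself can be handed to pandas instead of the whole comment soup. The following is a sketch under assumptions: find_commented_table is a hypothetical helper and SAMPLE_HTML is a made-up stand-in for the real page, which is not fetched here.

```python
from bs4 import BeautifulSoup, Comment


def find_commented_table(html, table_id):
    """Return the <table> with the given id, also looking inside HTML comments."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is not None:
        return table  # the table is live in the document
    # Otherwise re-parse each comment and look for the table there
    for comment in soup.find_all(string=lambda t: isinstance(t, Comment)):
        inner = BeautifulSoup(comment, "html.parser")
        table = inner.find("table", id=table_id)
        if table is not None:
            return table
    return None


# Hypothetical page with two commented-out tables
SAMPLE_HTML = """
<div id="all_per_poss"><!--
<table id="per_poss"><tr><th>PTS</th></tr><tr><td>25.1</td></tr></table>
--></div>
<div id="all_advanced"><!--
<table id="advanced"><tr><th>PER</th></tr><tr><td>27.4</td></tr></table>
--></div>
"""

table = find_commented_table(SAMPLE_HTML, "advanced")
print(table["id"])  # advanced
```

The returned tag can then be converted with pd.read_html(io.StringIO(str(table)))[0]; wrapping the string in io.StringIO avoids the warning that recent pandas versions emit for literal HTML strings. Because only the selected table is passed, changing the id changes the resulting DataFrame.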
MendelG
  • This is quite bizarre. Do you know why the table still displays on the page even though it is surrounded by a comment? – robertwest Nov 13 '20 at 20:49
  • @Robert I'm not a web developer, so I can't tell you – MendelG Nov 13 '20 at 20:50
  • @MendelG I am attempting to use the same code to scrape another table on the page ("div_advanced"), but when I change the code to comment_soup.select("#div_advanced"), my output remains the same as shown above and doesn't grab the right table. Do you have any idea why? – Ben Burner Nov 14 '20 at 14:40
  • @BenBurner I do get different output. Maybe see [Pretty-print an entire Pandas Series / DataFrame](https://stackoverflow.com/questions/19124601/pretty-print-an-entire-pandas-series-dataframe) – MendelG Nov 15 '20 at 04:20