
I am trying to build an NHL betting system and need to scrape live data every day. The table I need is not available through the IMPORTHTML() function. I am trying to use Python but have not found a good tutorial for beginners, so I need help.

>>> from bs4 import BeautifulSoup
>>> import requests
>>> from selenium import webdriver
>>> import pandas as ps
>>> PATH = "C:/webdrivers/chromedriver.exe"
>>> table_name = "table_container"
>>> csv_name = 'nhl_season_stats.csv'
>>> URL = "https://www.hockey-reference.com/leagues/NHL_2021.html"
>>> def get_nhl_stats(URL):
...     driver = webdriver.Chrome(PATH)
...     driver.get(URL)
...     soup = BeautifulSoup(driver.page_source,'html')
...     driver.quit()
...     tables = soup.find_all('table',{"id":[table_name]})
...     for table in tables:
...             tab_name = table['id']
...             tab_data = [[cell.text for cell in row.find_all(["th","td"])]
...                                     for row in table.find_all("tr")]
...             df = pd.DataFrame(tab_data)
...             df.columns = df.iloc[0,:]
...             df.drop(index=0,inplace= True)
...             df.to_csv(csv_name, index = False)
...             print(tab_name)
...             print(df)
...
>>> get_nhl_stats(URL)

I keep getting this error:

DevTools listening on ws://127.0.0.1:59353/devtools/browser/2ad39b85-94a0-4f64-a738-994c69f7373c
[10572:2256:0123/020420.281:ERROR:device_event_log_impl.cc(211)] [02:04:20.281] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[10572:2256:0123/020420.283:ERROR:device_event_log_impl.cc(211)] [02:04:20.283] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
Mason
  • Please provide the code you have tried already. – goalie1998 Jan 23 '21 at 06:26
  • @goalie1998 Okay, I did. – Mason Jan 23 '21 at 08:03
  • @Mason just curious, but why use Selenium? You could get that data simply by 1) using `requests` with `beautifulsoup`, 2) using `pandas`, or 3) using the API at nhl.com. All of these options will be faster than simulating a browser and then parsing the data. – chitown88 Jan 23 '21 at 10:55
  • @goalie1998 I got the script off of some guy on YouTube. I honestly have no idea what I'm doing, I was just trying to copy him. – Mason Jan 23 '21 at 10:59

2 Answers


What happens in your code?

Your code tries to grab all tables with the id table_container. That cannot work, because the page has no table with that id; table_container is only used as a class name.

How to fix?

Your question does not say which table you want to grab, but I think it is the one with the id stats. So change the value of your variable before the loop:

table_name = "stats"
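
To make the id/class difference concrete, here is a minimal sketch on a tiny inline HTML string rather than the live page. The markup is hypothetical and only mirrors what is described above: a table whose class is table_container and whose id is stats.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not copied from the real page):
# the table has the class "table_container" and the id "stats".
html = """
<table class="table_container" id="stats">
  <tr><th>Rk</th><th>Team</th></tr>
  <tr><td>1</td><td>Montreal Canadiens</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Searching by the wrong id finds nothing, so a loop over the result never runs.
by_wrong_id = soup.find_all("table", id="table_container")
# Searching by the real id, or by the class, finds the table.
by_right_id = soup.find_all("table", id="stats")
by_class = soup.find_all("table", class_="table_container")

print(len(by_wrong_id), len(by_right_id), len(by_class))
```

An empty result from find_all raises no error, which is probably why the original script appeared to do nothing apart from printing the browser log messages.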

Concerning your error

Take a look at this answer: "Failed to read descriptor from node connection: A device attached to the system is not functioning error using ChromeDriver Chrome through Selenium"

Example

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import pandas as pd

PATH = "C:/webdrivers/chromedriver.exe"
table_name = "stats"
csv_name = "nhl_season_stats.csv"
URL = "https://www.hockey-reference.com/leagues/NHL_2021.html"

def get_nhl_stats(URL):
    # Selenium 4 expects the driver path wrapped in a Service object
    driver = webdriver.Chrome(service=Service(PATH))
    driver.get(URL)
    soup = BeautifulSoup(driver.page_source, "html.parser")
    driver.quit()
    # Match tables whose id equals table_name
    tables = soup.find_all("table", id=table_name)

    for table in tables:
        tab_name = table["id"]
        # One list of cell texts per table row
        tab_data = [[cell.text for cell in row.find_all(["th", "td"])]
                    for row in table.find_all("tr")]
        df = pd.DataFrame(tab_data)
        # Promote the first row to the header, then drop it from the data
        df.columns = df.iloc[0, :]
        df.drop(index=0, inplace=True)
        df.to_csv(csv_name, index=False)
        print(tab_name)
        print(df)

get_nhl_stats(URL)

Output

0                                                       Special Teams  \
1   Rk                         AvAge  GP  W  L  OL  PTS          PTS%   
2    1     Montreal Canadiens   28.6   5  3  0   2    8          .800   
3    2   Vegas Golden Knights   29.0   4  4  0   0    8         1.000   
4    3    Philadelphia Flyers   27.0   5  3  1   1    7          .700   
5    4          Winnipeg Jets   27.9   4  3  1   0    6          .750   
6    5     New York Islanders   28.9   4  3  1   0    6          .750   
7    6    Toronto Maple Leafs   29.0   5  3  2   0    6          .600   
8    7    Tampa Bay Lightning   27.7   3  3  0   0    6         1.000   
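
The output above suggests the table has two header rows: a grouping row ("Special Teams") followed by the real column names (Rk, AvAge, GP, ...). Promoting only the first row leaves the real names as a data row. Here is a minimal sketch of promoting the second row instead. The rows are made-up sample data shaped like the scraped table, not the actual scrape.

```python
import pandas as pd

# Hypothetical rows: a grouping row, the real header row, then data rows.
rows = [
    ["", "", "Special Teams"],
    ["Rk", "Team", "PP%"],
    ["1", "Montreal Canadiens", "30.00"],
    ["2", "Vegas Golden Knights", "11.11"],
]

df = pd.DataFrame(rows)
# Use the second row (index 1) as the column names.
df.columns = df.iloc[1]
# Drop both header rows from the data and renumber the remaining rows.
df = df.drop(index=[0, 1]).reset_index(drop=True)
# Clear the leftover name that the columns index inherited from row 1.
df.columns.name = None

print(df)
```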
HedgeHog

I'm not sure the sports-reference sites are "live", but they are current. You can let pandas do most of the work of parsing the tables. I suspect you are using Selenium because those tables don't show up in the HTML returned by a simple requests call. But the tables are actually there, inside HTML comments. You just need to pull those out:

import requests
from io import StringIO
from bs4 import BeautifulSoup, Comment
import pandas as pd

URL = 'https://www.hockey-reference.com/leagues/NHL_2021.html'

def get_nhl_stats(URL):
    # Send a browser-like user agent with the request
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'}

    pageTree = requests.get(URL, headers=headers)
    pageSoup = BeautifulSoup(pageTree.content, 'html.parser')
    # Collect every HTML comment on the page
    comments = pageSoup.find_all(string=lambda text: isinstance(text, Comment))

    tables = []
    for each in comments:
        if 'table' in str(each):
            try:
                # Parse the commented-out HTML; header=1 uses the second row as column names
                tables.append(pd.read_html(StringIO(str(each)), header=1)[0])
            except ValueError:
                # read_html raises ValueError when it finds no table
                continue

    df = tables[0]
    df = df.rename(columns={'Unnamed: 1': 'Team'})
    print(df)

get_nhl_stats(URL)

Output:

print(df.to_string())
      Rk                   Team  AvAge  GP  W  L  OL  PTS   PTS%  GF  GA  SOW  SOL   SRS   SOS  TG/G  EVGF  EVGA  PP  PPO    PP%  PPA  PPOA     PK%  SH  SHA  PIM/G  oPIM/G    S    S%   SA    SV%  SO
0    1.0    Toronto Maple Leafs   29.0   6  4  2   0    8  0.667  19  17  0.0  0.0  0.33 -0.01  6.00    11    12   8   18  44.44    4    22   81.82   0    1   10.5     7.5  190  10.0  157  0.892   0
1    2.0     Montreal Canadiens   28.6   5  3  0   2    8  0.800  24  15  0.0  1.0  0.77 -0.83  7.80    14     8   6   20  30.00    6    25   76.00   4    1   11.4    10.6  180  13.3  140  0.893   0
2    3.0   Vegas Golden Knights   28.9   5  4  1   0    8  0.800  18  12  0.0  0.0  1.12 -0.08  6.00    15     8   2   18  11.11    3    18   83.33   1    1    7.2     7.2  150  12.0  125  0.904   0
3    4.0         Minnesota Wild   29.1   5  4  1   0    8  0.800  15  10  0.0  0.0  0.86 -0.14  5.00    13     9   1   23   4.35    1    16   93.75   1    0    7.6    10.4  166   9.0  147  0.932   0
4    5.0    Washington Capitals   30.1   5  3  0   2    8  0.800  18  16  1.0  1.0  0.10 -0.30  6.80    16    12   2    9  22.22    3    18   83.33   0    1    8.6     5.0  130  13.8  141  0.887   0
5    6.0    Philadelphia Flyers   27.0   5  3  1   1    7  0.700  19  15  0.0  1.0  0.36 -0.24  6.80    14    10   5   17  29.41    5    18   72.22   0    0    7.2     6.8  125  15.2  187  0.920   1
6    7.0     Colorado Avalanche   26.9   5  3  2   0    6  0.600  17  12  0.0  0.0  0.47 -0.53  5.80     7     9  10   25  40.00    3    19   84.21   0    0    8.0    10.4  147  11.6  143  0.916   1
7    8.0          Winnipeg Jets   27.9   4  3  1   0    6  0.750  13  10  0.0  0.0  1.10  0.35  5.75    11     6   2   20  10.00    4    12   66.67   0    0   10.3    14.3  119  10.9  134  0.925   0
8    9.0     New York Islanders   28.9   4  3  1   0    6  0.750   9   6  0.0  0.0  0.61 -0.14  3.75     5     5   4   20  20.00    1    15   93.33   0    0   11.5    11.0  108   8.3  114  0.947   2
9   10.0    Tampa Bay Lightning   27.7   3  3  0   0    6  1.000  13   5  0.0  0.0  1.70 -0.97  6.00    11     2   2    8  25.00    3    11   72.73   0    0    9.0     7.0  107  12.1   85  0.941   0
10  11.0    Pittsburgh Penguins   28.6   5  3  2   0    6  0.600  16  21  2.0  0.0 -0.43  0.17  7.40    10    16   5   18  27.78    5    19   73.68   1    0    7.6     7.2  152  10.5  130  0.838   0
11  12.0      New Jersey Devils   26.2   4  2  1   1    5  0.625   9  10  0.0  1.0 -0.35  0.15  4.75     8     3   1   11   9.09    6    16   62.50   0    1    9.8     7.3  112   8.0  150  0.933   0
12  13.0        St. Louis Blues   28.3   4  2  1   1    5  0.625  10  14  0.0  1.0 -1.66 -0.41  6.00    10     6   0   14   0.00    8    21   61.90   0    0   11.0     7.5  109   9.2  129  0.891   0
13  14.0          Boston Bruins   28.8   4  2  1   1    5  0.625   7   9  2.0  0.0  0.07  0.07  4.00     3     7   3   13  23.08    2    18   88.89   1    0   11.3     8.8  135   5.2   96  0.906   0
14  15.0        Arizona Coyotes   28.4   5  2  2   1    5  0.500  17  17  0.0  1.0 -0.04  0.16  6.80    11    11   5   22  22.73    5    24   79.17   1    1   10.4     9.6  144  11.8  157  0.892   0
15  16.0         Calgary Flames   28.1   3  2  0   1    5  0.833  11   6  0.0  0.0  1.14 -0.52  5.67     5     4   6   16  37.50    1    12   91.67   0    1    8.7    11.3   93  11.8   93  0.935   1
16  17.0        Edmonton Oilers   27.9   6  2  4   0    4  0.333  15  20  0.0  0.0 -0.91 -0.08  5.83    10    14   3   23  13.04    4    18   77.78   2    2    7.7     9.3  192   7.8  200  0.900   0
17  18.0      Vancouver Canucks   27.3   6  2  4   0    4  0.333  17  28  1.0  0.0 -1.34  0.33  7.50    12    17   4   26  15.38    9    31   70.97   1    2   13.3    10.7  179   9.5  222  0.874   0
18  19.0          Anaheim Ducks   28.6   5  1  2   2    4  0.400   8  13  0.0  0.0 -0.10  0.90  4.20     8    10   0   12   0.00    2    15   86.67   0    1    6.4     5.2  133   6.0  160  0.919   1
19  20.0  Columbus Blue Jackets   26.6   5  1  2   2    4  0.400  10  16  0.0  0.0 -1.19  0.01  5.20     9    15   1   11   9.09    1    10   90.00   0    0    9.0     9.4  152   6.6  169  0.905   0
20  21.0      Los Angeles Kings   28.3   4  1  1   2    4  0.500  12  13  0.0  0.0  0.43  0.68  6.25     8    10   4   17  23.53    3    21   85.71   0    0   11.0     9.0  119  10.1  121  0.893   0
21  22.0      Detroit Red Wings   29.3   5  2  3   0    4  0.400  10  14  0.0  0.0 -1.54 -0.74  4.80     9     9   1   12   8.33    4    16   75.00   0    1   11.4     9.8  130   7.7  155  0.910   0
22  23.0        San Jose Sharks   29.4   5  2  3   0    4  0.400  12  18  2.0  0.0 -1.32 -0.52  6.00     7    16   5   21  23.81    2    18   88.89   0    0    8.4     9.6  162   7.4  148  0.878   0
23  24.0    Carolina Hurricanes   27.0   3  2  1   0    4  0.667   9   6  0.0  0.0  0.26 -0.74  5.00     6     5   3   12  25.00    1     9   88.89   0    0    7.7     9.7   98   9.2   68  0.912   1
24  25.0       Florida Panthers   27.8   2  2  0   0    4  1.000  10   6  0.0  0.0  1.29 -0.71  8.00     7     3   3    8  37.50    3     5   40.00   0    0    5.0     8.0   66  15.2   66  0.909   0
25  26.0    Nashville Predators   28.7   4  2  2   0    4  0.500  10  14  0.0  0.0  0.01  1.01  6.00     9     7   1   16   6.25    6    16   62.50   0    1    8.0     8.0  135   7.4  126  0.889   0
26  27.0         Buffalo Sabres   27.2   5  1  3   1    3  0.300  14  15  0.0  1.0 -0.18  0.22  5.80    11    14   3   17  17.65    1     6   83.33   0    0    3.8     8.2  161   8.7  133  0.887   0
27  28.0       New York Rangers   25.6   4  1  2   1    3  0.375  11  11  0.0  1.0 -0.15  0.11  5.50     7     7   4   21  19.05    4    16   75.00   0    0    8.5    14.0  140   7.9  112  0.902   1
28  29.0     Chicago Blackhawks   26.9   5  1  3   1    3  0.300  13  21  0.0  0.0 -0.43  1.17  6.80     5    16   7   17  41.18    5    20   75.00   1    0    8.0     6.8  154   8.4  167  0.874   0
29  30.0        Ottawa Senators   27.0   4  1  2   1    3  0.375  11  14  0.0  0.0 -0.04  0.71  6.25     8    10   3   18  16.67    4    21   80.95   0    0   14.3    15.3  113   9.7  120  0.883   0
30  31.0           Dallas Stars   28.8   1  1  0   0    2  1.000   7   0  0.0  0.0  7.30  0.30  7.00     1     0   5    8  62.50    0     5  100.00   1    0   10.0    16.0   28  25.0   34  1.000   1
31   NaN         League Average   28.0   4  2  2   1    5  0.574  13  13  NaN  NaN   NaN   NaN  5.94     9     9   4   16  21.33    4    16   78.67   0    0    8.0     8.0  133   9.8  133  0.902   0
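
To show why a plain find_all misses these tables, here is a minimal sketch on an inline HTML string instead of the live site. The markup is hypothetical; it only assumes, as described above, that the table sits inside an HTML comment.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical page fragment: the table exists only inside an HTML comment.
html = """
<div id="all_stats">
<!--
<table id="stats">
  <tr><th>Team</th><th>GP</th></tr>
  <tr><td>Toronto Maple Leafs</td><td>6</td></tr>
</table>
-->
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The parser treats the commented-out markup as text, so no table is found.
visible_tables = soup.find_all("table")

# Pull out the comment nodes, then parse the comment text as HTML again.
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
inner = BeautifulSoup(str(comments[0]), "html.parser")
rows = [[cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in inner.find_all("tr")]

print(len(visible_tables), rows)
```

The str() call converts the Comment object to a plain string before it is parsed a second time.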
chitown88
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/227823/discussion-on-answer-by-chitown88-how-to-web-scrape-live-data-into-google-sheets). – Machavity Jan 26 '21 at 00:51
  • @Mason, after looking into it, I think it's the BeautifulSoup version. It appears that comments are now pulled out as BeautifulSoup objects rather than strings, so just 2 lines in that code needed fixing. The code above is updated. – chitown88 Jan 27 '21 at 11:31