
I am attempting to scrape basketball-reference.com and am running into an issue I can't seem to solve. I am trying to grab the box score element for each game played. This is something I was able to do easily with urlopen, but because other portions of the site require Selenium, I thought I would rewrite the entire process with Selenium.

The issue seems to be that even if I wait until the first element loads using WebDriverWait, when I then move on to grabbing the elements, I get nothing back.

One thing I found interesting: if I printed the full page from my urlopen results with something like print(uClient.read()), I would get roughly 300 more lines of HTML (after beautifying) than I do from print(driver.page_source), even with an implicit wait set to 5 minutes.

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC



driver = webdriver.Chrome('/usr/local/bin/chromedriver')
driver.wait = WebDriverWait(driver, 10)
driver.get('https://www.basketball-reference.com/boxscores/')
driver.wait.until(EC.presence_of_element_located((By.XPATH,'//*[@id="content"]/div[3]/div[1]')))


box = driver.find_elements_by_class_name('game_summary expanded nohover')

print (box)

driver.quit()
zdaman101
  • Increase the waiting time from 10 to 30 and see if it helps. – Swaroop Humane Apr 28 '21 at 22:11
  • I upped to both 30 and 60 and in both cases returned no results still. – zdaman101 Apr 28 '21 at 22:32
  • find_elements will return as soon as at least one item is found... add a sleep before using it (or catch StaleElementReferenceException by triggering a method on the web element(s); if caught, re-do the find_elements call). – pcalkins Apr 28 '21 at 23:11
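A side note on why the original find_elements call returns nothing: 'game_summary expanded nohover' is three space-separated class tokens, but find_elements_by_class_name expects a single class name, so no element matches the literal string. (A compound CSS selector such as '.game_summary.expanded.nohover' is the usual workaround.) A minimal offline sketch with the standard-library html.parser, using a hypothetical one-line snippet in place of the real page, illustrates the token behavior:

```python
from html.parser import HTMLParser

class ClassCollector(HTMLParser):
    """Count tags whose class attribute contains every wanted token."""
    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.matches = 0

    def handle_starttag(self, tag, attrs):
        # class="a b c" is a set of tokens, not one name
        classes = set(dict(attrs).get("class", "").split())
        if self.wanted <= classes:
            self.matches += 1

snippet = '<div class="game_summary expanded nohover"><p>box score</p></div>'

# Treating the whole string as one class name matches nothing
one_name = ClassCollector(["game_summary expanded nohover"])
one_name.feed(snippet)
print(one_name.matches)  # 0

# Treating it as three separate tokens matches the div
tokens = ClassCollector(["game_summary", "expanded", "nohover"])
tokens.feed(snippet)
print(tokens.matches)  # 1
```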

2 Answers

0

Try the code below; it is working on my computer. Let me know if you still face a problem.

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.wait = WebDriverWait(driver, 60)
driver.get('https://www.basketball-reference.com/boxscores/')
driver.wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="content"]/div[3]/div[1]')))

boxes = driver.wait.until(
    EC.presence_of_all_elements_located((By.XPATH, "//div[@class=\"game_summary expanded nohover\"]")))

print("Number of Elements Located : ", len(boxes))

for box in boxes:
    print(box.text)
    print("-----------")

driver.quit()

If it resolves your problem, then please mark it as the answer. Thanks.

Swaroop Humane
  • This indeed worked. I didn't realize there was a .presence_of_all_elements_located condition. One oddity I did note: Selenium is producing double the number of div class='game_summary expanded nohover' elements. Once printed, the first six show as blank, followed by the actual six that I see while inspecting the HTML. Not a huge deal, but I'm curious if you might know why that is occurring. It didn't happen when I used to perform this with uClient.read() — that was the only way I noticed, as I compared the results of both. – zdaman101 Apr 28 '21 at 23:23
  • Yes, I am also getting 12 elements. To get exactly what you want, you can use this XPath: `//div[@class="section_heading"]/following-sibling::div/div[@class="game_summary expanded nohover"]` – Swaroop Humane Apr 28 '21 at 23:31

Actually, the site doesn't require Selenium at all. All the data is there through a simple requests call (it's just inside HTML comments; you would only need to parse those). Secondly, you can grab the box scores quite easily with pandas:

import pandas as pd

# read_html returns every <table> on the page as a DataFrame
dfs = pd.read_html('https://www.basketball-reference.com/boxscores/')

# the last two tables on the page are not box scores, so skip them
for idx, table in enumerate(dfs[:-2]):
    print(table)
    if (idx + 1) % 3 == 0:  # the box-score tables come in groups of three per game
        print("-----------")
chitown88