I am attempting to scrape MLB player stats from the MLB.com site:
I have the following Python code working. It uses Selenium's find_element(By.XPATH, ...) to click through the pages, but it assumes that there are ALWAYS 7 pages, which is not the case.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
url = r"https://www.mlb.com/stats/"
driver = webdriver.Firefox()
time.sleep(5)
driver.get(url)
time.sleep(5)
# Download the 1st page -
# yada yada yada
# List xpaths for pages 2 - 7
pages = [
    "/html/body/main/div[4]/section/section/div[3]/div[2]/div/div/div[1]/div[2]/button/span",
    "/html/body/main/div[4]/section/section/div[3]/div[2]/div/div/div[1]/div[3]/button/span",
    "/html/body/main/div[4]/section/section/div[3]/div[2]/div/div/div[2]/div[4]/button",
    "/html/body/main/div[4]/section/section/div[3]/div[2]/div/div/div[2]/div[5]/button",
    "/html/body/main/div[4]/section/section/div[3]/div[2]/div/div/div[2]/div[6]/button/span",
    "/html/body/main/div[4]/section/section/div[3]/div[2]/div/div/div[2]/div[7]/button/span"
]
# Loop thru pages 2 - 7
k = 0
for page in pages:
    k = k + 1
    print("Page "+str(k+1))
    print("Loop "+str(k), page)
    # Scroll to bottom of page to make Pagination Visible
    if k == 1:
        driver.maximize_window() # For maximizing window
    driver.execute_script("window.scrollTo(0,document.body.scrollHeight)")
    time.sleep(5)
    pageButtonSelect = driver.find_element(By.XPATH, page)
    pageButtonSelect.click()
    time.sleep(5)
    # Download next 25 players on this page
    # yada yada yada
I would like to make this code more dynamic and flexible so that it handles any number of pages, but this is a little beyond my current Selenium skills.
I have attempted to identify the parent div that contains the child divs in which the page buttons are nested and then count those child divs, but it is not working as desired.
==================================================
pagebuttonsdiv = driver.find_element(By.XPATH, '//*[@id="stats-app-root"]/section/section/div[3]/div[2]/div/div/div[1]')
npagebuttons = len(pagebuttonsdiv.find_elements(By.XPATH, "./div"))
Can someone please suggest Python Selenium code to loop through each child div and click on the pagination button nested within?
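
For illustration, here is a minimal sketch of the kind of loop being asked about. It is not a tested solution for MLB.com. It assumes that the container XPath stays valid after each click, that child div i holds the button for page i, and that page 1 is already displayed. The absolute XPaths listed earlier switch between div[1] and div[2] for the container, which suggests the first assumption may not hold on the real site. The names click_each_page and scrape_page are hypothetical; XPATH is simply the string value of Selenium's By.XPATH.

```python
import time

XPATH = "xpath"  # same string value as selenium.webdriver.common.by.By.XPATH


def click_each_page(driver, container_xpath, scrape_page, pause=5.0):
    """Click the button inside child div 2..n of the pagination container.

    Assumes child div i holds the button for page i (hypothetical layout).
    Calls scrape_page(page_number) after each click and returns the number
    of pages seen, counting page 1.
    """
    page = 1  # page 1 is assumed to be showing already
    while True:
        # Re-locate on every pass: elements found before a click may go
        # stale if the site re-renders the pagination bar.
        container = driver.find_element(XPATH, container_xpath)
        child_divs = container.find_elements(XPATH, "./div")
        if page >= len(child_divs):
            return page
        button = child_divs[page].find_element(XPATH, ".//button")
        # Bring the button into view before clicking it.
        driver.execute_script(
            "arguments[0].scrollIntoView({block: 'center'});", button
        )
        button.click()
        time.sleep(pause)  # crude wait, as in the code above
        page += 1
        scrape_page(page)


# Usage with a real browser (not run here):
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   driver.get("https://www.mlb.com/stats/")
#   click_each_page(
#       driver,
#       '//*[@id="stats-app-root"]/section/section/div[3]/div[2]/div/div/div[1]',
#       lambda n: print("Page", n),
#   )
```

Counting the child divs again on each pass, rather than once up front, means the loop still ends correctly if the number of page buttons changes after a click.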