0

I am attempting to scrape MLB player stats from the MLB.com site:

https://www.mlb.com/stats/

I have the following working Python code, which uses Selenium's find_element(By.XPATH, yada yada yada), but it assumes that there are ALWAYS 7 pages, which is not the case.

import time
from selenium import webdriver
from selenium.webdriver.common.by import By

url = r"https://www.mlb.com/stats/"
driver = webdriver.Firefox()
time.sleep(5)
driver.get(url)
time.sleep(5)

# Download the 1st page - 
# yada yada yada

# List xpaths for pages 2 - 7

pages = [
    "/html/body/main/div[4]/section/section/div[3]/div[2]/div/div/div[1]/div[2]/button/span",
    "/html/body/main/div[4]/section/section/div[3]/div[2]/div/div/div[1]/div[3]/button/span",
    "/html/body/main/div[4]/section/section/div[3]/div[2]/div/div/div[2]/div[4]/button",
    "/html/body/main/div[4]/section/section/div[3]/div[2]/div/div/div[2]/div[5]/button",
    "/html/body/main/div[4]/section/section/div[3]/div[2]/div/div/div[2]/div[6]/button/span",
    "/html/body/main/div[4]/section/section/div[3]/div[2]/div/div/div[2]/div[7]/button/span"
]

# Loop thru pages 2 - 7

k = 0
for page in pages:
    k = k + 1
    print("Page "+str(k+1))
    print("Loop "+str(k), page)
    # Scroll to bottom of page to make Pagination Visible
    if k == 1:
        driver.maximize_window() # For maximizing window
        driver.execute_script("window.scrollTo(0,document.body.scrollHeight)")
        time.sleep(5)
    pageButtonSelect = driver.find_element(By.XPATH, page)
    pageButtonSelect.click()
    time.sleep(5)
    # Download next 25 players on this page
    # yada yada yada

I would like to make this code more dynamic and flexible so that it handles any number of pages; however, this is a little beyond my current Selenium skills...

I have attempted to identify the parent div that contains child divs within which each of the page buttons are nested and then count the number of child divs, but it is not working as desired.

pagebuttonsdiv = driver.find_element(By.XPATH, '//*[@id="stats-app-root"]/section/section/div[3]/div[2]/div/div/div[1]')

npagebuttons = len(pagebuttonsdiv.find_elements(By.XPATH, "./div"))

Can someone please suggest Python Selenium code to loop through each child div and click on the pagination button nested within?

Dan

3 Answers

1

There are multiple ways to do this.
You could count the child divs of the element with class='bui-button-group pagination bui-button-group' and then load each page directly using base_url = "https://www.mlb.com/stats/?page={}".

base_url = "https://www.mlb.com/stats/?page={}"
...
pages = driver.find_elements(By.CSS_SELECTOR, 'div.bui-button-group.pagination.bui-button-group > div')
for pg_num in range(2, len(pages)+1):
    driver.get(base_url.format(pg_num)) # https://www.mlb.com/stats/?page=2
    ...
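
To make the role of the placeholder explicit: the `{}` in base_url is a str.format placeholder, not literal URL text, so the braces disappear once a page number is substituted. A minimal sketch, where `page_urls` is a hypothetical helper name and the `?page=N` query parameter is assumed to behave as described in this answer:

```python
BASE_URL = "https://www.mlb.com/stats/?page={}"


def page_urls(n_pages, first_page=2):
    """Return the URLs for pages first_page..n_pages (inclusive).

    Page 1 is the plain landing page, so the default starts at page 2.
    """
    return [BASE_URL.format(page) for page in range(first_page, n_pages + 1)]


# Each URL can then be passed to driver.get(...)
for url in page_urls(4):
    print(url)
```

Here `n_pages` would come from whatever page count you derive from the pagination element.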

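Or you could just check for the Next button.

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
...
while True:
...
    next_button_xpath = '//button[@aria-label="next page button"]'
    try:
        # An EC condition does nothing on its own; WebDriverWait.until() has to evaluate it
        WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.XPATH, next_button_xpath))
        ).click()
    except TimeoutException:
        break  # no clickable next button appeared, so assume this was the last page

Also, use Selenium's explicit waits (WebDriverWait) instead of time.sleep().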

Reyot
  • Thanks for the response. How exactly would I use the 1st method you describe? I do have the length of the div with class='bui-button-group pagination bui-button-group', but I do not fully understand how to use it with base_url = "https://www.mlb.com/stats/?page={}". I tried manually typing https://www.mlb.com/stats/?page={2} into the browser address bar and the site still returns the 1st page. Can you elaborate? – Dan Jul 11 '23 at 21:12
  • I added the code for the first part. Check if it works for you. – Reyot Jul 11 '23 at 21:22
  • If you are using the first method, instead of writing the scraping code twice you can put it in a function and call it once before the for-loop and then again inside it for the remaining pages. – Reyot Jul 11 '23 at 21:44
1

To loop through all the pages, keep clicking the next page button, inducing WebDriverWait for element_to_be_clickable() inside a try-except block so that the loop ends once the wait times out. You can use the following locator strategy:

  • Code block:

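    import time
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # same browser as in the question
    driver.get("https://www.mlb.com/stats/")
    # Accept the cookie banner so it does not cover the pagination buttons
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
    while True:
        try:
            # Click the button that follows the currently selected page button
            WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[starts-with(@aria-label, 'page') and @aria-current='page']//following::div[1]/button/span"))).click()
            print("Navigating to next page")
            time.sleep(3) # for demo purpose
        except TimeoutException:
            print("No more pages to navigate")
            break
    driver.quit()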
    
  • Console output:

    Navigating to next page
    Navigating to next page
    Navigating to next page
    Navigating to next page
    Navigating to next page
    Navigating to next page
    No more pages to navigate
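
The loop above only navigates; the per-page download still has to be slotted in. One way to structure that, sketched here without a browser so it can be run as is: `scrape_all_pages`, `scrape_page` and `go_to_next_page` are hypothetical names, and the fake site below merely stands in for the real page.

```python
def scrape_all_pages(scrape_page, go_to_next_page, max_pages=100):
    """Collect rows from every page until go_to_next_page() returns False."""
    rows = []
    for _ in range(max_pages):  # hard cap guards against an endless loop
        rows.extend(scrape_page())
        if not go_to_next_page():
            break
    return rows


# Stand-in for the browser: three "pages" of player names
fake_site = [["p1", "p2"], ["p3", "p4"], ["p5"]]
state = {"page": 0}


def scrape_page():
    """Return the rows of the page that is currently displayed."""
    return fake_site[state["page"]]


def go_to_next_page():
    """Advance one page; return False when already on the last page."""
    if state["page"] + 1 >= len(fake_site):
        return False
    state["page"] += 1
    return True


all_rows = scrape_all_pages(scrape_page, go_to_next_page)
print(all_rows)
```

With Selenium, `go_to_next_page` could wrap the WebDriverWait(...).until(...).click() call in a try-except for TimeoutException and return False on timeout, while `scrape_page` would read the table that is currently displayed.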
    
undetected Selenium
0

No need to use Selenium here. Go straight to the source of the data feed. You get everything in one request, without the overhead of Selenium:

import requests
import pandas as pd

url = 'https://bdfed.stitch.mlbinfra.com/bdfed/stats/player'
payload = {
    'stitch_env': 'prod',
    'season': '2023',
    'sportId': '1',
    'stats': 'season',
    'group': 'hitting',
    'gameType': 'R',
    'limit': '999',
    'offset': '0',
    'sortStat': 'onBasePlusSlugging',
    'order': 'desc',
    'playerPool': 'QUALIFIED'}  # change to 'ALL' to get all players

jsonData = requests.get(url, params=payload).json()
df = pd.DataFrame(jsonData['stats'])

Output: 1st 10 rows of 151

print(df.head(10).to_string())
   year  playerId        playerName    type  rank    playerFullName playerFirstName playerLastName playerUseName playerInitLastName  teamId teamAbbrev              teamName teamShortName leagueName  leagueId positionAbbrev           position primaryPositionAbbrev  plateAppearances  totalBases  leftOnBase  sacBunts  sacFlies babip  extraBaseHits  hitByPitch  gidp  gidpOpp  numberOfPitches pitchesPerPlateAppearance walksPerPlateAppearance strikeoutsPerPlateAppearance homeRunsPerPlateAppearance walksPerStrikeout   iso  reachedOnError  walkOffs  flyOuts  totalSwings  swingAndMisses  ballsInPlay  popOuts  lineOuts  groundOuts  flyHits  popHits  lineHits  groundHits  gamesPlayed  airOuts  runs  doubles  triples  homeRuns  strikeOuts  baseOnBalls  intentionalWalks  hits   avg  atBats   obp   slg    ops  caughtStealing  stolenBases stolenBasePercentage  groundIntoDoublePlay  rbi groundOutsToAirouts  catchersInterference atBatsPerHomeRun
0  2023    660271     Shohei Ohtani  player     1     Shohei Ohtani          Shohei         Ohtani        Shohei           S Ohtani     108        LAA    Los Angeles Angels        Angels         AL       103             DH  Designated Hitter                   TWP               398         226         145         0         3  .316             53           1     7       67             1557                     3.912                    .121                         .219                       .080              .552  .361               6         0       48          746             224          257       12        14          80       36        0        35          32           89       74    63       15        6        32          87           48                 4   103  .302     341  .387  .663  1.050               4           11                 .733                     7   71                1.08                     5            10.66
1  2023    660670  Ronald Acuna Jr.  player     2  Ronald Acuna Jr.          Ronald          Acuna        Ronald            R Acuña     144        ATL        Atlanta Braves        Braves         NL       104             RF         Outfielder                    RF               409         209         101         0         2  .337             47           4     7       40             1577                     3.856                    .108                         .120                       .051              .898  .251               3         0       49          721             137          312       12        23         109       22        0        52          45           89       84    79       25        1        21          49           44                 2   119  .331     359  .408  .582   .990               7           41                 .854                     7   55                1.30                     0            17.10
2  2023    605141      Mookie Betts  player     3      Mookie Betts          Markus          Betts        Mookie            M Betts     119        LAD   Los Angeles Dodgers       Dodgers         NL       104             RF         Outfielder                    RF               396         195          88         0         5  .267             50           4     4       30             1501                     3.790                    .136                         .164                       .066              .831  .309               4         0       74          575             105          273       20        22          65       28        1        48          15           86      116    72       23        1        26          65           54                 2    92  .276     333  .379  .586   .965               2            7                 .778                     4   62                0.56                     0            12.81
3  2023    518692   Freddie Freeman  player     4   Freddie Freeman       Frederick        Freeman       Freddie          F Freeman     119        LAD   Los Angeles Dodgers       Dodgers         NL       104             1B         First Base                    1B               407         198         125         0         4  .355             49           7     5       57             1532                     3.764                    .098                         .172                       .042              .571  .236               4         1       57          755             168          290        9        30          80       35        1        59          19           89       96    72       31        1        17          70           40                 6   114  .320     356  .396  .556   .952               1           12                 .923                     5   61                0.83                     0            20.94
4  2023    621566        Matt Olson  player     5        Matt Olson         Matthew          Olson          Matt            M Olson     144        ATL        Atlanta Braves        Braves         NL       104             1B         First Base                    1B               400         195         135         0         1  .280             48           2     8       54             1723                     4.308                    .135                         .270                       .073              .500  .315               1         0       56          829             259          236       16        12          65       33        0        31          23           89       84    70       17        2        29         108           54                 2    87  .254     343  .358  .569   .927               0            1                1.000                     8   72                0.77                     0            11.83
5  2023    650490        Yandy Diaz  player     6        Yandy Diaz           Yandy           Diaz         Yandy             Y Díaz     139         TB        Tampa Bay Rays          Rays         AL       103             1B         First Base                    1B               341         153          88         0         1  .359             31           4    11       40             1331                     3.903                    .114                         .158                       .038              .722  .192               0         0       41          558              97          244        6        15          86       13        0        41          42           78       62    58       18        0        13          54           39                 0    96  .323     297  .408  .515   .923               1            0                 .000                    11   43                1.39                     0            22.85
6  2023    682998    Corbin Carroll  player     7    Corbin Carroll          Corbin        Carroll        Corbin          C Carroll     109         AZ  Arizona Diamondbacks       D-backs         NL       104             LF         Outfielder                    LF               349         169         122         2         1  .317             41           6     3       60             1343                     3.848                    .092                         .192                       .052              .478  .260               2         2       41          600             121          244       18        18          78       22        0        31          36           86       77    63       20        3        18          67           32                 1    89  .289     308  .366  .549   .915               2           26                 .929                     3   48                1.01                     0            17.11
7  2023    650333       Luis Arraez  player     8       Luis Arraez            Luis         Arraez          Luis           L Arraez     146        MIA         Miami Marlins       Marlins         NL       104             2B        Second Base                    2B               362         155          94         0         2  .398             22           4    12       68             1279                     3.533                    .075                         .052                       .008             1.421  .088               2         0       50          629              47          312        8        27         101       12        0        69          45           86       85    40       18        1         3          19           27                 8   126  .383     329  .434  .471   .905               2            1                 .333                    12   42                1.19                     0           109.67
8  2023    673357   Luis Robert Jr.  player     9   Luis Robert Jr.            Luis         Robert          Luis       L Robert Jr.     145        CWS     Chicago White Sox     White Sox         AL       103             CF         Outfielder                    CF               375         193         139         0         1  .319             49           9     7       51             1402                     3.739                    .056                         .285                       .069              .196  .298               6         1       40          785             257          233       21        15          65       30        0        39          23           89       76    62       23        0        26         107           21                 4    92  .271     339  .330  .569   .899               2            8                 .800                     7   51                0.86                     5            13.04
9  2023    665742         Juan Soto  player    10         Juan Soto            Juan           Soto          Juan             J Soto     135         SD      San Diego Padres        Padres         NL       104             LF         Outfielder                    LF               396         148         112         0         3  .307             36           1     6       48             1630                     4.116                    .210                         .199                       .038             1.051  .214               2         0       40          547             121          233       11        10          90       19        0        30          33           90       61    51       21        0        15          79           83                 7    82  .265     309  .419  .479   .898               2            6                 .750                     6   47                1.48                     0            20.60
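
If a single request with a large 'limit' is not an option, the 'limit' and 'offset' parameters suggest the feed can also be read in chunks. A minimal sketch of that loop, assuming the endpoint honors 'offset' the way its name implies (not verified here); `fetch_all_rows` and `fetch_page` are hypothetical names, and the fake fetcher below replaces the real HTTP call so the example runs offline:

```python
def fetch_all_rows(fetch_page, limit=25):
    """Request consecutive chunks until a chunk comes back shorter than limit."""
    rows = []
    offset = 0
    while True:
        batch = fetch_page(offset, limit)
        rows.extend(batch)
        if len(batch) < limit:  # a short (or empty) chunk means the data is exhausted
            break
        offset += limit
    return rows


# Stand-in data source: 60 fake stat rows instead of the live feed
FAKE_STATS = [{"rank": i} for i in range(1, 61)]


def fake_fetch_page(offset, limit):
    """Mimic a paged API by slicing the fake data."""
    return FAKE_STATS[offset:offset + limit]


rows = fetch_all_rows(fake_fetch_page, limit=25)
print(len(rows))
```

A real `fetch_page` could copy the payload above, set 'limit' and 'offset' for each call, and return the 'stats' list from the JSON response.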
chitown88