Retrieving hrefs from the MLS page

Question

I am trying to retrieve the match links (hrefs) from this page. I cannot find them directly using Selenium or BeautifulSoup. I understand they might come from an API, but I can't figure out how to find them under the section with class `mls-l-module mls-l-module--match-list`.

import requests
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, NoSuchElementException
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from time import sleep, time
import pandas as pd
import warnings
import numpy as np
from datetime import datetime
import json
from bs4 import BeautifulSoup

warnings.filterwarnings('ignore')

base_url = 'https://www.mlssoccer.com/schedule/scores#competition=mls-regular-season&club=all&date=2023-02-20'

# create an empty list to store urls.
urls = []

option = Options()
option.headless = False
driver = webdriver.Chrome("##########",options=option)
driver.get(base_url)

# click the cookie pop up
WebDriverWait(driver, 15).until(EC.element_to_be_clickable((By.XPATH, '/html/body/div[3]/div[2]/div/div[1]/div/div[2]/div/button[2]'))).click()

The expected output is a list of URLs from this page. I will then loop to the next page and collect the href links for all matches. Perhaps using selenium to render the page for soup is a better option.

The page makes XHR requests to an API that you can work with instead of using Selenium. Just inspect the `https://sportapi.mlssoccer.com/api/matches?...` requests in the DevTools Network tab — sudden_appearance, Jun 19 '23 at 23:13

score 1 · Accepted Answer · answered Jun 19 '23 at 23:21

As stated in the comments you can bypass selenium altogether and use their Ajax API directly:

import requests

params = {
    "culture": "en-us",
    "dateFrom": "2023-02-19",
    "dateTo": "2023-02-27",
    "competition": "98",
    "matchType": "Regular",
    "excludeSecondaryTeams": "true",
}

api_url = 'https://sportapi.mlssoccer.com/api/matches'
base_url = 'https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/'

data = requests.get(api_url, params=params).json()

for m in data:
    h, a = m['home']['fullName'], m['away']['fullName']
    print(f'{h:<30} {a:<30} {base_url + m["slug"]}/')

Prints:

Nashville SC                   New York City Football Club    https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/nshvsnyc-02-25-2023/
Atlanta United                 San Jose Earthquakes           https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/atlvssj-02-25-2023/
Charlotte FC                   New England Revolution         https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/cltvsne-02-25-2023/
FC Cincinnati                  Houston Dynamo FC              https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/cinvshou-02-25-2023/
D.C. United                    Toronto FC                     https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/dcvstor-02-25-2023/
Inter Miami CF                 CF Montréal                    https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/miavsmtl-02-25-2023/
Orlando City                   New York Red Bulls             https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/orlvsrbny-02-25-2023/
Philadelphia Union             Columbus Crew                  https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/phivsclb-02-25-2023/
Austin FC                      St. Louis CITY SC              https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/atxvsstl-02-25-2023/
FC Dallas                      Minnesota United               https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/dalvsmin-02-25-2023/
Vancouver Whitecaps FC         Real Salt Lake                 https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/vanvsrsl-02-25-2023/
Seattle Sounders FC            Colorado Rapids                https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/seavscol-02-26-2023/
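The question also asks for a list of URLs collected across several schedule pages. A minimal sketch of one way to do that with the same API is to step through date windows and build the URLs from each `slug`. The helper names `date_windows`, `build_params` and `match_urls` are hypothetical, the 7-day window is an assumption, and it is not confirmed whether the API treats `dateFrom`/`dateTo` as inclusive, so duplicates are filtered by slug.

```python
from datetime import date, timedelta

# Both URLs are copied from the answer above.
API_URL = "https://sportapi.mlssoccer.com/api/matches"
BASE_URL = "https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/"


def date_windows(start, end, days=7):
    """Yield (date_from, date_to) ISO strings covering start..end inclusive."""
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=days - 1), end)
        yield current.isoformat(), window_end.isoformat()
        current = window_end + timedelta(days=1)


def build_params(date_from, date_to):
    """Same query parameters as in the answer; only the date range varies."""
    return {
        "culture": "en-us",
        "dateFrom": date_from,
        "dateTo": date_to,
        "competition": "98",
        "matchType": "Regular",
        "excludeSecondaryTeams": "true",
    }


def match_urls(matches, seen=None):
    """Turn API match records into page URLs, skipping slugs already seen."""
    seen = set() if seen is None else seen
    urls = []
    for m in matches:
        slug = m["slug"]
        if slug not in seen:
            seen.add(slug)
            urls.append(f"{BASE_URL}{slug}/")
    return urls


# Usage sketch (needs network access and the requests package):
# import requests
# seen, urls = set(), []
# for date_from, date_to in date_windows(date(2023, 2, 20), date(2023, 3, 12)):
#     data = requests.get(API_URL, params=build_params(date_from, date_to)).json()
#     urls.extend(match_urls(data, seen))
```

Passing the same `seen` set to every call keeps the final list free of repeats if consecutive windows happen to return the same match.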

That code is exactly it. Out of curiosity: when I scroll down the page with the DevTools Network tab open, I cannot find the sportapi requests anywhere. How did you find it? — Paul Corcoran, Jun 19 '23 at 23:27
@PaulCorcoran Try clicking the dropdown menu that shows "All Season", "Regular Season", etc., and watch the Network tab in Developer Tools. — Andrej Kesely, Jun 19 '23 at 23:29

undetected Selenium · Answer 2 · 2023-06-19T23:33:00.630

To print the links you can use either of the following locator strategies:

Using CSS_SELECTOR:

driver.get("https://www.mlssoccer.com/schedule/scores#competition=mls-regular-season&club=all&date=2023-02-20")
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.mls-c-match-list__match a")))])

Using XPATH:

driver.get("https://www.mlssoccer.com/schedule/scores#competition=mls-regular-season&club=all&date=2023-02-20")
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//button[@id='onetrust-accept-btn-handler']"))).click()
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[contains(@class, 'mls-c-match-list__match')]//a")))])

Console Output:

['https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/nshvsnyc-02-25-2023', 'https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/atlvssj-02-25-2023', 'https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/cltvsne-02-25-2023', 'https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/cinvshou-02-25-2023', 'https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/dcvstor-02-25-2023', 'https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/miavsmtl-02-25-2023', 'https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/orlvsrbny-02-25-2023', 'https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/phivsclb-02-25-2023', 'https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/atxvsstl-02-25-2023', 'https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/dalvsmin-02-25-2023', 'https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/vanvsrsl-02-25-2023']

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
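If the collected hrefs are going to be looped over afterwards, it may help to clean the list first. The sketch below assumes, without confirmation from the page, that the selector could match more than one anchor per match or an anchor without an `href` (in which case `get_attribute` returns `None`). The helper name `unique_match_links` is hypothetical.

```python
def unique_match_links(hrefs):
    """Keep match-page links only, dropping empty values and duplicates in order."""
    seen = set()
    links = []
    for href in hrefs:
        # Skip missing hrefs and links that are not match pages.
        if not href or "/matches/" not in href:
            continue
        # Treat URLs with and without a trailing slash as the same page.
        href = href.rstrip("/")
        if href not in seen:
            seen.add(href)
            links.append(href)
    return links
```

With the elements returned by either wait above stored in a variable such as `elements`, the call would be `unique_match_links(e.get_attribute("href") for e in elements)`.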

@PaulCorcoran _Perhaps using selenium to render the page for soup is a better option_ — undetected Selenium, Jun 19 '23 at 23:34
