
I am currently trying to retrieve the associated match links (the `href`s) from this page. I can't find them straight off the bat using Selenium/BeautifulSoup. I understand they might come from an API, but I can't figure out how to find them under the section with class `mls-l-module mls-l-module--match-list`.

import requests
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, NoSuchElementException
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from time import sleep, time
import pandas as pd
import warnings
import numpy as np
from datetime import datetime
import json
from bs4 import BeautifulSoup

warnings.filterwarnings('ignore')

base_url = 'https://www.mlssoccer.com/schedule/scores#competition=mls-regular-season&club=all&date=2023-02-20'

# create an empty list to store urls.
urls = []

option = Options()  # a visible (non-headless) browser is the default
# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service("##########"), options=option)
driver.get(base_url)

# click the cookie pop up
WebDriverWait(driver, 15).until(EC.element_to_be_clickable((By.XPATH, '/html/body/div[3]/div[2]/div/div[1]/div/div[2]/div/button[2]'))).click()


The expected output is a list of URLs from this page; I will then loop to the next page and collect all the `href` links for matches. Perhaps using Selenium to render the page for BeautifulSoup is a better option?
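A minimal sketch of the "loop to the next page" step, assuming each schedule page shows one week of fixtures and is selected by the `date=` fragment of the URL above (an assumption based only on that URL; I haven't verified how the site paginates). It only builds the weekly URLs; each could then be opened with `driver.get(...)` and scraped.

```python
from datetime import date, timedelta

# URL pattern taken from the question; assumption: the `date=` fragment
# selects which week of fixtures the schedule page shows.
SCHEDULE_URL = (
    "https://www.mlssoccer.com/schedule/scores"
    "#competition=mls-regular-season&club=all&date={}"
)


def weekly_dates(start: date, end: date, step_days: int = 7):
    """Yield dates from start to end (inclusive) in steps of step_days."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=step_days)


def schedule_urls(start: date, end: date, step_days: int = 7) -> list:
    """Build one schedule URL per step between start and end."""
    return [SCHEDULE_URL.format(d.isoformat()) for d in weekly_dates(start, end, step_days)]


if __name__ == "__main__":
    for url in schedule_urls(date(2023, 2, 20), date(2023, 3, 13)):
        print(url)
```

Because only the URL fragment changes from week to week, it is unverified whether the page re-renders on `driver.get(...)` alone; a `driver.refresh()` may be needed.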

Paul Corcoran
    There are XHR requests to an API that you can work with instead of using Selenium. Just inspect the `https://sportapi.mlssoccer.com/api/matches?...` requests in the DevTools Network tab. – sudden_appearance Jun 19 '23 at 23:13

2 Answers


As stated in the comments, you can bypass Selenium altogether and use their Ajax API directly:

import requests

# Query parameters as seen in the schedule page's XHR request
params = {
    "culture": "en-us",
    "dateFrom": "2023-02-19",
    "dateTo": "2023-02-27",
    "competition": "98",
    "matchType": "Regular",
    "excludeSecondaryTeams": "true",
}

api_url = 'https://sportapi.mlssoccer.com/api/matches'
base_url = 'https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/'

resp = requests.get(api_url, params=params, timeout=30)
resp.raise_for_status()
data = resp.json()  # list of match objects

for m in data:
    h, a = m['home']['fullName'], m['away']['fullName']
    # each match's "slug" completes its page URL
    print(f'{h:<30} {a:<30} {base_url + m["slug"]}/')

Prints:

Nashville SC                   New York City Football Club    https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/nshvsnyc-02-25-2023/
Atlanta United                 San Jose Earthquakes           https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/atlvssj-02-25-2023/
Charlotte FC                   New England Revolution         https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/cltvsne-02-25-2023/
FC Cincinnati                  Houston Dynamo FC              https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/cinvshou-02-25-2023/
D.C. United                    Toronto FC                     https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/dcvstor-02-25-2023/
Inter Miami CF                 CF Montréal                    https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/miavsmtl-02-25-2023/
Orlando City                   New York Red Bulls             https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/orlvsrbny-02-25-2023/
Philadelphia Union             Columbus Crew                  https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/phivsclb-02-25-2023/
Austin FC                      St. Louis CITY SC              https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/atxvsstl-02-25-2023/
FC Dallas                      Minnesota United               https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/dalvsmin-02-25-2023/
Vancouver Whitecaps FC         Real Salt Lake                 https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/vanvsrsl-02-25-2023/
Seattle Sounders FC            Colorado Rapids                https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/seavscol-02-26-2023/
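To cover more than one week, the same request can be repeated over consecutive date windows. This is a sketch, assuming the endpoint accepts any `dateFrom`/`dateTo` pair with the other parameters unchanged; `fetch_window` and `collect_match_urls` are hypothetical helper names. Because it is not known whether `dateTo` is inclusive, consecutive windows share their boundary day and duplicate slugs are dropped.

```python
from datetime import date, timedelta

API_URL = "https://sportapi.mlssoccer.com/api/matches"
# The 2023 season segment is hard-coded, as in the answer above.
BASE_URL = "https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/"


def fetch_window(date_from: str, date_to: str) -> list:
    """Fetch matches for one date window using the answer's query parameters."""
    import requests  # lazy import so collect_match_urls can be tested without it

    params = {
        "culture": "en-us",
        "dateFrom": date_from,
        "dateTo": date_to,
        "competition": "98",
        "matchType": "Regular",
        "excludeSecondaryTeams": "true",
    }
    resp = requests.get(API_URL, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()


def collect_match_urls(start: date, end: date, fetch=fetch_window, step_days: int = 7) -> list:
    """Walk [start, end] in overlapping windows and return unique match URLs in order."""
    if start > end:
        return []
    seen = set()
    urls = []
    current = start
    while True:
        window_end = min(current + timedelta(days=step_days), end)
        for match in fetch(current.isoformat(), window_end.isoformat()):
            slug = match["slug"]
            if slug not in seen:  # the shared boundary day can repeat matches
                seen.add(slug)
                urls.append(f"{BASE_URL}{slug}/")
        if window_end >= end:
            break
        current = window_end  # next window starts on this window's last day
    return urls


if __name__ == "__main__":
    for url in collect_match_urls(date(2023, 2, 19), date(2023, 3, 19)):
        print(url)
```

Passing `fetch` as a parameter keeps the windowing logic separate from the HTTP call, so it can be exercised with a stub instead of the live API.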
Andrej Kesely

To print the links you can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    driver.get("https://www.mlssoccer.com/schedule/scores#competition=mls-regular-season&club=all&date=2023-02-20")
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.mls-c-match-list__match a")))])
    
  • Using XPATH:

    driver.get("https://www.mlssoccer.com/schedule/scores#competition=mls-regular-season&club=all&date=2023-02-20")
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//button[@id='onetrust-accept-btn-handler']"))).click()
    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[contains(@class, 'mls-c-match-list__match')]//a")))])
    
  • Console Output:

    ['https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/nshvsnyc-02-25-2023', 'https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/atlvssj-02-25-2023', 'https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/cltvsne-02-25-2023', 'https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/cinvshou-02-25-2023', 'https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/dcvstor-02-25-2023', 'https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/miavsmtl-02-25-2023', 'https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/orlvsrbny-02-25-2023', 'https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/phivsclb-02-25-2023', 'https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/atxvsstl-02-25-2023', 'https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/dalvsmin-02-25-2023', 'https://www.mlssoccer.com/competitions/mls-regular-season/2023/matches/vanvsrsl-02-25-2023']
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
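If you would rather parse the rendered HTML yourself, as the question suggests, the page source can be handed to a parser once the wait above has returned. This sketch uses the standard library's `html.parser` instead of BeautifulSoup to stay dependency-free; the `mls-c-match-list__match` class comes from the selectors above, and it is an assumption that the rendered markup still uses it. `MatchLinkParser` and `extract_match_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MatchLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside div.mls-c-match-list__match."""

    TARGET_CLASS = "mls-c-match-list__match"

    def __init__(self):
        super().__init__()
        self.links = []
        self._div_stack = []  # one bool per open <div>: is it a match container?

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            self._div_stack.append(self.TARGET_CLASS in classes)
        elif tag == "a" and any(self._div_stack):
            href = attrs.get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "div" and self._div_stack:
            self._div_stack.pop()


def extract_match_links(html: str, base_url: str = "https://www.mlssoccer.com/") -> list:
    """Return absolute match URLs found in rendered HTML (e.g. driver.page_source)."""
    parser = MatchLinkParser()
    parser.feed(html)
    parser.close()
    # Raw href attributes may be relative; get_attribute("href") resolves them for you.
    return [urljoin(base_url, href) for href in parser.links]
```

Calling `extract_match_links(driver.page_source)` after the wait should give roughly the same list as the answer's locators. With BeautifulSoup installed, `[a.get("href") for a in BeautifulSoup(driver.page_source, "html.parser").select("div.mls-c-match-list__match a")]` is the equivalent, though those hrefs may also be relative.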
    
undetected Selenium