Code Proposal:
Collect the links to all of the day's matches listed on the page (https://int.soccerway.com/matches/2021/07/28/), with the freedom to change the date to whatever I want, such as 2021/08/01 and so on, so that in the future I can loop over several different days and collect all the lists in a single run.
Without headless mode this script is very slow, but it works: it clicks all the buttons, expands the data and collects all 465 listed match links:
for btn in driver.find_elements_by_xpath("//tr[contains(@class,'group-head clickable')]"):
    btn.click()
Full Code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time

options = Options()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-logging"])
driver = webdriver.Chrome(r"C:\Users\Computador\Desktop\Python\chromedriver.exe", options=options)

url = "https://int.soccerway.com/matches/2021/07/28/"
driver.get(url)

# Open the language picker and choose the international site
driver.find_element_by_xpath("//div[@class='language-picker-trigger']").click()
driver.find_element_by_xpath("//a[@href='https://int.soccerway.com']").click()
time.sleep(10)

# Expand every collapsed competition group
for btn in driver.find_elements_by_xpath("//tr[contains(@class,'group-head clickable')]"):
    btn.click()
time.sleep(10)

# Collect the match links from the score/time cells
jogos = driver.find_elements_by_xpath("//td[contains(@class,'score-time')]//a")
for jogo in jogos:
    resultado = jogo.get_attribute("href")
    print(resultado)

driver.quit()
But when I add options.add_argument("headless") so that the browser window does not open on my screen, the script fails with the following error:
Message: element click intercepted
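One commonly suggested workaround is sketched below. It assumes the interception comes from headless Chrome's small default window or from an overlay drawn on top of the header rows. The sketch makes two changes: it gives the headless window an explicit size, and it clicks each header through JavaScript. A JavaScript click is dispatched on the element itself, so another element cannot intercept it. The helper names, the 1920x1080 size and the Selenium 4 setup are assumptions, not part of the original code.

```python
# Sketch of a headless workaround for "element click intercepted".
# Assumptions: Selenium 4, a 1920x1080 window, and that an overlay or the
# small headless window is what intercepts the click.

GROUP_HEAD_XPATH = "//tr[contains(@class,'group-head clickable')]"


def headless_chrome_args(width=1920, height=1080):
    """Chrome flags for a headless run with an explicit window size.

    Headless Chrome reportedly ignores 'start-maximized' and starts with a
    small default window, so the size is passed explicitly instead.
    """
    return ["--headless", f"--window-size={width},{height}"]


def expand_all_groups(driver, xpath=GROUP_HEAD_XPATH):
    """Click every group header via JavaScript and return how many were clicked.

    execute_script runs element.click() on the element itself, so an element
    drawn on top of it cannot intercept the click the way it can for
    WebElement.click().
    """
    headers = driver.find_elements("xpath", xpath)  # "xpath" is the value of By.XPATH
    for header in headers:
        driver.execute_script("arguments[0].click();", header)
    return len(headers)


def make_headless_driver():
    """Create a headless Chrome driver; Selenium is imported only when called.

    Assumes chromedriver is found automatically (on PATH or via Selenium
    Manager); otherwise pass service=Service(path) to webdriver.Chrome.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in headless_chrome_args():
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

In use, you would call `driver = make_headless_driver()`, then `driver.get(url)`, then `expand_all_groups(driver)`, and then wait and collect the `score-time` links as in the original script.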
To get around this problem, I looked at alternatives and found this one using WebDriverWait (https://stackoverflow.com/a/62904494/11462274), which I tried to use like this:
for btn in WebDriverWait(driver, 1).until(EC.element_to_be_clickable((By.XPATH, "//tr[contains(@class,'group-head clickable')]"))):
    btn.click()
Full Code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")
options.add_argument("headless")
options.add_experimental_option("excludeSwitches", ["enable-logging"])
driver = webdriver.Chrome(r"C:\Users\Computador\Desktop\Python\chromedriver.exe", options=options)

url = "https://int.soccerway.com/matches/2021/07/28/"
driver.get(url)
driver.find_element_by_xpath("//div[@class='language-picker-trigger']").click()
driver.find_element_by_xpath("//a[@href='https://int.soccerway.com']").click()
time.sleep(10)

for btn in WebDriverWait(driver, 1).until(EC.element_to_be_clickable((By.XPATH, "//tr[contains(@class,'group-head clickable')]"))):
    btn.click()
time.sleep(10)

jogos = driver.find_elements_by_xpath("//td[contains(@class,'score-time')]//a")
for jogo in jogos:
    resultado = jogo.get_attribute("href")
    print(resultado)

driver.quit()
But because the result is not iterable, it fails with this error:
'NoneType' object is not iterable
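`WebDriverWait.until` returns whatever its condition returns. `element_to_be_clickable` waits for a single element, so the `for` loop never receives a list. A minimal sketch that waits for all matching rows instead uses `expected_conditions.presence_of_all_elements_located`, which returns a list once at least one match is present. The function name `wait_for_all` and the 10-second timeout are assumptions. Presence does not guarantee that a row is clickable, so this still pairs with a fix for the headless click interception.

```python
# Sketch: wait for *all* matching elements instead of one clickable element.
# Assumptions: Selenium is installed when this is called; 10 s timeout is arbitrary.


def wait_for_all(driver, xpath, timeout=10):
    """Return the list of elements matching xpath once at least one is present.

    Raises Selenium's TimeoutException if none appear within `timeout` seconds.
    """
    # Imported here so this module can be loaded without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    condition = EC.presence_of_all_elements_located((By.XPATH, xpath))
    return WebDriverWait(driver, timeout).until(condition)


# Usage with the question's XPath (requires a live driver):
# for btn in wait_for_all(driver, "//tr[contains(@class,'group-head clickable')]"):
#     btn.click()
```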
Why do I need this?
1 - I'm going to automate it in an online terminal, so there won't be any screen for a browser to open on, and it needs to be fast so I don't use up too much of my time limit on the terminal.
2 - I need an approach that works with any date, not only 2021/07/28, in:
url = "https://int.soccerway.com/matches/2021/07/28/"
so that in the future I can add the parameter:
today = date.today().strftime("%Y/%m/%d")
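A minimal sketch of how the date could be parameterized follows. It assumes every day's page follows the pattern https://int.soccerway.com/matches/YYYY/MM/DD/; only the 2021/07/28 URL is confirmed above. The helper names `matches_url` and `matches_urls_between` are hypothetical.

```python
from datetime import date, timedelta
from typing import List

# Assumed URL pattern, taken from the single example URL in the question.
BASE_URL = "https://int.soccerway.com/matches/"


def matches_url(day: date) -> str:
    """Match-list URL for one day, e.g. .../matches/2021/07/28/."""
    return f"{BASE_URL}{day.strftime('%Y/%m/%d')}/"


def matches_urls_between(start: date, end: date) -> List[str]:
    """One URL per day from start to end, both inclusive (empty if end < start)."""
    return [matches_url(start + timedelta(days=i)) for i in range((end - start).days + 1)]


# Today's page, the same string the question builds with strftime("%Y/%m/%d"):
print(matches_url(date.today()))

# Several days in one run, e.g. 2021/07/28 through 2021/08/01:
for url in matches_urls_between(date(2021, 7, 28), date(2021, 8, 1)):
    print(url)
```

Each URL can then be passed to `driver.get(url)` inside one loop, reusing a single browser session instead of starting a new one per day.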
In this answer (https://stackoverflow.com/a/68535595/11462274), another user suggested a very fast and interesting option that needs no WebDriver (named "Quicker Version" at the end of the answer). However, I was only able to make it work on the site's first page: when I try other dates of the year, it keeps returning only the links to the current day's games.
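To compare with the WebDriver approach, here is a minimal sketch that extracts match links from already-downloaded HTML using only the standard library (`html.parser`). It is not the linked answer's "Quicker Version". It assumes, as the question's XPath implies, that the links are `<a>` tags inside `<td>` cells whose class list includes `score-time`. The HTML could come from `urllib.request.urlopen` or the `requests` library. Groups that the site only loads after a header is clicked would not be in a plain download, so this sees only what the server sends initially. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScoreTimeLinkParser(HTMLParser):
    """Collect href values of <a> tags nested in <td class="... score-time ...">."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._in_score_td = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "td":
            # Token match: "score-time status" and "score-time score" both qualify.
            classes = (attrs.get("class") or "").split()
            self._in_score_td = "score-time" in classes
        elif tag == "a" and self._in_score_td and attrs.get("href"):
            # Relative hrefs such as "/matches/..." become absolute URLs.
            self.links.append(urljoin(self.base_url, attrs["href"]))

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_score_td = False


def extract_match_links(html, base_url="https://int.soccerway.com/"):
    """Return absolute match URLs found in score-time cells of `html`."""
    parser = ScoreTimeLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```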
Expected result (there are 465 links, but I haven't included them all because of the character limit):
https://int.soccerway.com/matches/2021/07/28/europe/uefa-champions-league/fc-sheriff-tiraspol/alashkert-fc/3517568/
https://int.soccerway.com/matches/2021/07/28/europe/uefa-champions-league/fk-neftchi/olympiakos-cfp/3517569/
https://int.soccerway.com/matches/2021/07/28/europe/uefa-champions-league/scs-cfr-1907-cluj-sa/newcastle-fc/3517571/
https://int.soccerway.com/matches/2021/07/28/europe/uefa-champions-league/fc-midtjylland/celtic-fc/3517576/
https://int.soccerway.com/matches/2021/07/28/europe/uefa-champions-league/fk-razgrad-2000/mura/3517574/
https://int.soccerway.com/matches/2021/07/28/europe/uefa-champions-league/galatasaray-sk/psv-nv/3517577/
https://int.soccerway.com/matches/2021/07/28/europe/uefa-champions-league/bsc-young-boys-bern/k-slovan-bratislava/3517566/
https://int.soccerway.com/matches/2021/07/28/europe/uefa-champions-league/fk-crvena-zvezda-beograd/fc-kairat-almaty/3517570/
https://int.soccerway.com/matches/2021/07/28/europe/uefa-champions-league/ac-sparta-praha/sk-rapid-wien/3517575/
https://int.soccerway.com/matches/2021/07/28/world/olympics/saudi-arabia-u23/brazil--under-23/3497390/
https://int.soccerway.com/matches/2021/07/28/world/olympics/germany-u23/cote-divoire-u23/3497391/
https://int.soccerway.com/matches/2021/07/28/world/olympics/romania-u23/new-zealand-under-23/3497361/
https://int.soccerway.com/matches/2021/07/28/world/olympics/korea-republic-u23/honduras-u23/3497362/
https://int.soccerway.com/matches/2021/07/28/world/olympics/australia-under-23/egypt-under-23/3497383/
https://int.soccerway.com/matches/2021/07/28/world/olympics/spain-under-23/argentina-under-23/3497384/
https://int.soccerway.com/matches/2021/07/28/world/olympics/france-u23/japan-u23/3497331/
https://int.soccerway.com/matches/2021/07/28/world/olympics/south-africa-u23/mexico-u23/3497332/
https://int.soccerway.com/matches/2021/07/28/africa/cecafa-senior-challenge-cup/uganda-under-23/eritrea-under-23/3567664/
Note 1: There are several kinds of score-time class, such as score-time status and score-time score, which is why I used contains in "//td[contains(@class,'score-time')]//a".
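`contains(@class,'score-time')` is a substring test. It would therefore also match a class such as `score-time-extra` if the page ever used one (a hypothetical name, not observed on the site). A common XPath 1.0 idiom that matches only the whole class token is shown below, together with plain-Python equivalents of both checks for comparison. The names `has_class` and `contains_class_substring` are hypothetical.

```python
# XPath 1.0 idiom: pad the normalized class attribute with spaces and look for
# " score-time " so only the complete class token matches.
SCORE_TIME_LINKS_XPATH = (
    "//td[contains(concat(' ', normalize-space(@class), ' '), ' score-time ')]//a"
)


def has_class(class_attr, name):
    """Plain-Python equivalent of the XPath idiom above: exact token match."""
    return name in (class_attr or "").split()


def contains_class_substring(class_attr, name):
    """Plain-Python equivalent of contains(@class, name): a substring test."""
    return name in (class_attr or "")


for value in ["score-time status", "score-time score", "score-time-extra"]:
    print(value, has_class(value, "score-time"), contains_class_substring(value, "score-time"))
```

Both checks accept `score-time status` and `score-time score`. Only the substring version also accepts `score-time-extra`.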
Update
If possible, in addition to help with the current problem, I'm interested in an improved, faster alternative to the method I currently use (I'm still learning, so my methods are pretty archaic).