
I'm trying to scrape https://www.livescore.com/en/ but I'm running into issues, mainly because its structure is different from the other sites I've already worked on.

I see that there is a dynamic ID whose number increases as you scroll down the page, and the IDs in the code relate only to the matches currently visible on the page. Also, inside each match, the Home team markup seems to be identical to the Away team markup.

This is what I've tried so far:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Selenium 4 takes the driver path through a Service object (executable_path is deprecated and was removed in 4.10)
driver = webdriver.Chrome(service=Service(r"C:\Users\Lorenzo\Downloads\chromedriver.exe"))
driver.maximize_window()
wait = WebDriverWait(driver, 30)
driver.get('https://www.livescore.com/en/football/live/')
# Dismiss the cookie banner
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()

games1 = driver.find_elements(By.CSS_SELECTOR, 'div[class = "MatchRow_matchRowWrapper__1BtJ3"]')
data1 = []
for game1 in games1:
    data1.append({
        # Problem: Home and Away use the same selector, so both return the first team name in the row
        'Home': game1.find_element(By.CSS_SELECTOR, 'div[class = "MatchRow_teamName__2cw5n"]').text,
        'Away': game1.find_element(By.CSS_SELECTOR, 'div[class = "MatchRow_teamName__2cw5n"]').text,
        'Time': game1.find_element(By.CSS_SELECTOR, 'div[class = "MatchRowTime_time__2Fkd2 MatchRowTime_isLive__2qWag"]').text,
    })

The idea is to build a DataFrame of the live matches with the Home team name, the Away team name and the current minute of play.

Can someone help me?

Edited by undetected Selenium, asked by Carolino

2 Answers


AFAIK the clearest and simplest way to locate elements inside other elements is to use an XPath expression that starts with a dot (.), which makes the search relative to the parent element.
The Home and Away team names, as well as the match Time field, can be located inside each match row with the following locators:

games1 = driver.find_elements(By.CSS_SELECTOR, 'div[class = "MatchRow_matchRowWrapper__1BtJ3"]')
data1 = []
for game1 in games1:
    # The leading "." makes each XPath search only inside the current match row
    data1.append({
        'Home': game1.find_element(By.XPATH, './/div[contains(@class,"MatchRow_home")]').text,
        'Away': game1.find_element(By.XPATH, './/div[contains(@class,"MatchRow_away")]').text,
        'Time': game1.find_element(By.XPATH, './/span[contains(@id,"match-row")]').text,
    })
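To illustrate why the leading dot matters without launching a browser, here is a stdlib-only sketch using xml.etree.ElementTree on a simplified, hypothetical fragment of the match list (the real livescore markup and class names are longer and differ). ElementTree supports only a subset of XPath and has no contains(), so exact @class matches stand in for the contains() calls above; the point is only that every lookup starts from the row element. SAMPLE and extract_rows are illustrative names, not part of the answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: two match rows, each with a home and an away team.
SAMPLE = """
<root>
  <div class="MatchRow_matchRowWrapper">
    <div class="MatchRow_home">Bayern Munich</div>
    <div class="MatchRow_away">FC Salzburg</div>
  </div>
  <div class="MatchRow_matchRowWrapper">
    <div class="MatchRow_home">Liverpool</div>
    <div class="MatchRow_away">Inter</div>
  </div>
</root>
"""


def extract_rows(markup):
    """Return one dict per match row, searching relative to each row element."""
    root = ET.fromstring(markup.strip())
    rows = []
    for row in root.findall(".//div[@class='MatchRow_matchRowWrapper']"):
        rows.append({
            # ".//" starts the search at `row`, so each row yields its own teams
            "Home": row.find(".//div[@class='MatchRow_home']").text,
            "Away": row.find(".//div[@class='MatchRow_away']").text,
        })
    return rows


print(extract_rows(SAMPLE))
# [{'Home': 'Bayern Munich', 'Away': 'FC Salzburg'}, {'Home': 'Liverpool', 'Away': 'Inter'}]
```

In Selenium the same rule applies: an XPath such as '//div[...]' passed to game1.find_element without the leading dot searches the whole document, so every row would return the names from the first match on the page.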
Answered by Prophet
  • Thank you for the quick reply. The main issue is that this code scrapes only the visible matches (running the code now, I got only 4 records instead of 11). In the HTML there are data-index values that increase every time you scroll the page, and each data-index value is related to a specific match – Carolino Mar 08 '22 at 18:57
  • I see. We can add scrolling here, that's not a problem, but I think that would be a new question, and I think I answered your original question. If you agree, please accept this answer and ask a new, separate question about the scrolling. – Prophet Mar 08 '22 at 19:19
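Both comments point at the same gap: only the rows currently rendered get scraped. Below is a minimal sketch of the kind of scrolling Prophet mentions, under stated assumptions. It assumes, from Carolino's description, that each match row carries a data-index attribute that increases down the page, and, from the second answer, that the team names sit in elements whose id ends in home-team-name / away-team-name; neither assumption has been checked against the current site. The page appears to render only the rows in view, so the sketch scrolls one viewport at a time and stores every row it sees in a dict keyed by data-index, which also de-duplicates rows that stay visible across two scroll steps. read_rendered_rows and scroll_and_collect are hypothetical helper names.

```python
import time


def read_rendered_rows(driver):
    """Return (data-index, row) pairs for the match rows currently in the DOM.

    Assumption: each match row has a data-index attribute and contains elements
    whose ids end in 'home-team-name' and 'away-team-name'.
    """
    # Selenium is imported here so scroll_and_collect can be exercised without it.
    from selenium.common.exceptions import (
        NoSuchElementException,
        StaleElementReferenceException,
    )
    from selenium.webdriver.common.by import By

    pairs = []
    for row in driver.find_elements(By.CSS_SELECTOR, "[data-index]"):
        try:
            index = int(row.get_attribute("data-index"))
            home = row.find_element(By.CSS_SELECTOR, "[id$='home-team-name']").text
            away = row.find_element(By.CSS_SELECTOR, "[id$='away-team-name']").text
        except (NoSuchElementException, StaleElementReferenceException, TypeError, ValueError):
            continue  # not a match row (e.g. a header), or re-rendered while being read
        pairs.append((index, {"Home": home, "Away": away}))
    return pairs


def scroll_and_collect(driver, read_rows=read_rendered_rows, pause=1.0, max_steps=50):
    """Scroll one viewport at a time and keep every row seen, keyed by data-index."""
    rows_by_index = {}

    def remember_visible_rows():
        for index, row in read_rows(driver):
            rows_by_index.setdefault(index, row)  # a row seen twice is stored once

    last_offset = None
    for _ in range(max_steps):
        remember_visible_rows()
        driver.execute_script("window.scrollBy(0, window.innerHeight);")
        time.sleep(pause)  # give the page time to render the rows now in view
        offset = driver.execute_script("return window.pageYOffset;")
        if offset == last_offset:
            break  # the page stopped moving: assume the bottom was reached
        last_offset = offset
    remember_visible_rows()  # rows rendered after the last scroll step
    return [rows_by_index[i] for i in sorted(rows_by_index)]
```

Usage, after accepting the cookie banner: rows = scroll_and_collect(driver), then pd.DataFrame(rows). The read_rows parameter exists only so the scrolling logic can be tested with a fake driver; pause and max_steps are guesses that will probably need tuning.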

To create a DataFrame with Pandas containing the Home Team Name and Away Team Name from the website, you need to induce WebDriverWait for visibility_of_all_elements_located(), and you can use the following Locator Strategies:

  • Using CSS_SELECTOR:

    driver.get('https://www.livescore.com/en/')
    # Accept the cookie banner first
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
    # ids ending in home-team-name / away-team-name identify each side of a match
    Home_team_name = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[id$='home-team-name']")))]
    Away_team_name = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[id$='away-team-name']")))]
    df = pd.DataFrame(data=list(zip(Home_team_name, Away_team_name)), columns=['Home Team Name', 'Away Team Name'])
    print(df)
    
  • Note: You have to add the following imports:

    import pandas as pd
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
  • Console Output:

      Home Team Name       Away Team Name
    0  Bayern Munich          FC Salzburg
    1      Liverpool                Inter
    2       FC Porto                 Lyon
    3     Real Betis  Eintracht Frankfurt
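One caveat with zip(): it stops at the shorter list, so if one locator finds fewer elements than the other (for example because a row has not finished rendering), rows are silently dropped and names can end up paired with the wrong opponent. A small guard, sketched below with the hypothetical helper name pair_team_names, makes such a mismatch fail loudly before the DataFrame is built. Pairing names row by row, as in the first answer, avoids the problem altogether.

```python
def pair_team_names(home_names, away_names):
    """Pair home and away names, raising instead of silently dropping rows."""
    if len(home_names) != len(away_names):
        raise ValueError(
            f"found {len(home_names)} home names but {len(away_names)} away names"
        )
    return [
        {"Home Team Name": home, "Away Team Name": away}
        for home, away in zip(home_names, away_names)
    ]


print(pair_team_names(["Bayern Munich", "Liverpool"], ["FC Salzburg", "Inter"]))
```

With the lists from the answer, df = pd.DataFrame(pair_team_names(Home_team_name, Away_team_name)) gives the same two columns but raises ValueError when the counts differ.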
    
Answered by undetected Selenium
  • Hi @undetectedSelenium. Thanks for your reply. Your answer is helpful for getting the data, but the issue is that with this flow I can scrape only the matches visible at the beginning. If you look past match no. 3, there are other matches on the site. The list updates automatically based on a data-index while scrolling the page. – Carolino Mar 08 '22 at 23:34