Error while web-scraping webdriver exception

Question

!pip install selenium
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from time import sleep
from datetime import datetime
import pandas as pd
errors = []
season = []

for id in range(46605, 46985):
    my_url = f'https://www.premierleague.com/match/{id}'
    option = Options()
    #option.headless = True
    driver = webdriver.Chrome(options=option)
    driver.get(my_url)

Code runs fine till here.

    date = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="mainContent"]/div/section/div[2]/section/div[1]/div/div[1]/div[1]'))).text
    date = datetime.strptime(date, '%a %d %b %Y').strftime('%m/%d/%Y')
    home_team = driver.find_element_by_xpath('//*[@id="mainContent"]/div/section/div[2]/section/div[3]/div/div/div[1]/div[1]/a[2]/span[1]').text
    away_team = driver.find_element_by_xpath('//*[@id="mainContent"]/div/section/div[2]/section/div[3]/div/div/div[1]/div[3]/a[2]/span[1]').text

An error pops up while executing these lines.

[Error Screenshot 1]

[Error Screenshot 2]

I wouldn't put the Selenium driver inside the for loop; it's pretty slow and will be running close to 400 times here. There's no reason to open and close the link that many times. Also, check out https://stackoverflow.com/questions/45688020/chrome-not-reachable-selenium-webdriver-error — Joe, Jan 16 '22 at 01:59
Adding to the comment made by @Joe, I see you are using a lot of long xpaths, which is not a good practice. Try using narrowly relative xpaths, like this one for the first element (date): `//div[@class='matchInfo']//div[contains(@class, 'matchDate')]` — Anand Gautam, Jan 16 '22 at 02:26
@AnandGautam, if possible, could you please write the exact syntax? I have no clue about it. I am sorry if it's a bother. — Kaustav Mallick, Jan 16 '22 at 07:30
I already gave you the date in my earlier comment. Just replace the xpath with mine, as it's more relative. This one's for home_team: `//div[@class='team home']`. This one's for away_team: `//div[@class='team away']` — Anand Gautam, Jan 16 '22 at 09:15
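
The suggestions in the comments above (create the driver once, outside the loop, and use the shorter relative XPaths) could be combined roughly as follows. This is only a sketch: `scrape_matches` and `main` are hypothetical helper names, it assumes the XPaths from the comments still match the page, it leaves out any cookie-banner handling, and it relies on an implicit wait instead of `WebDriverWait` to keep the example short.

```python
from datetime import datetime

# Relative XPaths taken from the comments above.
DATE_XPATH = "//div[@class='matchInfo']//div[contains(@class, 'matchDate')]"
HOME_XPATH = "//div[@class='team home']"
AWAY_XPATH = "//div[@class='team away']"


def scrape_matches(driver, match_ids):
    """Visit each match page with one shared driver and return one dict per match."""
    rows = []
    for match_id in match_ids:
        driver.get(f"https://www.premierleague.com/match/{match_id}")
        # "xpath" is the string value of selenium's By.XPATH constant.
        raw_date = driver.find_element("xpath", DATE_XPATH).text
        rows.append({
            "id": match_id,
            "date": datetime.strptime(raw_date, "%a %d %b %Y").strftime("%m/%d/%Y"),
            "home": driver.find_element("xpath", HOME_XPATH).text,
            "away": driver.find_element("xpath", AWAY_XPATH).text,
        })
    return rows


def main():
    # Imported here so the helper above can be loaded without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)  # created once, not once per match
    driver.implicitly_wait(20)  # find_element retries for up to 20 seconds
    try:
        for row in scrape_matches(driver, range(46605, 46610)):
            print(row)
    finally:
        driver.quit()  # always release the browser, even after an error
```

Calling `main()` runs the sketch against the first five match ids; widen the `range` once it works for those.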

score 1 · Answer 1 · answered Jan 16 '22 at 23:29

Why not use the API that the Premier League site itself uses?

import requests

fixture = 66553

headers =   {
    'accept':'*/*',
    'accept-encoding':"gzip;q=1.0, identity; q=0.5",
    'accept-language':'en-ZA,en;q=0.9,en-GB;q=0.8,en-US;q=0.7,de;q=0.6',
    'content-type':'application/x-www-form-urlencoded; charset=UTF-8',
    'origin':'https://www.premierleague.com',
    'referer':'https://www.premierleague.com/',
    'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36'
    }

url = f'https://footballapi.pulselive.com/football/broadcasting-schedule/fixtures/{fixture}'
data = requests.get(url,headers=headers).json()

print(data['fixture']['attendance'])
print(data['fixture']['kickoff']['label'])
print(data['fixture']['teams'])

url = f'https://footballapi.pulselive.com/football/fixtures/{fixture}/textstream/EN?pageSize=1000&sort=desc'
data = requests.get(url,headers=headers).json()

for message in data['events']['content']:
    print(message['text'])
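
If you want to collect the values rather than print them, one option is to flatten each response into a small dict. The sketch below uses only the keys read in the code above (`fixture`, `attendance`, `kickoff`/`label`, `teams`); `fixture_row` is a hypothetical helper name, and the sample payload is made up for illustration, not real API output.

```python
def fixture_row(fixture_id, data):
    """Flatten one broadcasting-schedule response into a flat dict.

    Missing keys become None instead of raising KeyError.
    """
    fixture = data.get("fixture", {})
    return {
        "fixture_id": fixture_id,
        "attendance": fixture.get("attendance"),
        "kickoff": fixture.get("kickoff", {}).get("label"),
        "teams": fixture.get("teams"),
    }


# Hypothetical payload with the same shape as the keys accessed above.
sample = {
    "fixture": {
        "attendance": 12345,
        "kickoff": {"label": "example kickoff label"},
        "teams": [],
    }
}

row = fixture_row(66553, sample)
print(row)
```

In real use, `data` would be the result of `requests.get(url, headers=headers).json()` from the code above, and the rows for several fixtures could be appended to a list.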

score 0 · Answer 2 · answered Jan 16 '22 at 09:41

@Kaustav, I'm writing this as an answer because it's a code block and I don't know how to put that in the comment section. Although I second @Joe's thoughts, you asked for the exact syntax, so I put together a block of code to show you. This code opens the browser, gets the details (as in your code), and then stores them in a list (for display purposes in this task; in practice it may not be needed, depending on your requirements).

I would again reiterate that opening, using, and closing a browser (even a headless one) for so many iterations may hurt performance at some point in the loop, and all your time would go to waste. I would strongly suggest that you look for an API for this website and, if one is available, use it instead.

Having said that, here is the code for your use.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from time import sleep
from datetime import datetime
import pandas as pd
import time
errors = []
season = []

url_ls=[] # used to show to query creator a demo
for id in range(46605, 46985):
    my_url = f'https://www.premierleague.com/match/{id}'
    option = Options()
    option.add_argument('--headless')
    option.add_argument('--disable-gpu')
    driver = webdriver.Chrome(options=option)
    driver.get(my_url)
    title = driver.current_url
    try:
        WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//*[text()='Accept All Cookies']"))).click()
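    except Exception:
        # No cookie banner appeared within 10 s: close this browser and skip the match.
        print("cookie modal not found")
        driver.quit()
        continue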
    # time.sleep(10)
    match_date = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='matchInfo']//div[contains(@class, 'matchDate')]"))).text
    match_date_f = datetime.strptime(match_date, '%a %d %b %Y').strftime('%m/%d/%Y')
    home_team = driver.find_element(By.XPATH, "//div[@class='team home']").text
    away_team = driver.find_element(By.XPATH, "//div[@class='team away']").text
    tup = (title + "|" + match_date + "|" + home_team + "|" + away_team)
    url_ls.append(tup)
    driver.quit()  # quit() also ends the chromedriver process; close() only closes the window
print(url_ls)  # used to show to query creator a demo

Here is the output. I just appended the current URL and the texts of the elements in this code, but you can expand this block and add your own elements. Since you've imported pandas, you can also send the data to a DataFrame and then to Excel or CSV, as you like.

['https://www.premierleague.com/match/46605|Sat 10 Aug 2019|Liverpool|Norwich City', 'https://www.premierleague.com/match/46606|Sat 10 Aug 2019|AFC Bournemouth|Sheffield United', 'https://www.premierleague.com/match/46607|Sat 10 Aug 2019|Burnley|Southampton', 'https://www.premierleague.com/match/46608|Sat 10 Aug 2019|Crystal Palace|Everton', 'https://www.premierleague.com/match/46609|Sun 11 Aug 2019|Leicester City|Wolverhampton Wanderers']

Process finished with exit code 0
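
As a rough sketch of the "send it to CSV" step mentioned above, the pipe-delimited strings in `url_ls` can be split back into columns. This uses the standard `csv` module rather than pandas to stay dependency-free; `to_rows` and `to_csv` are hypothetical helper names, and the two input strings are copied from the output shown above.

```python
import csv
import io
from datetime import datetime

FIELDS = ["url", "date", "home_team", "away_team"]


def to_rows(records):
    """Split 'url|date|home|away' strings into dicts, reformatting the date."""
    rows = []
    for record in records:
        url, match_date, home, away = record.split("|")
        rows.append({
            "url": url,
            "date": datetime.strptime(match_date, "%a %d %b %Y").strftime("%m/%d/%Y"),
            "home_team": home,
            "away_team": away,
        })
    return rows


def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


url_ls = [
    "https://www.premierleague.com/match/46605|Sat 10 Aug 2019|Liverpool|Norwich City",
    "https://www.premierleague.com/match/46606|Sat 10 Aug 2019|AFC Bournemouth|Sheffield United",
]
rows = to_rows(url_ls)
print(to_csv(rows))
```

With pandas installed, `pd.DataFrame(rows).to_csv("matches.csv", index=False)` would be the equivalent one-liner; the file name there is just an example.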
