I am using Python 3 and Selenium to grab some image links from a website as below:

import sys
import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.proxy import Proxy, ProxyType

chrome_options = Options()  
chrome_options.add_argument("--headless")

driver = webdriver.Chrome(chrome_options=chrome_options)
driver.get('https://www.sky.com/tv-guide/20200605/4101-1/107/Efe2-364')

link_xpath = '/html/body/main/div/div[2]/div[2]/div/div/div[2]/div/div[2]/div[1]/div/div/div[2]/div/img'

link_path = driver.find_element_by_xpath(link_xpath).text
print(link_path)

driver.quit()

When you open this URL, the image in question appears in the middle of the page. If you right-click it in Google Chrome and choose Inspect, you can then right-click the element itself in Chrome DevTools and copy the XPath for this image.

Everything looks in order to me; however, when I run the above code I get the following error:

Traceback (most recent call last):
  File "G:\folder\folder\testfilepy", line 16, in <module>
    link_path = driver.find_element_by_xpath(link_xpath).text
  File "G:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 394, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "G:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
    'value': value})['value']
  File "G:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "G:\Python36\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/main/div/div[2]/div[2]/div/div/div[2]/div/div[2]/div[1]/div/div/div[2]/div/img"}
  (Session info: headless chrome=83.0.4103.61)

Can anyone tell me why Selenium is unable to find the xpath provided?

undetected Selenium
gdogg371
  • Try this `link_xpath = '//div[@class="c-bezel programme-content__image"]//img'`, but actually the element has no text to return, what do you want to achieve, what are the attributes? – frianH Jun 05 '20 at 11:00
  • hi - when inspecting the element i see a http link to the image: https://images.metadata.sky.com/pd-image/251eeec2-acb3-4733-891b-60f10f2cc28c/16-9/640 ....i want to grab that link basically – gdogg371 Jun 05 '20 at 11:02

4 Answers

To extract the src attribute of the image, you need to wait for the element with WebDriverWait and visibility_of_element_located(). You can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    options = webdriver.ChromeOptions() 
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    options.add_argument('--headless')
    options.add_argument('--window-size=1920,1080')
    driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.get('https://www.sky.com/tv-guide/20200605/4101-1/107/Efe2-364')
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.o-layout__item div.c-bezel.programme-content__image>img"))).get_attribute("src"))
    
  • Using XPATH:

    options = webdriver.ChromeOptions() 
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    options.add_argument('--headless')
    options.add_argument('--window-size=1920,1080')
    driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.get('https://www.sky.com/tv-guide/20200605/4101-1/107/Efe2-364')     
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='o-layout__item']//div[@class='c-bezel programme-content__image']/img"))).get_attribute("src"))
    
  • Console Output:

    https://images.metadata.sky.com/pd-image/251eeec2-acb3-4733-891b-60f10f2cc28c/16-9/640
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
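
To make the idea of an explicit wait more concrete, here is a minimal sketch in plain Python of what such a wait does conceptually: it polls a condition until the condition returns something truthy or a timeout expires. This is not Selenium's implementation. The names `wait_until`, `WaitTimeout` and `image_src_when_ready` are made up for illustration, and the real WebDriverWait additionally ignores NoSuchElementException while polling.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Simulated page: the image only "appears" on the third poll.
attempts = {"count": 0}


def image_src_when_ready():
    attempts["count"] += 1
    if attempts["count"] >= 3:
        return "https://images.metadata.sky.com/pd-image/251eeec2-acb3-4733-891b-60f10f2cc28c/16-9/640"
    return None


src = wait_until(image_src_when_ready, timeout=2.0, poll_interval=0.01)
print(src)
```

The point of the sketch is that a lookup made immediately after `driver.get()` can fail simply because the element has not been rendered yet, whereas a polling wait retries until it is there.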
    


undetected Selenium

Your XPath is correct, but avoid absolute paths: they break easily when the page structure changes. Try this relative XPath instead: //div[@class="c-bezel programme-content__image"]//img.
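
To illustrate why an absolute path is fragile, here is a small sketch using Python's built-in xml.etree.ElementTree, which supports only a limited subset of XPath. The markup is a made-up, heavily simplified stand-in for the real page (only the class name comes from this thread), and `find_src` is a hypothetical helper, so treat it as a demonstration of the idea rather than of Selenium's behaviour.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page is far more deeply nested.
BEFORE = """
<html><body><main>
  <div>
    <div class="c-bezel programme-content__image"><img src="a.jpg"/></div>
  </div>
</main></body></html>
""".strip()

# The same content after the site wraps it in one extra <div>.
AFTER = """
<html><body><main>
  <div><div>
    <div class="c-bezel programme-content__image"><img src="a.jpg"/></div>
  </div></div>
</main></body></html>
""".strip()

# Position-based path, written relative to the <html> root element.
ABSOLUTE = "./body/main/div/div/img"
# Path anchored on the class attribute instead of on nesting depth.
RELATIVE = ".//div[@class='c-bezel programme-content__image']//img"


def find_src(markup, path):
    """Return the src of the first element matching path, or None."""
    node = ET.fromstring(markup).find(path)
    return None if node is None else node.get("src")


print(find_src(BEFORE, ABSOLUTE))  # found while the layout is unchanged
print(find_src(AFTER, ABSOLUTE))   # no match once a wrapper is added
print(find_src(AFTER, RELATIVE))   # still found via the class attribute
```

The relative expression keeps working because it depends on the element's class, not on how many wrappers sit above it.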

To get what you are after (the image link), use .get_attribute("src") rather than .text:

driver.get('https://www.sky.com/tv-guide/20200605/4101-1/107/Efe2-364')
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, '//div[@class="c-bezel programme-content__image"]//img')))
print(element.get_attribute("src"))
driver.quit()
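
As a rough illustration of why .text comes back empty for an image, the sketch below parses a simplified stand-in for the element with Python's standard library. Only the class name and the image URL are taken from this thread; the surrounding markup is assumed. An img element carries no text content, and the link is stored in its src attribute.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the page markup; only the class name and the
# image URL come from this thread, the rest is assumed for illustration.
SNIPPET = (
    '<div class="c-bezel programme-content__image">'
    '<img src="https://images.metadata.sky.com/pd-image/'
    '251eeec2-acb3-4733-891b-60f10f2cc28c/16-9/640"/>'
    '</div>'
)

img = ET.fromstring(SNIPPET).find("img")

# An <img> element has no text content, so joining its text gives "".
text = "".join(img.itertext())

# The link lives in the src attribute instead.
src = img.get("src")

print(repr(text))
print(src)
```

In Selenium terms, the empty string corresponds to what `.text` gives for this element, and the attribute lookup corresponds to `.get_attribute("src")`.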

Better still, use a CSS selector, which should be faster:

element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.c-bezel.programme-content__image > img')))

Reference: https://selenium-python.readthedocs.io/locating-elements.html

frianH
  • hi - i did not know you could parse xpaths in this manner, but will make sure i am doing it via this method in the future. thanks. – gdogg371 Jun 05 '20 at 11:19
  • @gdogg371 welcome, just for reference [locating-elements-by-xpath](https://selenium-python.readthedocs.io/locating-elements.html#locating-by-xpath) and [locating-elements-by-css-selectors](https://selenium-python.readthedocs.io/locating-elements.html#locating-elements-by-css-selectors) – frianH Jun 05 '20 at 11:24

If you are working in headless mode, it is usually a good idea to set a window size. Add this line to your options:

chrome_options.add_argument('window-size=1920x1080')
0buz
  • why? what does this option do? – gdogg371 Jun 05 '20 at 10:53
  • ...actually, i can see that this no longer throws an error, although i do not know why...however, it appears to now return a blank string, as i can see no text returned at all... – gdogg371 Jun 05 '20 at 10:56
  • You would see no text as you are pointing to an img tag. You can manually check the DOM to confirm the text is "". – 0buz Jun 05 '20 at 11:02
  • as per the comment response above, i want to grab the http link to the image that you can see when inspecting the element in dev tools. that is what i am after... – gdogg371 Jun 05 '20 at 11:03
  • `driver.find_element_by_xpath(link_xpath).get_attribute('src')` does that. – 0buz Jun 05 '20 at 11:04
  • thanks that worked. please though, can you explain what the window size option does so that i know for future reference? – gdogg371 Jun 05 '20 at 11:06
  • Honestly it's something I have learned from experience. Some websites will need faking window size to work with their DOM in headless mode. It's not obvious, nor clearly documented unfortunately. – 0buz Jun 05 '20 at 11:13
  • ok, well thanks for the FYI. I have added this option to my selenium template that i baseline projects/tasks off for future use... – gdogg371 Jun 05 '20 at 11:16

Your XPath seems to be correct. You weren't able to locate the element because you didn't handle the cookie prompt. Try it yourself: pause the driver for a few seconds and click to agree to all cookies, and you will then see your element. There are multiple ways to handle the cookie prompt. I was able to locate the element with my own, cleaner XPath, which reaches it from its nearest parent.

Hope this helps.

myoz89