
Lately I have been learning web scraping with Python. Here is the relevant part of the page source:

<div class='search__grid'>
  <div class="photos">
    <div class='photos__column'>
      <div class='hide-featured-badge hide-favorite-badge'>
        <article class='photo-item photo-item--overlay'>
          <a class="js-photo-link photo-item__link" href="/photo/person-holding-black-ceramic-pig-coin-bank-3943723/">    
            <img srcset="https://images.pexels.com/photos/3943723/pexels-photo-3943723.jpeg?auto=compress&amp;cs=tinysrgb&amp;dpr=1&amp;w=500 1x, https://images.pexels.com/photos/3943723/pexels-photo-3943723.jpeg?auto=compress&amp;cs=tinysrgb&amp;dpr=2&amp;w=500 2x" 
            class="photo-item__img" alt="Person Holding Black Ceramic Pig Coin Bank" data-image-width="3811" data-image-height="5716" 
            data-big-src="https://images.pexels.com/photos/3943723/pexels-photo-3943723.jpeg?auto=compress&amp;cs=tinysrgb&amp;h=750&amp;w=1260" />

I want to collect the image link from the img element (its data-big-src attribute). However, I couldn't find the div element by using:

find_element_by_class_name('search__grid')

Nor could I find it with by_tag_name('div.search_grid') or by_css_selector('divsearch_grid'). For example, using find_element_by_class_name raised this error:

no such element: Unable to locate element: {"method":"css selector","selector":".search__grid"}

I didn't even use a CSS selector!

Another question: how can I extract only the data-big-src link, rather than the whole srcset attribute?

I look forward to your opinions. Thanks in advance.

undetected Selenium
dexter2406

3 Answers


First, get the img element:

imgElement = driver.find_element_by_class_name('photo-item__img')

or by XPath:

imgElement = driver.find_element_by_xpath("//img[contains(@class,'photo-item__img')]")

Second, read its data-big-src attribute:

bigSrc = imgElement.get_attribute('data-big-src')
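To check the two steps above against the markup itself, without a browser, here is a rough offline sketch. It swaps Selenium for Python's standard xml.etree.ElementTree, so it is an illustration rather than the answer's method. Assumptions: the snippet is a hand-closed excerpt of the markup from the question (ElementTree only parses well-formed XML), and ElementTree supports only a subset of XPath with no contains(), so the sketch matches the class exactly. It also shows why an XPath attribute test needs the @ prefix.

```python
import xml.etree.ElementTree as ET

# Hand-closed excerpt of the markup from the question (assumption: the real
# page is HTML, which ElementTree cannot parse in general).
SNIPPET = """
<div class="photos__column">
  <article class="photo-item photo-item--overlay">
    <a class="js-photo-link photo-item__link"
       href="/photo/person-holding-black-ceramic-pig-coin-bank-3943723/">
      <img class="photo-item__img"
           alt="Person Holding Black Ceramic Pig Coin Bank"
           data-big-src="https://images.pexels.com/photos/3943723/pexels-photo-3943723.jpeg?auto=compress&amp;cs=tinysrgb&amp;h=750&amp;w=1260" />
    </a>
  </article>
</div>
"""

root = ET.fromstring(SNIPPET.strip())

# Step 1: find the <img>. "@class" tests the attribute; without the "@",
# [class='...'] looks for a child *element* named "class" instead.
img = root.find(".//img[@class='photo-item__img']")
no_match = root.find(".//img[class='photo-item__img']")

# Step 2: read data-big-src; the parser turns "&amp;" back into "&".
big_src = img.get("data-big-src")

print(big_src)
print(no_match)  # None: the predicate without "@" matches nothing
```

Through Selenium, get_attribute('data-big-src') likewise returns the decoded URL, with & rather than &amp;.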
Alin Stelian
  • The first one gave: `Unable to locate element: {"method":"css selector","selector":".photo-item__img"}` The second one gave: `Unable to locate element: {"method":"xpath","selector":"//img[contains(@class,'photo-item__img')]"}` ... It's really weird; I have succeeded on a couple of other websites, but this one just won't work. – dexter2406 Sep 19 '20 at 09:19
  • Updated my answer for XPath. If you want, you can share the link so I can see exactly what is happening. – Alin Stelian Sep 19 '20 at 09:22

Try using something like the following to get the srcset value.

imgElement = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//img[@class='photo-item__img']")))
print(imgElement.get_attribute('srcset'))

Imports:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC
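Both this answer and the next one wrap the lookup in an explicit wait. One common reason a plain find_element raises NoSuchElementException is that the element is added by JavaScript after the initial page load. As a rough illustration of what WebDriverWait(...).until(...) does, here is a stdlib-only sketch that polls a condition until it succeeds or a timeout expires. It only mirrors the idea: Selenium's real WebDriverWait polls every 0.5 s by default and ignores NoSuchElementException, while this sketch ignores LookupError as a stand-in. The names wait_until, WaitTimeout and find_img are hypothetical.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for selenium's TimeoutException."""


def wait_until(condition, timeout=10.0, poll=0.5, ignored=(LookupError,)):
    """Poll condition() until it returns something truthy.

    Exceptions listed in `ignored` count as "not ready yet", similar to how
    WebDriverWait ignores NoSuchElementException by default.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored:
            pass  # element not in the page yet; try again
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll)


# Simulated lookup: the element only "appears" on the third attempt,
# like content that JavaScript renders after the page has loaded.
calls = {"n": 0}


def find_img():
    calls["n"] += 1
    if calls["n"] < 3:
        raise LookupError("no such element")  # plays NoSuchElementException
    return "<img class='photo-item__img'>"


element = wait_until(find_img, timeout=5, poll=0.01)
print(element, "found after", calls["n"], "lookups")
```

In Selenium itself, presence_of_element_located (used above) only requires the element to be in the DOM, while visibility_of_element_located (used in the next answer) also requires it to be displayed.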
Arundeep Chohan

The image link can be found both in the srcset attribute and in the data-big-src attribute. To print it, induce WebDriverWait for visibility_of_element_located() and use either of the following Locator Strategies:

  • Using CSS_SELECTOR and srcset attribute:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.search__grid > div.photos > div.photos__column a.js-photo-link.photo-item__link[href^='/photo/person-holding-black-ceramic-pig-coin-bank'] > img"))).get_attribute("srcset"))
    
  • Using XPATH and data-big-src attribute:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='search__grid']/div[@class='photos']/div[@class='photos__column']//a[@class='js-photo-link photo-item__link' and starts-with(@href, '/photo/person-holding-black-ceramic-pig-coin-bank')]/img"))).get_attribute("data-big-src"))
    
  • Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
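On the second part of the question: in the markup shown, data-big-src is a separate attribute of the img, so reading it directly (as in the XPath option) already gives the large image link. The srcset value instead holds a comma-separated list of w=500 candidates with density descriptors (1x, 2x). If one of those is wanted, here is a minimal sketch for splitting it, assuming (as holds for the Pexels URLs here) that no URL contains a comma; parse_srcset is a hypothetical helper, not part of Selenium.

```python
def parse_srcset(srcset):
    """Split a srcset value into a {descriptor: url} dict.

    Simplified sketch: assumes no URL contains a comma. A candidate with
    no descriptor is treated as "1x", as the HTML spec does.
    """
    candidates = {}
    for candidate in srcset.split(","):
        parts = candidate.split()          # ["<url>", "<descriptor>"]
        if not parts:
            continue                       # tolerate a trailing comma
        url = parts[0]
        descriptor = parts[1] if len(parts) > 1 else "1x"
        candidates[descriptor] = url
    return candidates


# srcset from the question, as get_attribute("srcset") would return it
# (with "&" rather than "&amp;").
srcset = (
    "https://images.pexels.com/photos/3943723/pexels-photo-3943723.jpeg"
    "?auto=compress&cs=tinysrgb&dpr=1&w=500 1x, "
    "https://images.pexels.com/photos/3943723/pexels-photo-3943723.jpeg"
    "?auto=compress&cs=tinysrgb&dpr=2&w=500 2x"
)

candidates = parse_srcset(srcset)
print(candidates["2x"])
```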
    

References

You can find a couple of relevant discussions on NoSuchElementException in:

undetected Selenium