
I am working with Selenium in Python and using Firefox web driver.

I am trying to get the src of an image. The first time I request it, I get the inline image data (a data: URI) instead of the image URL:

data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQ ...

If I run the exact same code a second time, I get the src:

example.jpg

Here is my code

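from selenium import webdriver
from selenium.webdriver.common.by import By

# Run Firefox headless; set_headless() and the firefox_options= keyword are deprecated
fireFoxOptions = webdriver.FirefoxOptions()
fireFoxOptions.add_argument("-headless")
browser = webdriver.Firefox(options=fireFoxOptions)

# The page load (browser.get(...)) is omitted from this snippet
element = browser.find_element(By.ID, "idOfImageHere")
imageUrl = element.get_attribute("src")
print("image src: " + imageUrl)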

I am not sure why the image data is returned the first time the code is run and the src on the second run. It almost seems that the src only becomes available once the image has been cached, or something like that.

Any suggestions on how to get just the src link rather than the image data?

Thanks

undetected Selenium
nomaam
  • the src must be changing.. it's possible that there is javascript which re-writes the src attribute. (maybe it loads a low-res version or placeholder first) – pcalkins Jan 10 '20 at 21:56
  • that could be possible. I am scraping from amazon so I assume they have some fancy code running. That being said, I don't really care if I get the SRC of the low res image, I just don't want the image data – nomaam Jan 10 '20 at 21:58
  • you could use a WebDriverWait to wait for the SRC to endWith ".jpg" (or .gif, etc...) – pcalkins Jan 10 '20 at 22:06

1 Answer


Amazon pages build many of their elements with JavaScript, so before reading the src attribute of an element you should use WebDriverWait with visibility_of_element_located(). You can use any of the following Locator Strategies:

  • Using ID:

    print(WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.ID, "idOfImageHere"))).get_attribute("src"))
    
  • Using XPATH:

    print(WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.XPATH, "//*[@id='idOfImageHere']"))).get_attribute("src"))
    
  • Using CSS_SELECTOR:

    print(WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#idOfImageHere"))).get_attribute("src"))
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
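
A comment under the question suggests waiting on the src value itself rather than on visibility. The following is a minimal sketch of that idea, assuming the placeholder really is a data: URI that the page later replaces with a normal URL. The helper names src_is_real_url and wait_for_image_src are hypothetical, not part of Selenium; WebDriverWait.until() simply accepts any callable that takes the driver and returns a truthy value when done.

```python
def src_is_real_url(locator):
    """Build a wait condition (hypothetical helper, not a Selenium API).

    The returned callable yields the element's src once it is set and is
    no longer an inline data: URI; until then it returns False so the
    wait keeps polling.
    """
    def condition(driver):
        # Re-find the element on every poll in case the page replaced it.
        src = driver.find_element(*locator).get_attribute("src")
        if src and not src.startswith("data:"):
            return src
        return False
    return condition


def wait_for_image_src(driver, locator, timeout=20):
    """Poll until the image src is a regular URL, then return it."""
    # Imported here so the condition above can be tried without Selenium installed.
    from selenium.webdriver.support.ui import WebDriverWait
    return WebDriverWait(driver, timeout).until(src_is_real_url(locator))
```

With the browser object from the question this could be called as imageUrl = wait_for_image_src(browser, (By.ID, "idOfImageHere")). If the src never changes within the timeout, WebDriverWait raises TimeoutException, so the caller may want to catch that and fall back to whatever value is present.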
    
undetected Selenium