
I'm trying to download a bunch of images and sort them into folders using Selenium. To do so, I need to grab two IDs that appear in each image's URL. However, I'm having trouble scraping the image link from the src attribute: whether I locate the element by tag, XPath, or another method, the result is always "None".

Here's an example of an inspected image page:

<html style="height: 100%;">
    <head>
        <meta name="viewport" content="width=device-width, minimum-scale=0.1">
        <title>index.php (2448×3264)</title>
    </head>
    <body style="margin: 0px; background: #0e0e0e; height: 100%">
        <img style="-webkit-user-select: none;margin: auto;cursor: zoom-in;background-color: hsl(0, 0%, 90%);transition: background-color 300ms;" src="https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=LQCMY&amp;fieldname=DT006_picture&amp;p=show" width="444" height="593">
    </body>
</html>

For this example, I would need to grab "LQCMY" and "DT006_picture" as strings from the URL above. The code below shows my attempt at scraping the URL (edited down, since the screens I click through before this one are behind passwords I can't share).

from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Image = '/html/body/div[1]/div[2]/div/table/tbody/tr[1]/td[1]/a'
# driver was created and logged in on the earlier (password-protected) pages, omitted here
driver.find_element(By.XPATH, Image).click()
Image_URL = WebDriverWait(driver, 100).until(EC.element_to_be_clickable((By.XPATH, Image))).get_attribute('src')
print(Image_URL)

Are there certain src attributes that can't be scraped, or am I targeting the wrong tag?

I've also tried grabbing it by tag, but that returns "None" as well:

Image_URL = driver.find_element(By.XPATH, Image).get_attribute('src')

Other posts suggested WebDriverWait would help, but I've tried adjusting the wait time and still get "None".

undetected Selenium
David K
    Your XPath doesn't point to the image element. Try another XPath expression, for example just `"//img"`, which matches the first image element on the page – juhat Apr 04 '22 at 19:41
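The comment's diagnosis can be checked offline. The sketch below uses Python's standard-library `html.parser` as a stand-in (it is not Selenium) on the inspected page from the question. It shows that the URL lives in the `src` attribute of the `<img>` tag, and taking the first `<img>` mirrors what `//img` selects. An `<a>` element has no `src` attribute, and Selenium's `get_attribute` returns `None` when the element has no attribute or property by that name. The `FirstImgSrc` class name is hypothetical. The parser also decodes `&amp;` to `&`, which matches what the DOM reports for the attribute value.

```python
from html.parser import HTMLParser

# The inspected image page from the question, copied verbatim
PAGE = """<html style="height: 100%;"><head><meta name="viewport" content="width=device-width, minimum-scale=0.1">
<title>index.php (2448×3264)</title></head>
<body style="margin: 0px; background: #0e0e0e; height: 100%">
<img style="-webkit-user-select: none;margin: auto;cursor: zoom-in;background-color: hsl(0, 0%, 90%);transition: background-color 300ms;" src="https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=LQCMY&amp;fieldname=DT006_picture&amp;p=show" width="444" height="593">
</body></html>"""


class FirstImgSrc(HTMLParser):
    """Hypothetical helper: record the src of the first <img> tag (what //img selects first)."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; entity references like &amp; are already decoded
        if tag == "img" and self.src is None:
            self.src = dict(attrs).get("src")


parser = FirstImgSrc()
parser.feed(PAGE)
print(parser.src)
```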

1 Answer


To print the value of the src attribute, you can use either of the following locator strategies:

  • Using css_selector:

    print(driver.find_element(By.CSS_SELECTOR, "body img[style*='webkit-user-select'][src^='https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=']").get_attribute("src"))
    
  • Using xpath:

    print(driver.find_element(By.XPATH, "//body//img[contains(@style, 'webkit-user-select') and starts-with(@src, 'https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=')]").get_attribute("src"))
    

Ideally, you should induce WebDriverWait for visibility_of_element_located(), using either of the following locator strategies:

  • Using CSS_SELECTOR:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "body img[style*='webkit-user-select'][src^='https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=']"))).get_attribute("src"))
    
  • Using XPATH:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//body//img[contains(@style, 'webkit-user-select') and starts-with(@src, 'https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=')]"))).get_attribute("src"))
    
  • Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

You can find a relevant discussion in "Python Selenium - get href value".
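Once you have the src value, the two strings the question needs ("LQCMY" and "DT006_picture") are the `id` and `fieldname` query parameters. The standard library's `urllib.parse` can extract them without manual string slicing. This is a minimal sketch that assumes every image URL uses those same parameter names. The `image_ids` and `target_path` helpers are hypothetical, and so is the `<fieldname>/<id>.jpg` folder layout (including the `.jpg` extension). They only illustrate the folder-sorting step the question mentions.

```python
from pathlib import Path
from urllib.parse import parse_qs, urlsplit


def image_ids(src):
    """Hypothetical helper: return the (id, fieldname) pair from a picture URL's query string."""
    # parse_qs maps each key to a list of values, e.g. {"id": ["LQCMY"], ...}
    params = parse_qs(urlsplit(src).query)
    # Raises KeyError if either parameter is missing from the URL
    return params["id"][0], params["fieldname"][0]


def target_path(base_dir, src):
    """Hypothetical layout: <base_dir>/<fieldname>/<id>.jpg (the extension is an assumption)."""
    image_id, fieldname = image_ids(src)
    return Path(base_dir) / fieldname / f"{image_id}.jpg"


src = "https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=LQCMY&fieldname=DT006_picture&p=show"
print(image_ids(src))  # ('LQCMY', 'DT006_picture')
print(target_path("images", src))
```

Before saving a downloaded file, `path.parent.mkdir(parents=True, exist_ok=True)` creates the folder if it doesn't exist yet. If you copy the URL from the inspector's HTML instead of reading it through `get_attribute`, replace `&amp;` with `&` first. Otherwise, depending on the Python version, `parse_qs` can end up with a key like `amp;fieldname` instead of `fieldname`.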

undetected Selenium