Python Selenium - How to scrape URL from src attribute using Selenium and Python

Question

I'm trying to download a batch of images and sort them into folders using Selenium. To do so, I need to grab two IDs embedded in each image's URL. However, I'm having trouble scraping the image link from the src attribute: whether I locate the element by tag, by XPath, or by some other method, the result is always None.

Here's an example of an inspected image page:

<html style="height: 100%;">
    <head>
        <meta name="viewport" content="width=device-width, minimum-scale=0.1">
        <title>index.php (2448×3264)</title>
    </head>
    <body style="margin: 0px; background: #0e0e0e; height: 100%">
        <img style="-webkit-user-select: none;margin: auto;cursor: zoom-in;background-color: hsl(0, 0%, 90%);transition: background-color 300ms;" src="https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=LQCMY&amp;fieldname=DT006_picture&amp;p=show" width="444" height="593">
    </body>
</html>

For this example, I would need to grab "LQCMY" and "DT006_picture" as strings from the URL above. The code below shows my attempt at scraping the URL (edited down, since the earlier screens I click through are behind passwords I can't share).

from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

# driver: an existing WebDriver session that has already passed the login screens (setup omitted)
Image = '/html/body/div[1]/div[2]/div/table/tbody/tr[1]/td[1]/a'
driver.find_element(By.XPATH, Image).click()
Image_URL = WebDriverWait(driver, 100).until(EC.element_to_be_clickable((By.XPATH, Image))).get_attribute('src')
print(Image_URL)

Are there certain src attributes that can't be scraped, or am I targeting the wrong tag?

I've tried grabbing it by tag as well, but that also returns None:

Image_URL = driver.find_element(By.XPATH, Image).get_attribute('src')

Other posts said WebDriverWait would help, but even after adjusting the wait time I still get None.

Your XPath doesn't point to the image element. Try another XPath expression, for example just `"//img"`, which with find_element returns the first image element on the page. (juhat, Apr 04 '22 at 19:41)
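The comment's diagnosis can be checked offline. Below is a minimal sketch that uses Python's standard `html.parser` as a stand-in for the browser (it is not Selenium) on a trimmed copy of the sample page (the style attributes are dropped). For each start tag it records what a src lookup returns. Only the `<img>` carries a src; every other element yields None, which is also what `get_attribute('src')` returns for an element that has no such attribute. The class name `SrcCollector` is hypothetical.

```python
from html.parser import HTMLParser

# The image page from the question, trimmed to the tags that matter here.
SAMPLE_PAGE = """
<html><head><title>index.php (2448×3264)</title></head>
<body>
<img src="https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=LQCMY&amp;fieldname=DT006_picture&amp;p=show" width="444" height="593">
</body></html>
"""


class SrcCollector(HTMLParser):
    """Records (tag, src-or-None) for every start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.src_by_tag = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; entity references such as
        # &amp; in attribute values are already decoded to "&".
        self.src_by_tag.append((tag, dict(attrs).get("src")))


collector = SrcCollector()
collector.feed(SAMPLE_PAGE)
collector.close()
for tag, src in collector.src_by_tag:
    print(f"{tag:6} -> {src}")
```

Only the `img` line prints a URL. This is why an XPath ending in `/a` gives None even though the page clearly has an image.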

undetected Selenium · Answer 1 · 2022-04-04T21:16:05.927

To print the value of the src attribute you can use either of the following locator strategies:

Using css_selector:

print(driver.find_element(By.CSS_SELECTOR, "body img[style*='webkit-user-select'][src^='https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=']").get_attribute("src"))

Using xpath:

print(driver.find_element(By.XPATH, "//body//img[contains(@style, 'webkit-user-select') and starts-with(@src, 'https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=')]").get_attribute("src"))
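Both locators apply the same two attribute conditions to an `<img>`. The first is a substring test on style (`*=` in CSS, `contains()` in XPath). The second is a prefix test on src (`^=` in CSS, `starts-with()` in XPath). The following rough pure-Python illustration covers those semantics only. The helper name `matches_answer_locator` and the shortened attribute dict are hypothetical, and the `body` ancestor check is ignored.

```python
# Attributes of the sample <img> (style shortened), as the browser exposes them.
IMG_ATTRS = {
    "style": "-webkit-user-select: none;margin: auto;cursor: zoom-in;",
    "src": "https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=LQCMY&fieldname=DT006_picture&p=show",
}

SRC_PREFIX = "https://haalsi.net/haalsi_pride2/custom/picture/index.php?id="


def matches_answer_locator(tag, attrs):
    """Illustrative stand-in for the answer's locators:
    CSS   img[style*='webkit-user-select'][src^=SRC_PREFIX]
    XPath //img[contains(@style, 'webkit-user-select') and starts-with(@src, SRC_PREFIX)]
    """
    if tag != "img":
        return False
    style = attrs.get("style")
    src = attrs.get("src")
    # *= / contains() is a substring test; ^= / starts-with() is a prefix test.
    # An element missing either attribute never matches.
    return (
        style is not None
        and "webkit-user-select" in style
        and src is not None
        and src.startswith(SRC_PREFIX)
    )


print(matches_answer_locator("img", IMG_ATTRS))    # True
print(matches_answer_locator("a", {"href": "#"}))  # False
```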

Ideally, wrap the lookup in WebDriverWait with the visibility_of_element_located() condition, using either of the following locator strategies:

Using CSS_SELECTOR:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "body img[style*='webkit-user-select'][src^='https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=']"))).get_attribute("src"))

Using XPATH:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//body//img[contains(@style, 'webkit-user-select') and starts-with(@src, 'https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=')]"))).get_attribute("src"))

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
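Once `get_attribute("src")` returns the URL, you can pull the two values the question needs from its query string with the standard library. This is a minimal sketch that assumes the parameters are always named `id` and `fieldname`, as in the sample URL. The helper name `image_ids` is hypothetical. The `&amp;` replacement only matters if the URL was copied from page source rather than read through Selenium.

```python
from urllib.parse import parse_qs, urlsplit


def image_ids(src):
    """Return (id, fieldname) from the image URL's query string."""
    # Selenium normally returns "&", but a URL copied from the raw HTML may
    # still contain the "&amp;" entity; normalise it before parsing.
    query = urlsplit(src.replace("&amp;", "&")).query
    params = parse_qs(query)
    # parse_qs maps each key to a list of values; take the first one.
    # A missing key raises KeyError instead of returning a wrong folder name.
    return params["id"][0], params["fieldname"][0]


src = "https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=LQCMY&amp;fieldname=DT006_picture&amp;p=show"
print(image_ids(src))  # ('LQCMY', 'DT006_picture')
```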

You can find a relevant discussion in "Python Selenium - get href value".
