
I'm using Selenium to automatically download several images from Google Images, because all the existing solutions I found online were either too slow or not working. Now I need to extract the source of the image, but when I call element.get_attribute('src') it returns the image as a base64 string, even though when I look up the same XPath in Chrome DevTools, the src attribute of the tag is actually a URL.

this is a screenshot of the element in devtools

Code trials:

        for i in range(n):
            element = self.wait.until(
                EC.presence_of_element_located((By.XPATH, '//*[@id="Sva75c"]/div/div/div[3]/div[2]/c-wiz/div/div[1]/div[1]/div[2]/div/a/img')))
            src = element.get_attribute('src')
            print(element)
            self.download_file(src,keyword)

EDIT:

I tried what some of you suggested: instead of downloading the image, I decoded the base64 string into an image and saved it, which was much faster than fetching the URL with requests. I guess the problem was more with Google's script than with my code, because my code sometimes broke when src actually returned a URL. In the end, I had to write two different functions, one for when src returns a URL and another for when it returns base64.
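For illustration, the two cases described in the edit could be folded into one helper along these lines. This is only a sketch, not the original code: the name `save_image_src`, the `dest_stem` parameter, and the choice of `urllib` instead of requests are assumptions made here, and it assumes the base64 value arrives as a standard `data:<mime>;base64,<payload>` URI.

```python
import base64
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def save_image_src(src: str, dest_stem: str) -> Path:
    """Save an <img> src value to disk, whether it is a data URI or a URL.

    `dest_stem` is a hypothetical parameter: the output path without extension.
    """
    if src.startswith("data:"):
        # Data URI layout: data:<mime>[;base64],<payload>
        header, _, payload = src.partition(",")
        if ";base64" not in header:
            raise ValueError("only base64-encoded data URIs are handled here")
        mime = header[len("data:"):].split(";")[0]  # e.g. "image/jpeg"
        ext = mime.split("/")[-1] or "bin"
        data = base64.b64decode(payload)
    else:
        # Plain URL: fetch the bytes over HTTP(S).
        with urllib.request.urlopen(src, timeout=30) as resp:
            data = resp.read()
        # Take the extension from the URL path; "jpg" is an assumed fallback.
        ext = Path(urlparse(src).path).suffix.lstrip(".") or "jpg"

    dest = Path(f"{dest_stem}.{ext}")
    dest.write_bytes(data)
    return dest
```

Inside the loop from the question, a call such as `save_image_src(src, f"{keyword}_{i}")` would then cover whichever form `get_attribute('src')` happens to return.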

  • Can't you just use the base64 of the image and save that, instead of getting the URL and downloading that? – Dan P Nov 04 '21 at 15:50
  • Maybe the URL is in a different element, and maybe you should use a better XPath (without all these `div`s, but with classes or IDs). Or the page uses JavaScript to replace the value and needs some time for this. Or the server sends different code for different browsers/devices. – furas Nov 04 '21 at 18:10
  • Without minimal working code it is hard to test this problem. – furas Nov 04 '21 at 18:11
  • Please provide the page URL so we can investigate further. – cruisepandey Nov 05 '21 at 07:32
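One way to test the suggestion above that JavaScript may swap the value in after a delay is a custom wait condition that only succeeds once src is an http(s) URL. The sketch below is an assumption-laden illustration: `src_is_url` is a hypothetical helper name, and it relies only on the fact that `WebDriverWait.until` accepts any callable that takes the driver and returns a truthy value when ready.

```python
def src_is_url(locator):
    """Build a wait condition that is truthy once the located element's
    src attribute is an http(s) URL (hypothetical helper, not a Selenium API)."""
    def condition(driver):
        src = driver.find_element(*locator).get_attribute("src")
        # get_attribute may return None; a data: URI counts as "not ready yet".
        if src and src.startswith(("http://", "https://")):
            return src
        return False
    return condition


# Usage with Selenium (assumes `driver`, By and WebDriverWait are already set up;
# the selector "img.preview" is a placeholder, not Google's real markup):
# src = WebDriverWait(driver, 10).until(src_is_url((By.CSS_SELECTOR, "img.preview")))
```

If the page never replaces the base64 value, `until` raises a `TimeoutException` after the timeout, so the caller would still need a fallback for the base64 case.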

2 Answers


To print the value of the src attribute, you need to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "img[alt='Letter A AC - Decortiles'][src]"))).get_attribute("src"))
    
  • Using XPATH:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//img[@alt='Letter A AC - Decortiles' and @src]"))).get_attribute("src"))
    
  • Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
undetected Selenium
  • I'm actually using this right now; self.wait in my code is actually a WebDriverWait. I'm used to programming in Java too, so I have the habit of declaring too many variables in def __init__(). – Gustavo Marinho Nov 05 '21 at 19:58
  • _I'm actually using this right now_: Nice to know. _self.wait_: minor adjustment, should be okay with you. _habit of declaring too much variables in def init()_: It isn't any issue. – undetected Selenium Nov 05 '21 at 20:01

I could retrieve the actual image URLs from another search engine, DuckDuckGo, using the following code:

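    from urllib.parse import quote_plus

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    search_query = 'what you want to find'
    num_images = 1
    driver_location = '/put/location/of/your/driver/here'

    # start Chrome using the driver binary at driver_location
    ser = Service(driver_location)
    op = webdriver.ChromeOptions()
    driver = webdriver.Chrome(service=ser, options=op)

    # searching the query (URL-encoded so spaces and symbols are safe)
    driver.get(f'https://duckduckgo.com/?q={quote_plus(search_query)}&kl=us-en&ia=web')

    # going to Images Section
    ba = driver.find_element(By.XPATH, "//a[@class='zcm__link  js-zci-link  js-zci-link--images']")
    ba.click()

    # getting the images URLs (read from each result's data-id attribute)
    for result in driver.find_elements(By.CSS_SELECTOR, '.js-images-link')[:num_images]:
        imageURL = result.get_attribute('data-id')

        print(f'{imageURL}\n')

    driver.quit()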
Sayyor Y