How to get the URLs of the first 10 Google image search results using Selenium in Python

Question

I want to get the URLs of the first 10 images from a Google image search (actual URLs, not base64 data). Here is my code:

import os
import base64
import time

from selenium.webdriver.common.keys import Keys
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

searchterm = 'bananas'  # will also be the name of the folder
url = "https://www.google.com/search?q=banan&source=lnms&tbm=isch&sa=X&ved=2ahUKEwj-75rDlJLoAhWLHHcKHStFC6EQ_AUoAXoECA4QAw&biw=1867&bih=951"
options = webdriver.ChromeOptions()
options.add_argument("--start-maximized")
browser = webdriver.Chrome(executable_path=ChromeDriverManager().install(), options=options)
browser.get(url)
actions = webdriver.common.action_chains.ActionChains(browser)
header = {
    'User-Agent': "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"}
counter = 0
succounter = 0

if not os.path.exists(searchterm):
    os.mkdir(searchterm)

for i in range(10):
    time.sleep(1)
    # click the i-th thumbnail to open its preview panel
    x = browser.find_elements_by_xpath('//*[@id="islrg"]/descendant::img')[i]
    x.click()
    # read the src of the image shown in the preview panel
    ba = browser.find_element_by_xpath(
        '//*[@id="Sva75c"]/div/div/div[3]/div[2]/div/div[1]/div[1]/div/div[2]/a/img')
    print(ba.get_attribute('src'))

It returns image URLs, but sometimes it returns base64 data instead. How can I make the script always return the image URL? Thank you.

You can check whether the URL string contains base64 content using some of the answers here: https://stackoverflow.com/a/45928164/7964299 – Naveen, Mar 18 '20 at 11:08
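
The check suggested in this comment can be sketched as follows. Inline images arrive as `data:` URIs, so testing the prefix of `src` is enough to tell them apart from real URLs. The helper names `is_data_uri` and `wait_for_http_src` are hypothetical, and the polling idea rests on an assumption the question does not confirm: that the base64 value is a thumbnail placeholder shown before the full image has loaded.

```python
import time


def is_data_uri(src):
    """Return True if src is an inline data: URI (e.g. base64) rather than a URL."""
    return bool(src) and src.strip().lower().startswith("data:")


def wait_for_http_src(get_src, timeout=5.0, interval=0.25):
    """Poll get_src() until it returns an http(s) URL or the timeout expires.

    get_src is any zero-argument callable, for example
    lambda: ba.get_attribute('src') inside the question's loop.
    Returns the URL, or None if only data: URIs (or nothing) were seen.
    """
    deadline = time.monotonic() + timeout
    while True:
        src = get_src()
        if src and src.startswith(("http://", "https://")):
            return src
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

In the question's loop, `wait_for_http_src(lambda: ba.get_attribute('src'))` could stand in for the direct `get_attribute` call. It yields `None` rather than a base64 string when no real URL appears within the timeout.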

score 1 · Answer 1 · answered Mar 18 '20 at 13:46


Change the XPath to select the link rather than the image, and then read its href.

ba = browser.find_element_by_xpath("//div[@class='islrc']//a[@href][@rel='noopener']")
print(ba.get_attribute("href"))


supputuri


This actually retrieves the URL of **the webpage** the image was taken from, NOT the URL of the image itself. – Sayyor Y Dec 21 '21 at 15:21

score 0 · Answer 2 · answered Dec 21 '21 at 15:20

You can reliably get only image URLs if you scrape a different search engine, DuckDuckGo, using the following code:

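from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

search_query = 'what you want to find'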
num_images = 10
driver_location = '/put/location/of/your/driver/here'

# setting up the driver
ser = Service(driver_location)
op = webdriver.ChromeOptions()
driver = webdriver.Chrome(service=ser, options=op)

# searching the query
driver.get(f'https://duckduckgo.com/?q={search_query}&kl=us-en&ia=web')

# going to Images Section
ba = driver.find_element(By.XPATH, "//a[@class='zcm__link  js-zci-link  js-zci-link--images']")
ba.click()

# getting the images URLs
for result in driver.find_elements(By.CSS_SELECTOR, '.js-images-link')[:num_images]:
    imageURL = result.get_attribute('data-id')
    print(f'{imageURL}\n')

driver.quit()
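
The search URL above interpolates `search_query` directly, which may misbehave if the query contains characters such as `&` or `#`. Here is a minimal sketch of building the same URL with the standard library's `urllib.parse.urlencode`. The function name `build_ddg_url` is hypothetical, and the `kl` and `ia` parameters are simply copied from the answer's URL.

```python
from urllib.parse import urlencode


def build_ddg_url(search_query):
    """Build the DuckDuckGo search URL with the query safely percent-encoded."""
    # Parameters mirror the URL used in the answer; only the encoding differs.
    params = {"q": search_query, "kl": "us-en", "ia": "web"}
    return "https://duckduckgo.com/?" + urlencode(params)
```

The result could then be passed to `driver.get(build_ddg_url(search_query))` in place of the f-string.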
