I'm following a tutorial whose task is to download pictures from Google Images using Python and Selenium, but I've run into a problem.

import bs4
import requests
from selenium import webdriver
import os
import time

chromeDriverPath=r'C:\Users\Aorus\Downloads\Z_ARCHIWUM\PythonScript\chromedriver_win32\chromedriver.exe'
driver=webdriver.Chrome(chromeDriverPath)

search_URL = 'https://www.google.com/search?q=budynki&rlz=1C1GCEU_plPL919PL919&source=lnms&tbm=isch&sa=X&ved=2ahUKEwiRyJvoo_L9AhWJxIsKHTIKDqwQ_AUoAXoECAEQAw&biw=1553&bih=724'

driver.get(search_URL)

a = input('Waiting for user input to start...')

# Scrolling all the way up
driver.execute_script('window.scrollTo(0, 0);')

page_html = driver.page_source
pageSoup = bs4.BeautifulSoup(page_html, 'html.parser')
containers = pageSoup.findAll('div', {'class':'isv-r PNCib MSM1fd BUooTd'})

len_containers = len(containers)
print('Found %s image containers'%(len_containers))

xPath1 = '//*[@id="islrg"]/div[1]/div[13]'


for i in range(1, len_containers+1):
    if i % 25 == 0:
        continue
    
    xPath2 = xPath1 + str(i)
    driver.find_element("xpath", xPath2).click()

When I run it, I get this error:

InvalidSelectorException: invalid selector: Unable to locate an element with the xpath expression //*[@id="islrg"]/div[1]/div[13]1 because of the following error:

SyntaxError: Failed to execute 'evaluate' on 'Document': The string '//*[@id="islrg"]/div[1]/div[13]1' is not a valid XPath expression.

Did I choose the wrong DIV, do I need to add str() or .text somewhere, or is the XPath itself wrong? When I target a single picture and call .click(), it works.

JeffC
VitorWAW
  • This is not a valid `xpath` expression. If you look at the `xPath2` variable, it evaluates to `//*[@id="islrg"]/div[1]/div[13]1`. What are you trying to achieve? If you are looking for the first node, it should be `(//*[@id="islrg"]/div[1]/div[13])[1]` – KunduK Mar 23 '23 at 19:12

2 Answers

This error message...

InvalidSelectorException: invalid selector: Unable to locate an element with the xpath expression //*[@id="islrg"]/div[1]/div[13]1 because of the following error:
SyntaxError: Failed to execute 'evaluate' on 'Document': The string '//*[@id="islrg"]/div[1]/div[13]1' is not a valid XPath expression.

...means that the string you passed as the locator is not a valid XPath expression.


This use case

The block of code you have used:

xPath1 = '//*[@id="islrg"]/div[1]/div[13]'
for i in range(1, len_containers+1):
    if i % 25 == 0:
        continue
    xPath2 = xPath1 + str(i)
    driver.find_element("xpath", xPath2).click()


effectively results in xPath2 being evaluated as:

//*[@id="islrg"]/div[1]/div[13]1

which isn't a valid XPath expression.


Solution

To turn xPath2 into a valid XPath, wrap the base expression in parentheses and append the index in square brackets:

xPath1 = '(//*[@id="islrg"]/div[1]/div[13])'
for i in range(1, len_containers+1):
    if i % 25 == 0:
        continue
    # Append a positional predicate, e.g. (...)[1], instead of a bare number
    xPath2 = xPath1 + '[' + str(i) + ']'
    driver.find_element("xpath", xPath2).click()
undetected Selenium

The error message shows exactly what went wrong.

The string '//*[@id="islrg"]/div[1]/div[13]1' is not a valid XPath expression.

You took an XPath

xPath1 = '//*[@id="islrg"]/div[1]/div[13]'

and then appended '1' to it in the line below (because i is 1)

xPath2 = xPath1 + str(i)

which becomes

'//*[@id="islrg"]/div[1]/div[13]' + '1'
'//*[@id="islrg"]/div[1]/div[13]1'

which is the exact string from the error message. The problem is that this is not a valid XPath... the final '1' at the end of the string makes it invalid.
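To make the difference concrete, here is a minimal sketch of the two ways of combining the base XPath with the loop index. The helper name `nth_match_xpath` is hypothetical and not part of Selenium. It only builds the string, assuming the goal is the i-th node matched by the base expression.

```python
BASE_XPATH = '//*[@id="islrg"]/div[1]/div[13]'


def nth_match_xpath(base: str, index: int) -> str:
    """Return an XPath selecting the index-th (1-based) node matched by base."""
    if index < 1:
        raise ValueError("XPath positions are 1-based")
    # Parentheses group the whole expression; [n] is a positional predicate.
    return f"({base})[{index}]"


# Plain concatenation glues the digit onto the closing bracket.
broken = BASE_XPATH + str(1)
# The hypothetical helper produces a syntactically valid expression instead.
fixed = nth_match_xpath(BASE_XPATH, 1)

print(broken)
print(fixed)
```

Whether the indexed expression matches the element you actually want still depends on the page's markup, so treat this as a syntax fix only.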

After reviewing your entire script, I think there's a simpler way to approach this. Right now you've got BeautifulSoup in your script but it's not needed... you can get all of this using Selenium alone, simplifying everything.

One issue I ran into while writing this script is that the images take a moment to load. We can't use a standard WebDriverWait here because we don't know how many images are going to appear. So, we write a method that polls the page every 100ms to see if the count of images has gone up. We keep looping until the count is stable, meaning all the images have loaded.

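import time
from selenium.webdriver.common.by import By

def wait_for_images(locator):
    # Poll until the number of matching elements stops changing
    count = 0
    images = driver.find_elements(*locator)
    while len(images) != count:
        count = len(images)
        time.sleep(.1)
        images = driver.find_elements(*locator)

    return images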
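One caveat with the loop above is that it never gives up if the count keeps changing. As a rough sketch, the same polling idea can be written with a timeout. The name `wait_until_stable` and its parameters are hypothetical, and a plain iterator stands in for `len(driver.find_elements(*locator))` so the snippet runs without a browser.

```python
import time
from typing import Callable


def wait_until_stable(get_count: Callable[[], int],
                      interval: float = 0.1,
                      timeout: float = 10.0) -> int:
    """Poll get_count() until two consecutive readings match or timeout expires."""
    deadline = time.monotonic() + timeout
    previous = get_count()
    while time.monotonic() < deadline:
        time.sleep(interval)
        current = get_count()
        if current == previous:
            return current
        previous = current
    raise TimeoutError(f"count still changing after {timeout} seconds")


# Stand-in for the Selenium call: the count grows, then levels off.
readings = iter([3, 7, 12, 12, 12])
stable = wait_until_stable(lambda: next(readings), interval=0.01)
print(stable)
```

With Selenium, the callable would be something like `lambda: len(driver.find_elements(*locator))`. The interval and timeout values here are arbitrary and would need tuning for a real page.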

Now that we have the helper method, we can write the main script

chromeDriverPath = r'C:\Users\Aorus\Downloads\Z_ARCHIWUM\PythonScript\chromedriver_win32\chromedriver.exe'
driver = webdriver.Chrome(chromeDriverPath)

search_URL = 'https://www.google.com/search?q=budynki&rlz=1C1GCEU_plPL919PL919&source=lnms&tbm=isch&sa=X&ved=2ahUKEwiRyJvoo_L9AhWJxIsKHTIKDqwQ_AUoAXoECAEQAw&biw=1553&bih=724'
driver.get(search_URL)

a = input('Waiting for user input to start...')

# Scrolling all the way up
driver.execute_script('window.scrollTo(0, 0);')

for image in wait_for_images((By.CSS_SELECTOR, ".bRMDJf.islir > img[src]")):
    print(image.get_attribute("src"))

This prints the URL of each image, which you can then navigate to or download as needed.
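If the goal is to save the files, one possible sketch uses only the standard library. It assumes each `src` value is either an http(s) URL or an inline `data:` URL, both of which `urllib.request.urlopen` can read. The function name `save_image` and the file name are hypothetical, and the demo uses a tiny `data:` URL so it runs without network access.

```python
import tempfile
import urllib.request
from pathlib import Path


def save_image(src: str, dest: Path) -> int:
    """Fetch src (http(s) or data: URL), write it to dest, return bytes written."""
    with urllib.request.urlopen(src) as response:
        payload = response.read()
    dest.write_bytes(payload)
    return len(payload)


# Demo with an inline data: URL standing in for an image src attribute.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "sample.bin"
    written = save_image("data:text/plain;base64,aGVsbG8=", target)
    content = target.read_bytes()

print(written, content)
```

In the real script you would call `save_image(image.get_attribute("src"), some_path)` inside the loop. Picking a sensible file extension for each image is left out of this sketch.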

JeffC
  • I receive an error: InvalidArgumentException: invalid argument: 'using' must be a string What is "using" about? – VitorWAW Mar 24 '23 at 08:48
  • Which line are you getting that error? – JeffC Mar 24 '23 at 14:43
  • `File "C:\Users\...\webdriver.py", line 860, in find_elements return self.execute(Command.FIND_ELEMENTS, {"using": by, "value": value})["value"] or [] File "C:\Users\...\webdriver.py", line 440, in execute self.error_handler.check_response(response) File "C:\Users\...\webdriver.py", line 245, in check_response raise exception_class(message, screen, stacktrace) InvalidArgumentException: invalid argument: 'using' must be a string` – VitorWAW Mar 24 '23 at 15:40
  • I know it can be difficult to deduce anything from this. – VitorWAW Mar 24 '23 at 15:42
  • I've fixed a bug in `wait_for_images()`. Try the new code. I forgot to add a `*` when referencing the `locator` variable. – JeffC Mar 24 '23 at 15:46