I am trying to use Selenium to scrape dynamic webpages. Here, I tried to print all the authors on the website:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://quotes.toscrape.com/js")
elements = driver.find_elements_by_class_name("author")
for i in elements:
    print(i.text)
driver.quit()

This worked well and printed the right result:

Albert Einstein
J.K. Rowling
Albert Einstein
Jane Austen
Marilyn Monroe
Albert Einstein
André Gide
Thomas A. Edison
Eleanor Roosevelt
Steve Martin

But when I try to use similar code for another website, I get an error:

selenium.common.exceptions.InvalidArgumentException: Message: invalid argument: invalid locator
  (Session info: chrome=98.0.4758.102)

This is my second script:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
url = 'https://www.myperfume.co.il/155567-%D7%9B%D7%9C-%D7%94%D7%9E%D7%95%D7%AA%D7%92%D7%99%D7%9D-%D7%9C%D7%92%D7%91%D7%A8?order=up_title'


driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get(url)
elements = driver.find_elements_by_class_name("title  text-center")
for i in elements:
    print(i.text)
driver.quit()

What I am trying to do in this code is print the names of all the perfumes on the webpage. After inspecting the page, I saw that all of the names are in elements whose class is 'title text-center'.

How can I fix my code?

Prophet
YahavB

2 Answers

1

title text-center is actually two class names: title and text-center.
find_elements_by_class_name accepts only a single class name, so to locate elements by two class names you have to use XPath or a CSS selector.
So, instead of

elements = driver.find_elements_by_class_name("title  text-center")

You can use

elements = driver.find_elements_by_xpath("//h3[@class='title  text-center']")

Or

elements = driver.find_elements_by_css_selector("h3.title.text-center")
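
As a rough illustration of why the CSS form works: a class attribute holding several space-separated names maps to a selector in which each name gets its own dot. The helper below is hypothetical (`css_for_classes` is a name made up for this sketch, not part of Selenium) and only builds the selector string.

```python
def css_for_classes(class_attr, tag=""):
    """Build a CSS selector matching elements that carry every class in class_attr.

    Hypothetical helper for illustration only; not part of Selenium.
    """
    # str.split() with no argument collapses runs of whitespace,
    # so "title  text-center" yields ["title", "text-center"].
    names = class_attr.split()
    if not names:
        raise ValueError("class_attr must contain at least one class name")
    # Prefix each class name with a dot and join them after the optional tag.
    return tag + "".join("." + name for name in names)


selector = css_for_classes("title  text-center", tag="h3")
print(selector)  # h3.title.text-center
```

The resulting string can then be passed to a CSS selector lookup like the one above.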

Also, you should add waits so that you access the web elements only once they are loaded and ready.
This should be done with Expected Conditions explicit waits, as follows:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = 'https://www.myperfume.co.il/155567-%D7%9B%D7%9C-%D7%94%D7%9E%D7%95%D7%AA%D7%92%D7%99%D7%9D-%D7%9C%D7%92%D7%91%D7%A8?order=up_title'


driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
wait = WebDriverWait(driver, 20)

driver.get(url)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h3.title.text-center")))
elements = driver.find_elements(By.CSS_SELECTOR, "h3.title.text-center")
for i in elements:
    print(i.text)
driver.quit()
Prophet
  • But it only prints the first 12 perfumes, when it should print all 139 perfumes – YahavB Feb 18 '22 at 16:17
  • To get more results you will have to scroll the page down. The page you are working on initially loads only the products it displays. – Prophet Feb 19 '22 at 17:17
  • How would you find the prices of the elements using the find_elements_by... command? – YahavB Feb 19 '22 at 21:19
  • This is a new question. We have answered your original question here. In case you have more questions - please ask a new, separate question for that – Prophet Feb 19 '22 at 21:50
  • posted a new question: https://stackoverflow.com/questions/71189645/selenium-cant-puul-the-products-prices-from-a-website – YahavB Feb 19 '22 at 22:12
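
A rough sketch of the scrolling mentioned in the comments above, assuming the page appends more products whenever the window reaches the bottom (not verified here). `scroll_until_stable`, its parameters, and the pause length are hypothetical names and values chosen for illustration; only `execute_script` is an actual WebDriver method.

```python
import time


def scroll_until_stable(driver, count_items, pause=2.0, max_rounds=50):
    """Scroll to the bottom repeatedly until count_items() stops growing.

    driver      -- a Selenium WebDriver (anything with execute_script works)
    count_items -- zero-argument callable returning the current item count
    pause       -- seconds to wait after each scroll (a guess; tune per site)
    max_rounds  -- safety cap so the loop always terminates
    """
    previous = count_items()
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to fetch and render more items
        current = count_items()
        if current == previous:  # nothing new appeared, assume the end was reached
            break
        previous = current
    return previous


# Possible usage with the driver and imports from the answer above:
# total = scroll_until_stable(
#     driver,
#     lambda: len(driver.find_elements(By.CSS_SELECTOR, "h3.title.text-center")),
# )
# print(total)
```

Stopping when the count no longer changes is a heuristic: a slow response can look like the end of the list, so the pause may need to be longer on some connections.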
1

This error message...

selenium.common.exceptions.InvalidArgumentException: Message: invalid argument: invalid locator

...implies that the locator strategy you have used is not a valid locator strategy as By.CLASS_NAME takes a single classname as an argument.


To print the names of all the perfumes on the webpage, you can use a list comprehension with the following locator strategy:

  • Using css_selector and get_attribute("innerHTML"):

    driver.get("https://www.myperfume.co.il/155567-%D7%9B%D7%9C-%D7%94%D7%9E%D7%95%D7%AA%D7%92%D7%99%D7%9D-%D7%9C%D7%92%D7%91%D7%A8?order=up_title")
    print([my_elem.get_attribute("innerHTML") for my_elem in driver.find_elements_by_css_selector("h3.title")])
    

Ideally, you need to induce WebDriverWait for visibility_of_all_elements_located(), and you can use the following locator strategy:

  • Using CSS_SELECTOR and the text attribute:

    driver.get("https://www.myperfume.co.il/155567-%D7%9B%D7%9C-%D7%94%D7%9E%D7%95%D7%AA%D7%92%D7%99%D7%9D-%D7%9C%D7%92%D7%91%D7%A8?order=up_title")
    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "h3.title")))])
    
  • Console Output:

    [' 212 וי אי פי לגבר א.ד.ט 212 vip for men e.d.t ', ' 212 ניו יורק לגבר א.ד.ט 212 nyc for men e.d.t ', ' 212 סקסי לגבר א.ד.ט 212 sexy men e.d.t ', ' אברקרומבי פירס 100 מל א.ד.ק Abercrombie & Fitch Fierce 100 ml e.d.c ', ' אברקרומבי פירס 50 מל א.ד.ק Abercrombie & Fitch Fierce 50 ml e.d.c ', ' אברקרומבי פירס גודל ענק 200 מל א.ד.ק Abercrombie & Fitch Fierce 200 ml e.d.c ', ' אברקרומבי פירסט אינסטינקט לגבר א.ד.ט  Abercrombie & Fitch First Instinct e.d.t ', ' אגואיסט א.ד.ט Egoiste e.d.t ', ' אגואיסט פלטינום א.ד.ט Egoiste Platinum e.d.t ', ' או דה בלנק א.ד.ט Eau De Blanc e.d.t ', ' או דה פרש א.ד.ט Eau Fraiche e.d.t ', ' אובסיישן לגבר א.ד.ט Obsession for men e.d.t ']
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
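
The names in the console output above carry leading and trailing spaces, and at least one contains a double space. If that matters, a small post-processing step can normalize them. This is only a sketch; `clean_names` is a hypothetical helper, not part of Selenium, and the sample strings are copied from the output above.

```python
def clean_names(raw_names):
    """Collapse whitespace runs and trim each name; drop names that end up empty."""
    cleaned = []
    for raw in raw_names:
        # split() with no argument splits on any whitespace run,
        # so joining with a single space removes padding and double spaces.
        name = " ".join(raw.split())
        if name:
            cleaned.append(name)
    return cleaned


sample = [
    " אגואיסט א.ד.ט Egoiste e.d.t ",
    " אברקרומבי פירסט אינסטינקט לגבר א.ד.ט  Abercrombie & Fitch First Instinct e.d.t ",
]
for name in clean_names(sample):
    print(name)
```

The list returned by the list comprehension above can be passed to the helper directly in place of `sample`.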
    
undetected Selenium