
Current problem: The code below runs fine until I insert the following code to click an arrow on the product square/profile.

Bigger problem: The code as a whole runs fine, but the dataset is distorted. After some experimenting, I discovered that the distorted data is all located "below the fold." I'm trying to click the arrow on each product square/profile in order to expose the otherwise hidden data. I believe that if I can do this, the scraper will work and the dataset will no longer be distorted.

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

data = []

for y in range(1,3):
    website = f'https://www.knowde.com/b/markets-personal-care/products{y}'
    path = '/Users/kdavid3mbp/Python/chrome_driver64/chromedriver'
    driver = webdriver.Chrome(path)
    driver.get(website)
    
    for x in range(1,37):
        products = driver.find_elements('xpath', f'//*[@id="__next"]/main/div/div[3]/div[3]/div[1]/div[2]/div[{x}]')
       
        for product in products:
            WebDriverWait(driver, 10).until(EC.element_to_be_clickable(('xpath', '/div/div/svg')).click()
            
            brand = product.find_element('xpath', './a/div[2]/div/p[1]').text
            item = product.find_element('xpath', './a/div[2]/div/p[2]').text
            inci_name = product.find_element('xpath', './a/div[2]/div/div[1]/span[2]').text
            try:
                ingredient_origin = product.find_element('xpath', './a/div[2]/div/div[3]/span[2]').text
            except NoSuchElementException:
                ingredient_origin = 'null'
            try:
                function = product.find_element('xpath', './a/div[2]/div/div[2]/span[2]').text
            except NoSuchElementException:
                function = 'null'
            try:
                benefit_claims = product.find_element('xpath', './a/div[2]/div/div[4]/span[2]').text
            except NoSuchElementException:
                benefit_claims = 'null'
            try:
                description = product.find_element('xpath', './a/div[2]/div/p[3]').text
            except NoSuchElementException:
                description = 'null'
            try:
                labeling_claims = product.find_element('xpath', './a/div[2]/div/div[5]/span[2]').text
            except NoSuchElementException:
                labeling_claims = 'null'
            try:
                compliance = product.find_element('xpath', './a/div[2]/div/div[6]/span[2]').text
            except NoSuchElementException:
                compliance = 'null'
            try:
                hlb_value = product.find_element('xpath', './a/div[2]/div/div[4]/span[2]').text
            except NoSuchElementException:
                hlb_value = 'null'
            try:
                end_uses = product.find_element('xpath', './a/div[2]/div/div[4]/span[2]').text
            except NoSuchElementException:
                end_uses = 'null'
            try:
                cas_no = product.find_element('xpath', './a/div[2]/div/div[5]/span[2]').text
            except NoSuchElementException:
                cas_no = 'null'
            try:
                chemical_name = product.find_element('xpath', './a/div[2]/div/div[2]/span[2]').text
            except NoSuchElementException:
                chemical_name = 'null'
            try:
                synonyms = product.find_element('xpath', './a/div[2]/div/div[6]/span[2]').text
            except NoSuchElementException:
                synonyms = 'null'
            try:
                chemical_family = product.find_element('xpath', './a/div[2]/div/div[5]/span[2]').text
            except NoSuchElementException:
                chemical_family = 'null'
            try:
                features = product.find_element('xpath', './a/div[2]/div/div[7]/span[2]').text
            except NoSuchElementException:
                features = 'null'
            try:
                grade = product.find_element('xpath', './a/div[2]/div/div[5]/span[2]').text
            except NoSuchElementException:
                grade = 'null'

        dict = {
            'brand': brand,
            'item': item,
            'inci_name': inci_name,
            'ingredient_origin': ingredient_origin,
            'function': function,
            'benefit_claims': benefit_claims,
            'description': description,
            'labeling_claims': labeling_claims,
            'compliance': compliance,
            'hlb_value': hlb_value,
            'end_uses': end_uses,
            'cas_no': cas_no,
            'chemical_name': chemical_name,
            'synonyms': synonyms,
            'chemical_family': chemical_family,
            'features': features,
            'grade': grade
        }

        data.append(dict)
        print('Saving: ', dict['brand'])


# Closes driver once for loop is completed
driver.quit()

df = pd.DataFrame(data)
df.to_csv('/Users/kdavid3mbp/Python/cosmetics_data.csv', index=False)
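Separately from the click, part of the distortion may come from the positional XPaths themselves: several fields read the same position. For example, div[4]/span[2] is used for benefit_claims, hlb_value and end_uses, and div[5]/span[2] for labeling_claims, cas_no, chemical_family and grade, so whichever row happens to sit at that position is copied into all of them. Below is a minimal sketch that keys each value by its visible label instead. It assumes (unverified against the site) that every detail row is a div whose first span holds the label and whose second span holds the value, and that the labels normalise to the question's field names. If they don't, a small label-to-field mapping would be needed. The helper names `to_field_name` and `extract_details` are hypothetical.

```python
import re

# Field names from the question's dict that come from label/value rows.
# brand, item and description are <p> elements and are read separately.
FIELDS = [
    'inci_name', 'ingredient_origin', 'function', 'benefit_claims',
    'labeling_claims', 'compliance', 'hlb_value', 'end_uses', 'cas_no',
    'chemical_name', 'synonyms', 'chemical_family', 'features', 'grade',
]


def to_field_name(label):
    """Normalise a visible label such as 'INCI Name:' to 'inci_name'."""
    label = label.strip().rstrip(':').lower()
    return re.sub(r'[^a-z0-9]+', '_', label).strip('_')


def extract_details(product):
    """Return the card's label/value rows as a dict keyed by field name.

    ASSUMPTION (unverified): each detail row is a <div> under ./a/div[2]/div
    whose first <span> is the label and whose second <span> is the value.
    Fields with no matching row keep the question's 'null' placeholder;
    labels that don't normalise to a known field are ignored.
    """
    record = {field: 'null' for field in FIELDS}
    for row in product.find_elements('xpath', './a/div[2]/div/div'):
        spans = row.find_elements('xpath', './span')
        if len(spans) < 2:
            continue  # not a label/value row
        field = to_field_name(spans[0].text)
        if field in record:
            record[field] = spans[1].text
    return record
```

In the question's loop, the per-field try/except blocks could then be replaced by something like `record = {'brand': brand, 'item': item, 'description': description, **extract_details(product)}`.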

The current problem occurs when I insert the following line:

WebDriverWait(driver, 10).until(EC.element_to_be_clickable(('xpath', '/div/div/svg')).click()

I get a SyntaxError:

  File "/var/folders/90/82_f843n4h9drvxh7z3tqg840000gn/T/ipykernel_34523/1099992952.py", line 22
    brand = product.find_element('xpath', './a/div[2]/div/p[1]').text
    ^
SyntaxError: invalid syntax

I'm not sure how to arrange this so that it clicks the down arrows. I'd like to click the arrow on each of the 36 product squares/profiles on every page.

  • The number of opening and closing parentheses in `WebDriverWait(driver, 10).until(EC.element_to_be_clickable(('xpath', '/div/div/svg')).click()` doesn't match. – Matthias Jun 13 '23 at 11:28

1 Answer

According to its definition, element_to_be_clickable() takes the locator as a single tuple argument. In older Selenium releases it is a class rather than a function, and its initializer expects exactly one argument beyond the implicit self:

class element_to_be_clickable(object):
    """ An Expectation for checking an element is visible and enabled such that you can click it."""
    def __init__(self, locator):
        self.locator = locator

    def __call__(self, driver):
        element = visibility_of_element_located(self.locator)(driver)
        if element and element.is_enabled():
            return element
        else:
            return False
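The class form matters because WebDriverWait.until() does not receive a locator itself. It receives a callable and keeps calling it with the driver until the callable returns something truthy. Below is a simplified, stand-alone model of that polling loop, loosely following WebDriverWait's timeout, poll_frequency and ignored_exceptions arguments. The function `until` here is illustrative only and is not Selenium's actual implementation. It also shows where a timeout error comes from when the condition never succeeds.

```python
import time


def until(condition, driver, timeout=10.0, poll_frequency=0.5, ignored_exceptions=()):
    """Simplified model of WebDriverWait(driver, timeout).until(condition).

    `condition` is any callable taking the driver, e.g. an instance of the
    element_to_be_clickable class above. It is called repeatedly; the first
    truthy return value (such as the clickable element) is handed back.
    """
    end_time = time.monotonic() + timeout
    while True:
        try:
            value = condition(driver)
            if value:
                return value
        except ignored_exceptions:
            pass  # treat e.g. "element not found yet" as "try again"
        if time.monotonic() > end_time:
            # This is the point at which Selenium raises TimeoutException.
            raise TimeoutError(f'condition not met within {timeout} seconds')
        time.sleep(poll_frequency)
```

If the locator inside the condition never matches, the condition keeps returning a falsy value and the loop can only end with the timeout error.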

So instead of:

WebDriverWait(driver, 10).until(EC.element_to_be_clickable(('xpath', '/div/div/svg')).click()

You need to add the missing closing parenthesis:

WebDriverWait(driver, 10).until(EC.element_to_be_clickable(('xpath', '/div/div/svg'))).click()
# note the additional closing parenthesis just before .click()
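A quick way to confirm that the unbalanced parenthesis is the whole problem is to pass both versions of the line to Python's built-in compile(). It only parses the code and never runs it, so the undefined names WebDriverWait, driver and EC don't matter. This also explains why the traceback in the question points at the `brand = ...` line: with a parenthesis still open, Python keeps reading into the following line and only fails there. The helper names below are illustrative, not part of the scraper.

```python
# The line from the question (one closing parenthesis short) and the corrected line.
broken = "WebDriverWait(driver, 10).until(EC.element_to_be_clickable(('xpath', '/div/div/svg')).click()\n"
fixed = "WebDriverWait(driver, 10).until(EC.element_to_be_clickable(('xpath', '/div/div/svg'))).click()\n"


def parses(source):
    """Return True if the source is syntactically valid Python (it is never executed)."""
    try:
        compile(source, '<check>', 'exec')
    except SyntaxError:
        return False
    return True


def paren_depth(source):
    """Naive count of '(' minus ')'; ignores parentheses inside string literals."""
    return source.count('(') - source.count(')')


print(parses(broken), paren_depth(broken))  # False 1
print(parses(fixed), paren_depth(fixed))    # True 0
```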

  • Thank you for your input! I've corrected the parentheses issue. I now receive a "TimeoutException" error. – kdavid3891 Jun 13 '23 at 16:07
  • @kdavid3891 Glad to be able to solve the `SyntaxError: invalid syntax` issue. `TimeoutException` is a different issue altogether. Feel free to raise a new question along with the relevant text-based HTML of the element. – undetected Selenium Jun 13 '23 at 18:29
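On the TimeoutException: two properties of the locator '/div/div/svg' would likely stop it from ever matching. First, an XPath that starts with a single / is evaluated from the document root, and the root element of an HTML page is <html>, so /div/... matches nothing. Second, <svg> elements live in the SVG namespace, so in browsers a plain svg name test generally does not match them, which is why *[name()='svg'] is commonly used instead. The wait is also run against driver, i.e. the whole page, rather than against the current product. Below is a minimal sketch of clicking the arrow inside each card. It rests on a hypothetical assumption, not verified against the site: that the first <svg> inside a card is its expand arrow. The names `expand_card`, `ARROW_XPATH` and `SCROLL_JS` are illustrative.

```python
# Relative XPath: the leading "." keeps the search inside one product card,
# and name()='svg' matches <svg> elements despite their SVG namespace.
# ASSUMPTION (hypothetical): the first <svg> in a card is its expand arrow.
ARROW_XPATH = ".//*[name()='svg']"

# Standard DOM call that scrolls the element to the middle of the viewport.
SCROLL_JS = "arguments[0].scrollIntoView({block: 'center'});"


def expand_card(driver, card):
    """Scroll one product card's arrow into view, click it and return it.

    `driver` is a Selenium WebDriver and `card` one of the product
    WebElements from the question's loop.
    """
    arrow = card.find_element('xpath', ARROW_XPATH)
    # Cards below the fold may be outside the viewport, so scroll first.
    driver.execute_script(SCROLL_JS, arrow)
    arrow.click()
    return arrow
```

Inside the question's loop this would be called as `expand_card(driver, product)` before the fields are read. If the extra details load asynchronously, a WebDriverWait on one of the revealed elements may still be needed after the click.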