
I can't figure out how to scrape this page. The information seems to be hidden by ng-show, and after many attempts nothing I have found works.

Website: https://cjdropshipping.com/product/silicone-grip-device-finger-exercise-stretcher-finger-gripper-strength-trainer-strengthen-rehabilitation-training-p-1614453269613522944.html?from=HTP

I want to scrape the product description and the shipping time.

This is my current code:

from selenium import webdriver
from selenium.webdriver.common.by import By


# Set up the Chrome driver
driver = webdriver.Chrome()

# Navigate to the website
driver.get("https://cjdropshipping.com/product/silicone-grip-device-finger-exercise-stretcher-finger-gripper-strength-trainer-strengthen-rehabilitation-training-p-1614453269613522944.html?from=HTP")

# Find the element that contains the title of the product
title_element = driver.find_element(By.CSS_SELECTOR, 'div > div > div > div > div > div > pro-detail > div')

# Extract the text from the element (textContent also covers hidden nodes)
title = title_element.get_attribute("textContent")

# Print the title
print(title)

# Close the driver
driver.quit()
Stephen C
DamonW

2 Answers


You need to wait for the target elements and the page contents to load before you can find them.

[update] You also need to scroll down to the description section so that the description content loads.

Here is the updated solution:

from time import sleep
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()

driver.get("https://cjdropshipping.com/product/silicone-grip-device-finger-exercise-stretcher-finger-gripper-strength-trainer-strengthen-rehabilitation-training-p-1614453269613522944.html?from=HTP")
WebDriverWait(driver, 2).until(EC.presence_of_element_located((By.ID, "pd-merchName")))

# scroll down by 1000 px so the description section loads
driver.execute_script("window.scrollBy(0, 1000);")
sleep(2)

soup = BeautifulSoup(driver.page_source, 'lxml')
title_element = soup.find('div', attrs={"id": "pd-merchName"}).text.strip()
print(title_element)

description1 = soup.find('div', attrs={"class": "pd-new-desc info-box"}).text.strip()
description2 = [i.text for i in soup.find('div', attrs={"id": "pd-description"}).find_all('p')]

print(description1)
print(description2)
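
If a single fixed scroll plus sleep(2) turns out to be unreliable, one option is to scroll in steps and re-check after each step. The helper below is only a sketch: scroll_until, is_ready, step_px, max_steps and pause_s are hypothetical names, not part of Selenium. It relies only on driver.execute_script, and the "#pd-description p" selector in the usage comment is inferred from the code above.

```python
import time


def scroll_until(driver, is_ready, step_px=1000, max_steps=10, pause_s=0.5):
    """Scroll down step by step until is_ready(driver) returns a truthy value.

    Returns the number of scroll steps that were needed.
    Raises TimeoutError if the content never becomes ready.
    """
    for steps_done in range(max_steps):
        if is_ready(driver):
            return steps_done
        # Scroll by step_px pixels, then give the page a moment to render.
        driver.execute_script("window.scrollBy(0, arguments[0]);", step_px)
        time.sleep(pause_s)
    # One last check after the final scroll step.
    if is_ready(driver):
        return max_steps
    raise TimeoutError(f"content not ready after {max_steps} scroll steps")


# Possible usage with the driver from the answer (assumed selector):
# scroll_until(driver, lambda d: d.find_elements(By.CSS_SELECTOR, "#pd-description p"))
```

find_elements returns an empty list while nothing matches, so the lambda is falsy until at least one description paragraph exists in the DOM.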
Ajeet Verma
  • This works, but is there any way to get the product description for the same product, I tried your way, but it doesn't seem to find anything. Thanks – DamonW Mar 26 '23 at 04:56
  • @DamonW, I've updated the code above which gets you the product description as well. – Ajeet Verma Mar 26 '23 at 05:31

To extract the product info, you ideally need to induce WebDriverWait for visibility_of_element_located(). You can use either of the following locator strategies:

  • Using CSS_SELECTOR and text attribute:

    driver.get('https://cjdropshipping.com/product/silicone-grip-device-finger-exercise-stretcher-finger-gripper-strength-trainer-strengthen-rehabilitation-training-p-1614453269613522944.html?from=HTP')
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div#subscribe-box > img"))).click()
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div#pd-merchName > div"))).text)
    
  • Using XPATH and get_attribute("innerHTML"):

    driver.get('https://cjdropshipping.com/product/silicone-grip-device-finger-exercise-stretcher-finger-gripper-strength-trainer-strengthen-rehabilitation-training-p-1614453269613522944.html?from=HTP')
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div#subscribe-box > img"))).click()
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@id='pd-merchName']/div"))).get_attribute("innerHTML").strip())
    
  • Console Output:

    Silicone Grip Device Finger Exercise Stretcher Finger Gripper Strength Trainer Strengthen Rehabilitation Training
    
  • Note : You have to add the following imports :

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
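
On the ng-show point from the question: Selenium's .text only returns text that is rendered, so an element hidden by ng-show (AngularJS hides it with display: none) yields an empty string, while get_attribute("textContent") returns the text regardless of visibility. A minimal sketch of a fallback, where element_text is a hypothetical helper name and not a Selenium API:

```python
def element_text(element):
    """Return the element's visible text, or its textContent if it is hidden."""
    visible = (element.text or "").strip()
    if visible:
        return visible
    # Hidden elements have an empty .text; textContent ignores visibility.
    raw = element.get_attribute("textContent") or ""
    # Collapse runs of whitespace and newlines left over from the markup.
    return " ".join(raw.split())


# Possible usage with a located WebElement (selector taken from the answer):
# print(element_text(driver.find_element(By.CSS_SELECTOR, "div#pd-merchName > div")))
```

When the element is hidden, wait with presence_of_element_located rather than visibility_of_element_located, since the latter would time out.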


undetected Selenium