
I want to extract the product names from this page: https://shopee.com.my/search?keyword=h370m. I received help from @DebanjanB on the question "Selenium can not scrape Shopee e-commerce site using python", but I have not been able to apply the XPath for the product name to that solution. Here is my code:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('start-maximized')
options.add_argument('disable-infobars')
options.add_argument('--disable-extensions')
browserdriver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Users\\admin\\Desktop\\chromedriver_win32\\Chromedriver')
browserdriver.get('https://shopee.com.my/search?keyword=h370m')
WebDriverWait(browserdriver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='shopee-modal__container']//button[text()='English']"))).click()
print([my_element.text for my_element in WebDriverWait(browserdriver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, ".//*[@class='_1JAmkB']")))])
print("Program Ended")

I also tried different XPaths, such as:

By.XPATH, ".//*[@class='_1JAmkB']/child::div"

or

//div[contains(concat(' ', normalize-space(@class), ' '), ' _1NoI8_ ')]

Neither of them gives the expected result.

The only output I received was:

['', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
Program Ended

Please help me to solve this problem. Thanks!

QHarr
Huynh

1 Answer


XPath:

You can use the XPath below. Note that you need the element's innerHTML rather than .text, since .text only returns text that is rendered as visible:

//*[@class="_1NoI8_ _2gr36I"]

And then extract the innerHTML.

print([my_element.get_attribute('innerHTML') for my_element in WebDriverWait(browserdriver, 10).until(EC.presence_of_all_elements_located((By.XPATH, '//*[@class="_1NoI8_ _2gr36I"]')))])

CSS:

print([my_element.get_attribute('innerHTML') for my_element in WebDriverWait(browserdriver, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "._1NoI8_._2gr36I")))])
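Since innerHTML returns markup rather than plain text, the values may contain nested tags or HTML entities. As a minimal sketch, assuming the name element could wrap part of its content in child tags, the fragment can be reduced to text with the standard library. The helper names and the sample string are hypothetical; the real Shopee markup may differ.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text between tags (entities already decoded)
        self.parts.append(data)


def inner_html_to_text(fragment):
    """Return the text of an innerHTML fragment with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.parts).split())


# Hypothetical innerHTML value, for illustration only
sample = '<span class="badge">Preferred</span> ASUS PRIME &amp; H370M-PLUS  Motherboard'
print(inner_html_to_text(sample))
```

Applied to the list comprehension above, each `get_attribute('innerHTML')` result would be passed through `inner_html_to_text` before printing.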

API:

I still think the API is the better option; I showed how to use it in my answer to your previous question. I get the names and prices every time, so I am unsure about the issue you had over time (though I don't know how many times you have run it). With the API you don't need to scroll to load all the results.


With a short wait, you can also extract all the data from the script tags on the page:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time
import json

browserdriver = webdriver.Chrome()
browserdriver.get('https://shopee.com.my/search?keyword=h370m')
WebDriverWait(browserdriver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='shopee-modal__container']//button[text()='English']"))).click()
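time.sleep(2)  # short pause so the page can finish adding its script tags
# Each ld+json script tag holds structured data as a JSON string
products = WebDriverWait(browserdriver, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '[type="application/ld+json"]')))
# Skip the first script tag (presumably page-level data rather than a product)
products_json = [product.get_attribute('innerHTML') for product in products[1:]]
names = [json.loads(product)['name'] for product in products_json]  # just showing name extraction from the JSON
print(len(names))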
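The same JSON strings can yield more than the name. The sketch below is self-contained and uses made-up payloads. It assumes the product entries follow the common schema.org layout (`@type`, `name`, `offers.price`), which the page's actual data may not match exactly. The function name `extract_product` is hypothetical.

```python
import json


def extract_product(raw_json):
    """Parse one ld+json string; return (name, price), or None if it is not a product."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        return None  # not valid JSON, skip it
    if not isinstance(data, dict) or data.get("@type") != "Product":
        return None  # valid JSON, but not a product entry
    offers = data.get("offers") or {}
    price = offers.get("price") if isinstance(offers, dict) else None
    return data.get("name"), price


# Hypothetical payloads standing in for the innerHTML of the script tags
samples = [
    '{"@type": "WebSite", "name": "Shopee"}',
    '{"@type": "Product", "name": "H370M board", "offers": {"@type": "Offer", "price": "399.00"}}',
    'not json',
]
products = [p for p in map(extract_product, samples) if p is not None]
print(products)
```

Filtering on `@type` instead of slicing with `[1:]` avoids relying on the position of the non-product script tag, at the cost of assuming that field is present.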
QHarr
  • Thank you so much! That works perfectly! I would love to use the API, but I run into trouble when I put the API request in a loop to scan all the product info of a shop, which often spans multiple pages. For my work I have to run the loop many times, but it only works the first time I run it each day (as noted in the comments on your answer to the previous question). That is inconvenient for me. I am trying to solve it and hope to get your further support with that loop problem. Would you mind if I open another question for it? I am only learning to code to support my job, so I ask a lot! – Huynh Apr 26 '19 at 04:39
  • Open a new question as there are lots of people who will then see it. It is good to ask questions (provided you put in some effort yourself and write clear questions - both of which you do!) – QHarr Apr 26 '19 at 04:42