I am not able to pull the price of products on Shopee (an e-commerce site).
I have looked at the problem solved by @dmitrybelyakov (link: Scraping AJAX e-commerce site using python).

That solution helped me get the 'name' and 'historical_sold' of each product, but I cannot get the price. I cannot find the price value in the JSON string. I therefore tried to use Selenium to pull the data with XPath, but it failed.

The link to the e-commerce site: https://shopee.com.my/search?keyword=h370m

My code:

import time

import pandas as pd
from selenium import webdriver

# Raw string, so single backslashes are enough
path = r'C:\Users\admin\Desktop\chromedriver_win32\Chromedriver'

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('headless')
chrome_options.add_argument('window-size=1200x600')

browserdriver = webdriver.Chrome(executable_path=path, options=chrome_options)
link = 'https://shopee.com.my/search?keyword=h370m'
browserdriver.get(link)
productprice = '//*[@id="main"]/div/div[2]/div[2]/div/div/div/div[2]/div/div/div[2]/div[1]/div/a/div/div[2]/div[1]'
productprice_printout = browserdriver.find_element_by_xpath(productprice).text
print(productprice_printout)

When I run this code, I get the following error:

selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="main"]/div/div[2]/div[2]/div/div/div/div[2]/div/div/div[2]/div[1]/div/a/div/div[2]/div[1]"}

Please help me get the price of a product on Shopee!

QHarr
Huynh

3 Answers

You can use requests and the site's search API:

import requests

headers = {
    'User-Agent': 'Mozilla/5',
    'Referer': 'https://shopee.com.my/search?keyword=h370m'
}

url = 'https://shopee.com.my/api/v2/search_items/?by=relevancy&keyword=h370m&limit=50&newest=0&order=desc&page_type=search'  
r = requests.get(url, headers = headers).json()

for item in r['items']:
    print(item['name'], ' ', item['price'])

The price comes back as a scaled-up integer. If you want roughly the same scale as shown on the site, divide by 100000:

for item in r['items']:
    print(item['name'], ' ', 'RM' + str(item['price']/100000))
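
The division above can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the name `format_price` and the sample `items` list are hypothetical, and the scale factor of 100000 is taken from the snippet above rather than from any documented API contract. It also tolerates a missing price (`None`), which one of the comments on this answer reports seeing.

```python
def format_price(raw_price, scale=100000, currency="RM"):
    """Convert a raw integer price into a display string.

    Returns None when the raw price is missing.
    """
    if raw_price is None:
        return None
    return f"{currency}{raw_price / scale:.2f}"


# Hypothetical sample shaped like the `items` list in the response above.
items = [
    {"name": "Sample board A", "price": 43000000},
    {"name": "Sample board B", "price": None},
]

for item in items:
    print(item["name"], " ", format_price(item.get("price")))
```

Returning `None` instead of raising keeps the loop running when an item arrives without a price, so the missing entries can be inspected afterwards.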
QHarr
  • Thank you for your support! That's very useful for me. – Huynh Apr 22 '19 at 04:45
  • Your code gives me exactly what I want. However, when I use it without the line " 'Referer': 'https://shopee.com.my/search?keyword=h370m'", it does not return the price value. Could you explain the difference between sending and not sending the Referer header in this case? – Huynh Apr 22 '19 at 05:05
  • The server expects certain headers to be sent from the client. In this case it expects the Referer header (the page which would have been making the request for this info). An API is faster and more reliable than using a browser and should be your first choice where available. – QHarr Apr 22 '19 at 05:44
  • I've applied your guide to try to get the name and price of all products of one specific shop:

    import requests

    i = 1
    while i < 20:
        headers = {
            'User-Agent': 'Mozilla/5',
            'Referer': 'https://shopee.vn/shop/42575106/search?page=' + str(i) + '&sortBy=pop'
        }
        url = 'https://shopee.com.my/api/v2/search_items/?by=pop&limit=30&match_id=42575106&newest=' + str(30*i) + '&order=desc&page_type=shop'
        r = requests.get(url, headers=headers).json()
        for item in r['items']:
            print(item['name'], ' ', item['price'])
        i = i + 1

    – Huynh Apr 24 '19 at 14:40
  • It worked on the first run, but the following runs failed to give me the price and returned only the name. The output of those attempts looks like this: Cooler Master Hyper 212 Turbo Black CPU Cooler (RR-212TK-16P None HIGH QUALITY 3 PIN UK TO IEC C5 NOTEBOOK POWER CABLE WITH FUSE None. Could you explain why it only works the first time when I put your solution in a loop to scan all products of that shop? – Huynh Apr 24 '19 at 14:43
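
The paged URLs built by string concatenation in the comment above could also be generated with `urllib.parse.urlencode`. This is only a sketch of the URL construction and does not explain why the price comes back as `None`: the helper name `build_shop_page_url` is hypothetical, the query parameters are copied from the comment, and treating `newest` as an item offset (`limit * page`) is an assumption based on the `30*i` expression there.

```python
from urllib.parse import urlencode

# Endpoint taken from the answer above.
BASE_URL = "https://shopee.com.my/api/v2/search_items/"


def build_shop_page_url(shop_id, page, limit=30):
    """Return the search URL for one page of a shop's listing.

    Assumption: `newest` is the offset of the first item on the page.
    """
    params = {
        "by": "pop",
        "limit": limit,
        "match_id": shop_id,
        "newest": limit * page,
        "order": "desc",
        "page_type": "shop",
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Pages are counted from 0 here, so the first URL uses newest=0.
for page in range(3):
    print(build_shop_page_url(42575106, page))
```

Note that the loop in the comment starts at `i = 1`, so under the offset assumption it would begin at `newest=30` rather than `newest=0`.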

To extract the price of products on Shopee using Selenium and Python you can use the following solution:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    options.add_argument('start-maximized')
    options.add_argument('disable-infobars')
    options.add_argument('--disable-extensions')
    browserdriver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    browserdriver.get('https://shopee.com.my/search?keyword=h370m')
    WebDriverWait(browserdriver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='shopee-modal__container']//button[text()='English']"))).click()
    print([my_element.text for my_element in WebDriverWait(browserdriver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[text()='RM']//following::span[1]")))])
    print("Program Ended")
    
  • Console Output:

    ['430.00', '385.00', '435.00', '409.00', '479.00', '439.00', '479.00', '439.00', '439.00', '403.20', '369.00', '420.00', '479.00', '465.00', '465.00']
    Program Ended
    
undetected Selenium
  • Yesterday your code ran smoothly, but today I tested it again and the output came out like this: ' at 0x000002660D40D840> Program Ended'. Please help me correct it! Thanks. – Huynh Apr 24 '19 at 13:51
  • @Huynh This solution should work flawlessly. Please try to initiate the Test Execution in a clean Test Environment with appropriate and matching WebDriver (i.e. ChromeDriver) and Web Client (i.e. Google Chrome) versions. FYI, we have released Chrome v74.x, so crosscheck whether Chrome got updated. – undetected Selenium Apr 24 '19 at 13:57
  • `(Session info: headless chrome=74.0.3729.108) (Driver info: chromedriver=74.0.3729.6)` That is my current Chromedriver and Google Chrome version. Unfortunately, it does not work as expected. Just some days ago, it worked well but not now. I do not know what is happening! – Huynh Apr 25 '19 at 07:45
  • @Huynh We have pushed `chrome=74.0` just yesterday and have started using `chromedriver=74.0` from today. Too early to comment. I'm sure `chrome=73.0` and `chromedriver=73.0` would work just perfecto. – undetected Selenium Apr 25 '19 at 07:50
  • Thank you for your prompt response! I finally found that your original code works well but the latest version of your code (edited by Corey Goldberg) does not. Sorry for the inconvenience! – Huynh Apr 25 '19 at 11:09
  • @Huynh Rolled back the changes so the answer caters to your question and would be helpful to the future readers. – undetected Selenium Apr 25 '19 at 11:12
  • Could I get the product name with your solution? I've tried to use `By.XPATH, ".//*[@class='_1JAmkB']"` or `By.XPATH, ".//*[@class='_1JAmkB']/child::div"` but the output was just `['', '', '', '', '', '', '', '', '', '', '', '', '', '', ''] Program Ended`. Could you help me here, or do I have to open another question? Thanks! – Huynh Apr 25 '19 at 12:43
  • @Huynh Please raise a new question with your new requirement. Stackoverflow contributors will be happy to help you out. – undetected Selenium Apr 25 '19 at 12:51

When visiting the website, I come across this popup: https://gyazo.com/0a9cd82e2c9879a1c834a82cb15020bd. My guess is that Selenium cannot find the XPath you are looking for because this popup is blocking the element.

Right after starting the Selenium session, try this:

popup=browserdriver.find_element_by_xpath('//*[@id="modal"]/div[1]/div[1]/div/div[3]/button[1]')
popup.click()
Jake Strouse