Can't locate product information with xPath

Question

I'm writing my first web scraper in Python and I'm trying to get the product title and price from an AliExpress product page. I'm new to this, so sorry if this is an obvious question, but the solutions I've tried from other posts haven't worked.

I'm using XPath to target HTML elements, and I copied the XPath expressions from Chrome with Inspect Element -> Copy XPath. That doesn't seem to work the way it did on other websites: the tree.xpath calls just keep returning empty lists. I got the title working by trial and error, because one call returns a list containing all the text on the entire page, and the title is at index 3 of that list. I can't find the index of the price, though, and I'd also like to learn the right way to do this. Here is my code:

import requests
from lxml import html


url = 'https://www.aliexpress.com/item/4000203338045.html?spm=a2g0o.detail.1000060.1.77ce75e1YttKZb&gps-id=pcDetailBottomMoreThisSeller&scm=1007.13339.146401.0&scm_id=1007.13339.146401.0&scm-url=1007.13339.146401.0&pvid=662e2a50-e8d2-4ce3-b66e-70afff126070'

page = requests.get(url)
tree = html.fromstring(page.content)

title = tree.xpath('//*[@id="root"]/div/div[1]/div/div[2]/div[1]')[0]
title_text = title.xpath('///text()')[3]

print('Title:',title)
print('Title text:',title_text)

price = tree.xpath('//*[@id="root"]/div/div[2]/div/div[2]/div[4]/div[1]/span')
print('Price:', price)

And here is the output:

Title: <Element div at 0x3f113f0>
Title text: Bluedio T elf 2 Bluetooth earphone TWS wireless earbuds waterproof Sports Headset Wireless Earphone in ear with charging box-in Phone Earphones & Headphones from Consumer Electronics on AliExpress 
Price: []

I appreciate your help!

I think there is some sort of web crawler protection in effect, so the site detects that you aren't using a normal browser and returns no price. If you open the link in Chrome and look at the Network tab, you will see a request to https://364bf6cc.akstat.io/ whose URL contains the &minPrice=18.27&maxPrice=18.27 that you are trying to obtain. I tried faking it with a User-Agent header, but it was having none of it. - JGFMK, Feb 01 '20 at 20:02

score 0 · Answer 1 · answered Feb 02 '20 at 04:35

The XPath expressions you're looking for are:

tree.xpath('//div[@class="product-title"]/text()')
tree.xpath('//div[@class="product-price-current"]//text()')
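To illustrate how the two expressions differ, here is a minimal sketch run against a small hand-written snippet. The markup in SAMPLE is hypothetical (it only assumes the rendered page uses these two class names, with the price wrapped in a child element), so treat it as a demonstration of /text() versus //text() rather than a description of the real AliExpress page.

```python
from lxml import html

# Hypothetical markup standing in for the browser-rendered page.
SAMPLE = """
<html><body>
<div id="root">
  <div class="product-title">Bluedio T elf 2 Bluetooth earphone</div>
  <div class="product-price-current"><span>US $18.27</span></div>
</div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# /text() returns only text nodes that are direct children of the div.
title_parts = tree.xpath('//div[@class="product-title"]/text()')

# //text() also returns text nested in descendants such as the <span>.
price_parts = tree.xpath('//div[@class="product-price-current"]//text()')

# With /text() the price div yields nothing, because its text sits in the <span>.
price_direct = tree.xpath('//div[@class="product-price-current"]/text()')

print(title_parts)   # ['Bluedio T elf 2 Bluetooth earphone']
print(price_parts)   # ['US $18.27']
print(price_direct)  # []
```

Matching on a class attribute tends to be less brittle than the positional paths Chrome generates, since those break as soon as the page layout shifts.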

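However, requests doesn't execute JavaScript, so these elements, which the page appears to build in the browser, are not in the HTML it returns. To get the rendered DOM you'll need something like Selenium, or Splash in front of Scrapy. If you look at page.content, you'll see that the values you're looking for are in the document, but inside some JSON: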

"name":"PageModule",
"ogDescription":"Smarter Shopping, Better Living!  Aliexpress.com",
/* TITLE */
"ogTitle":"US $18.27 70% OFF|Bluedio T elf 2 Bluetooth earphone TWS wireless earbuds waterproof Sports Headset Wireless Earphone in ear with charging box-in Phone Earphones & Headphones from Consumer Electronics on AliExpress ",

"ogurl":"//www.aliexpress.com/item/4000203338045.html",
"oldItemDetailUrl":"https://www.aliexpress.com/item/Bluedio-T-elf-2-Bluetooth-earphone-TWS-wireless-earbuds-waterproof-Sports-Headset-Wireless-Earphone-in-ear/4000203338045.html",
"plazaElectronicSeller":false,
"productId":4000203338045,
"ruSelfOperation":false,
"showPlazaHeader":false,
"siteType":"glo",
"spanishPlaza":false,
"title":"Bluedio T elf 2 Bluetooth earphone TWS wireless earbuds waterproof Sports Headset Wireless Earphone in ear with charging box-in Phone Earphones & Headphones from Consumer Electronics on AliExpress "
},
"preSaleModule":{ 
   "features":{ 

   },
   "i18nMap":{ 

   },
   "id":0,
   "name":"PreSaleModule",
   "preSale":false
},
"priceModule":{ 
   "activity":true,
   "bigPreview":false,
   "bigSellProduct":false,
   "discount":70,
   "discountPromotion":true,
   "features":{ 

   },
   /* PRICE */
   "formatedActivityPrice":"US $18.27",

   "formatedPrice":"US $60.90",
   "hiddenBigSalePrice":false,
   "i18nMap":{ 
      "LOT":"lot",
      "INSTALLMENT":"Installment",
      "DEPOSIT":"Deposit",
      "PRE_ORDER_PRICE":"Pre-order price"
   }

Unfortunately, I recognize that this doesn't get you all the way to the answer you're looking for, but hopefully this will help get you on the way.
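One possible way to go a step further, sketched here under assumptions, is to pull the values straight out of that embedded JSON with a regular expression. The helper name extract_json_string is made up for this example, and sample is an abbreviated stand-in for page.text built from the excerpt above. The real page may lay the data out differently or change at any time.

```python
import json
import re


def extract_json_string(page_text, key):
    """Return the first string value stored under `key`, or None if absent."""
    # Match "key":"value", allowing backslash-escaped characters in the value.
    pattern = r'"' + re.escape(key) + r'"\s*:\s*("(?:[^"\\]|\\.)*")'
    match = re.search(pattern, page_text)
    if match is None:
        return None
    # json.loads decodes any escapes (for example \" or \u0026) in the literal.
    return json.loads(match.group(1))


# Abbreviated stand-in for page.text, using values from the excerpt above.
sample = (
    '"ogTitle":"US $18.27 70% OFF|Bluedio T elf 2",'
    '"priceModule":{"discount":70,'
    '"formatedActivityPrice":"US $18.27",'
    '"formatedPrice":"US $60.90"}'
)

print(extract_json_string(sample, "formatedActivityPrice"))  # US $18.27
print(extract_json_string(sample, "formatedPrice"))          # US $60.90
```

With a live response you would pass page.text instead of sample. This only finds the first occurrence of a key and only handles string values, so parsing the whole JSON object with json.loads would be more robust if you can isolate it.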
