
I am trying to extract data for all the products (shirts, t-shirts, and so on) from the website https://shop.nordstrom.com/. The page is loaded dynamically. I know I can use Selenium with a headless browser, but that is time-consuming, and locating elements that have strange ID and class names is not promising either.

So I thought of looking in the Network tool to find the path to the API from which the data is loaded (an XHR request), but I could not find anything helpful. Is there a way to get the data from the website?

kunal

1 Answer


If you don't want to use Selenium, the alternative is to use an HTML parser like bs4, or simply the requests module.

You are on the right path in looking for the call to the API. XHR requests can be seen under the Network tab, but the multitude of resources listed there makes it hard to pick out the requests being made. A simple way around this is the following method:

Instead of the Network tab, go to the Console tab. There, click the settings icon and tick only the option Log XMLHttpRequests.

Now refresh the page and scroll down to trigger the dynamic calls. You will now see the logs of all XHR requests much more clearly.

For example

(index):29 Fetch finished loading: GET "https://shop.nordstrom.com/api/recs?page_type=home&placement=HP_SALE%2CHP_TOP_RECS%2CHP_CUST_HIS%2CHP_AFF_BRAND%2CHP_FTR&channel=web&bound=24%2C24%2C24%2C24%2C6&apikey=9df15975b8cb98f775942f3b0d614157&session_id=0&shopper_id=df0fdb2bb2cf4965a344452cb42ce560&country_code=US&experiment_id=945b2363-c75d-4950-b255-194803a3ee2a&category_id=2375500&style_id=0%2C0%2C0%2C0&ts=1593768329863&url=https%3A%2F%2Fshop.nordstrom.com%2F&zip_code=null".

Making a GET request to that URL returns a set of JSON objects. You can now send requests directly to this URL, and to others that you derive from it.

See the answer here on how you can use the URL with the requests module to fetch data.
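
As a rough illustration of requesting the logged URL directly, here is a minimal Python sketch. It uses only the standard library so it runs without extra installs; the requests module would work just as well. The helper names build_url and fetch_json, the User-Agent header, and the choice of which query parameters to send are assumptions made for this example, not something the site documents. The endpoint may require the remaining parameters from the logged URL (apikey, shopper_id, and so on), or may have changed since the log was captured.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the console log above.
API_URL = "https://shop.nordstrom.com/api/recs"

# A subset of the query parameters seen in the logged URL (assumption:
# copy the remaining ones from your own console log if the call fails).
PARAMS = {
    "page_type": "home",
    "placement": "HP_SALE,HP_TOP_RECS,HP_CUST_HIS,HP_AFF_BRAND,HP_FTR",
    "channel": "web",
    "bound": "24,24,24,24,6",
    "country_code": "US",
}


def build_url(base, params):
    """Append URL-encoded query parameters to the base endpoint."""
    return f"{base}?{urlencode(params)}"


def fetch_json(url, timeout=10):
    """Send a GET request and decode the JSON response body."""
    # A browser-like User-Agent is an assumption; some sites reject
    # the default one sent by urllib.
    request = Request(
        url,
        headers={"User-Agent": "Mozilla/5.0", "Accept": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_json(build_url(API_URL, PARAMS)) should then return the parsed JSON, provided the server accepts the request from your location and with these parameters.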

AzyCrw4282
  • 1
    Extremely sorry for the late reply. Yes, your solution really helped me a lot (in particular the API path you provided). But I still have a problem: I could not find the API path that you provided in the Console tab, and I don't know why. Also, I was actually looking for a specific product (e.g. t-shirt), and I want that data returned in JSON format from its API path, but I cannot find the path. If you could find the API path for the search query "t-shirt", it would really be helpful. – kunal Jul 04 '20 at 15:26
  • 1
    EDIT: I tried using a VPN and was able to get the results I wanted. Thank you, @AzyCrw4282. You really helped me solve the problem. – kunal Jul 04 '20 at 18:17