
I'm currently working on a scraper to analyze data from a website and make charts of it, using Python 2.7, BeautifulSoup, Requests, json, etc.

I want to run a search with specific keywords and then scrape the prices of the matching items to compute an average value.
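(For reference, once the prices are collected the averaging itself is a one-liner; a minimal sketch, where the prices list is made up:)

prices = [450.0, 520.0, 610.0]  # hypothetical scraped prices

# float() guards against Python 2 integer division
average = sum(prices) / float(len(prices))
print average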

So I tried BeautifulSoup to scrape the JSON response as I usually do, but the response it gives me is:

{"data":{"uuid":"YNp-EuXHrw","index_name":"Listing","default_name":null,"query":"supreme box logo","filters":{"strata":["basic","grailed","hype"]}}}

My request goes to https://www.grailed.com/api/searches, a URL I found on the index page while making a search.

I figured out that "uuid":"YNp-EuXHrw" (a different value on every request) defines the URL that will show the items' data: https://www.grailed.com/feed/YNp-EuXHrw

So I'm making a request to scrape the uuid from the api with

response = s.post(url, headers=headers, json=payload)

res_json = json.loads(response.text)
print response
id = res_json['data']['uuid']
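(As a side note, a slightly more defensive version of that snippet, checking the status code and using requests' built-in JSON decoding, same url, headers and payload as in the full code below:)

response = s.post(url, headers=headers, json=payload)

# fail early if the search call itself did not succeed
if response.status_code != 200:
    raise RuntimeError("search request failed: %d" % response.status_code)

data = response.json()  # shorthand for json.loads(response.text)
uuid = data['data']['uuid']  # e.g. "YNp-EuXHrw"
print uuid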

But the problem is, when I'm making a request to

https://www.grailed.com/feed/YNp-EuXHrw

or whatever the uuid is, I'm getting <Response [500]>.

My whole code is:

import BeautifulSoup, requests, re, string, time, datetime, sys, json  # only requests and json are used below

s = requests.session()

url = "https://www.grailed.com/api/searches"

payload = {
    "index_name": "Listing_production",
    "query": "supreme box logo sweatshirts",
    "filters": {
        "strata": ["grailed", "hype", "basic"],
        "category_paths": [],
        "sizes": [],
        "locations": [],
        "designers": [],
        "min_price": "null",  # note: the string "null", not None (JSON null)
        "max_price": "null"
    }
}


headers = {
    "Host": "www.grailed.com",
    "Connection":"keep-alive",
    "Content-Length": "217",
    "Origin": "null",
    "x-api-version": "application/grailed.api.v1",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
    "content-type": "application/json",
    "accept": "application/json",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4",
    }

response = s.post(url, headers=headers, json=payload)

res_json = json.loads(response.text)
print response
id = res_json['data']['uuid']

urlID = "https://www.grailed.com/feed/" + str(id)
print urlID

response = s.get(urlID, headers=headers, json=res_json)  # note: json= on a GET sends a request body, which many servers reject
print response
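(To see what the server is actually complaining about, it helps to print the body of the 500 response instead of just the status line; a minimal sketch, dropping the unusual JSON body from the GET:)

response = s.get(urlID, headers=headers)  # no json= on a GET

print response.status_code                  # e.g. 500
print response.headers.get('content-type')
print response.text[:500]                   # start of the error page / message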

As you can see, when you do the search through Chrome or whatever, the URL quickly changes from

grailed.com

to

grailed.com/feed/uuid

So I've tried making a GET request to this URL, but I just get Response 500.

What can I do to scrape the data shown at the uuid URL, since it doesn't even appear in the Network requests?

I hope I was pretty clear, sorry for my English.

Azerpas
  • It looks like the website is written with ReactJS. This question and its answers can help: https://stackoverflow.com/questions/29972996/how-to-parse-dom-react – oshaiken May 31 '17 at 15:48
  • @oshaiken thank you, I'll read up on CasperJS – Azerpas May 31 '17 at 16:30
  • I have used PhantomJS and Selenium before – oshaiken May 31 '17 at 17:30
  • @oshaiken so which is best, CasperJS or PhantomJS? I'd prefer to avoid Selenium for now; if it's my last option I'll use it... Do you still have some code? Tutorials on Google are pretty messy since it's a lot of JavaScript... – Azerpas May 31 '17 at 20:03
  • No JavaScript is needed. – oshaiken May 31 '17 at 22:16

1 Answer


Install PhantomJS (http://phantomjs.org/). Not a full solution, but I hope this helps.

pip install selenium
npm install phantomjs

test.py

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.PhantomJS(executable_path='/usr/local/bin/phantomjs')  # path to the phantomjs binary
driver.set_window_size(1120, 550)

driver.get("https://www.grailed.com/")

try:
    # wait until the page has rendered
    element = WebDriverWait(driver, 1).until(
        EC.presence_of_all_elements_located((By.XPATH, '//*[@id="homepage"]/div/div[3]/div[1]/div/form/label/input'))
    )
    element = driver.find_element_by_xpath('//*[@id="homepage"]/div/div[3]/div[1]/div/form/label/input')

    if element.is_displayed():
        element.send_keys('search this')
    else:
        print ('no element')

except Exception as e:
    print (e)

print (driver.current_url)
driver.quit()
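If the search box can be filled this way, one plausible continuation (placed before driver.quit()) is to submit the query, wait for the client-side router to reach the /feed/ URL, and hand the rendered HTML to BeautifulSoup. The "/feed/" check and the .listing-price selector are assumptions about Grailed's markup, not tested:

from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

element.send_keys(Keys.RETURN)  # submit the search form

# wait until the React router has navigated to the feed page
WebDriverWait(driver, 10).until(lambda d: '/feed/' in d.current_url)

# the listings are now in the rendered DOM, so page_source can be parsed
soup = BeautifulSoup(driver.page_source, 'html.parser')
for tag in soup.select('.listing-price'):  # hypothetical selector
    print tag.get_text()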
oshaiken