
I'm currently working on a scraper to analyze data from a website and make charts of it, using Python 2.7, BeautifulSoup, Requests, json, etc.

I want to run a search with specific keywords and then scrape the prices of the matching items to compute an average value.
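(For reference, once the prices are collected the averaging itself is a one-liner; a minimal sketch, where the prices list is made up:)

prices = [450.0, 520.0, 610.0]  # hypothetical scraped prices

# float() guards against Python 2 integer division
average = sum(prices) / float(len(prices))
print average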

So I tried BeautifulSoup to scrape the JSON response as I usually do, but the response it gives me is:

{"data":{"uuid":"YNp-EuXHrw","index_name":"Listing","default_name":null,"query":"supreme box logo","filters":{"strata":["basic","grailed","hype"]}}}

My request goes to https://www.grailed.com/api/searches, a URL I found on the index page while making a search.

I figured out that "uuid":"YNp-EuXHrw" (a different value on every request) defines the URL that will show the items' data: https://www.grailed.com/feed/YNp-EuXHrw

So I'm making a request to scrape the uuid from the api with

response = s.post(url, headers=headers, json=payload)

res_json = json.loads(response.text)
print response
id = res_json['data']['uuid']
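(As a side note, a slightly more defensive version of that snippet, checking the status code and using requests' built-in JSON decoding, same url, headers and payload as in the full code below:)

response = s.post(url, headers=headers, json=payload)

# fail early if the search call itself did not succeed
if response.status_code != 200:
    raise RuntimeError("search request failed: %d" % response.status_code)

data = response.json()  # shorthand for json.loads(response.text)
uuid = data['data']['uuid']  # e.g. "YNp-EuXHrw"
print uuid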

But the problem is, when I'm making a request to

https://www.grailed.com/feed/YNp-EuXHrw

or whatever the uuid is, I'm getting <Response [500]>.

My whole code is:

import BeautifulSoup, requests, re, string, time, datetime, sys, json  # only requests and json are used below

s = requests.session()

url = "https://www.grailed.com/api/searches"

payload = {
    "index_name": "Listing_production",
    "query": "supreme box logo sweatshirts",
    "filters": {
        "strata": ["grailed", "hype", "basic"],
        "category_paths": [],
        "sizes": [],
        "locations": [],
        "designers": [],
        "min_price": "null",  # note: the string "null", not None (JSON null)
        "max_price": "null"
    }
}


headers = {
    "Host": "www.grailed.com",
    "Connection":"keep-alive",
    "Content-Length": "217",
    "Origin": "null",
    "x-api-version": "application/grailed.api.v1",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
    "content-type": "application/json",
    "accept": "application/json",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4",
    }

response = s.post(url, headers=headers, json=payload)

res_json = json.loads(response.text)
print response
id = res_json['data']['uuid']

urlID = "https://www.grailed.com/feed/" + str(id)
print urlID

response = s.get(urlID, headers=headers, json=res_json)  # note: json= on a GET sends a request body, which many servers reject
print response
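(To see what the server is actually complaining about, it helps to print the body of the 500 response instead of just the status line; a minimal sketch, dropping the unusual JSON body from the GET:)

response = s.get(urlID, headers=headers)  # no json= on a GET

print response.status_code                  # e.g. 500
print response.headers.get('content-type')
print response.text[:500]                   # start of the error page / message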

As you can see, when you do the search through Chrome or whatever, the URL quickly changes from

grailed.com

to

grailed.com/feed/uuid

So I've tried making a GET request to this URL, but I just get Response 500.

What can I do to scrape the data shown at the uuid URL, since it doesn't even appear in the Network requests?

I hope I was pretty clear, sorry for my English.

Azerpas
  • It looks like the website is written with ReactJS. This question and its answers can help: https://stackoverflow.com/questions/29972996/how-to-parse-dom-react – oshaiken May 31 '17 at 15:48
  • @oshaiken thank you, I'll read up on CasperJS – Azerpas May 31 '17 at 16:30
  • I have used PhantomJS and Selenium before – oshaiken May 31 '17 at 17:30
  • @oshaiken so which is best, CasperJS or PhantomJS? I'd prefer to avoid Selenium for now; if it's my last option I'll use it... Do you still have some code? Tutorials on Google are pretty messy since it's a lot of JavaScript... – Azerpas May 31 '17 at 20:03
  • No JavaScript is needed. – oshaiken May 31 '17 at 22:16

1 Answer


Install PhantomJS (http://phantomjs.org/). Not a full solution, but I hope this helps.

pip install selenium
npm install phantomjs

test.py

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.PhantomJS(executable_path='/usr/local/bin/phantomjs')  # path to the phantomjs binary
driver.set_window_size(1120, 550)

driver.get("https://www.grailed.com/")

try:
    # wait until the page has rendered
    element = WebDriverWait(driver, 1).until(
        EC.presence_of_all_elements_located((By.XPATH, '//*[@id="homepage"]/div/div[3]/div[1]/div/form/label/input'))
    )
    element = driver.find_element_by_xpath('//*[@id="homepage"]/div/div[3]/div[1]/div/form/label/input')

    if element.is_displayed():
        element.send_keys('search this')
    else:
        print ('no element')

except Exception as e:
    print (e)

print (driver.current_url)
driver.quit()
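If the search box can be filled this way, one plausible continuation (placed before driver.quit()) is to submit the query, wait for the client-side router to reach the /feed/ URL, and hand the rendered HTML to BeautifulSoup. The "/feed/" check and the .listing-price selector are assumptions about Grailed's markup, not tested:

from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

element.send_keys(Keys.RETURN)  # submit the search form

# wait until the React router has navigated to the feed page
WebDriverWait(driver, 10).until(lambda d: '/feed/' in d.current_url)

# the listings are now in the rendered DOM, so page_source can be parsed
soup = BeautifulSoup(driver.page_source, 'html.parser')
for tag in soup.select('.listing-price'):  # hypothetical selector
    print tag.get_text()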
oshaiken