I'm currently working on a scraper that analyzes data from a website and makes charts from it, using Python 2.7, BeautifulSoup, Requests, json, etc.
I want to run a search with specific keywords and then scrape the prices of the matching items to compute an average value.
So I tried to scrape the JSON response with BeautifulSoup, as I usually do, but the response I get is:
{"data":{"uuid":"YNp-EuXHrw","index_name":"Listing","default_name":null,"query":"supreme box logo","filters":{"strata":["basic","grailed","hype"]}}}
My request goes to https://www.grailed.com/api/searches, a URL I found on the index page when making a search.
I figured out that "uuid":"YNp-EuXHrw" (a different value each time) defines the URL that shows the item data, as in: https://www.grailed.com/feed/YNp-EuXHrw
So I'm making a request to get the uuid from the API with:
response = s.post(url, headers=headers, json=payload)
res_json = json.loads(response.text)
print response
id = res_json['data']['uuid']
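To rule out the parsing step as the cause, here is a minimal offline sketch of the uuid extraction and URL construction, run against the sample response shown earlier. The helper name `build_feed_url` and the constant `FEED_BASE` are hypothetical names introduced only for this illustration, and no network request is made:

```python
import json

# Sample body copied from the response shown earlier in this question.
sample_body = (
    '{"data":{"uuid":"YNp-EuXHrw","index_name":"Listing",'
    '"default_name":null,"query":"supreme box logo",'
    '"filters":{"strata":["basic","grailed","hype"]}}}'
)

# Assumed base of the feed URL, as observed in the browser address bar.
FEED_BASE = "https://www.grailed.com/feed/"


def build_feed_url(body):
    """Hypothetical helper: parse a search response body and return the feed URL."""
    data = json.loads(body)
    return FEED_BASE + data["data"]["uuid"]


feed_url = build_feed_url(sample_body)
print(feed_url)
```

This part behaves as expected, so the uuid itself is being read correctly; the problem only shows up on the request that follows.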
But the problem is that when I make a request to https://www.grailed.com/feed/YNp-EuXHrw (or whatever the uuid is), I get <Response [500]>.
My whole code is:
import BeautifulSoup,requests,re,string,time,datetime,sys,json
s = requests.session()
url = "https://www.grailed.com/api/searches"
payload = {
    "index_name": "Listing_production",
    "query": "supreme box logo sweatshirts",
    "filters": {
        "strata": ["grailed", "hype", "basic"],
        "category_paths": [],
        "sizes": [],
        "locations": [],
        "designers": [],
        "min_price": "null",
        "max_price": "null",
    },
}
headers = {
"Host": "www.grailed.com",
"Connection":"keep-alive",
"Content-Length": "217",
"Origin": "null",
"x-api-version": "application/grailed.api.v1",
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
"content-type": "application/json",
"accept": "application/json",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4",
}
response = s.post(url, headers=headers, json=payload)
res_json = json.loads(response.text)
print response
id = res_json['data']['uuid']
urlID = "https://www.grailed.com/feed/" + str(id)
print urlID
response = s.get(urlID, headers=headers, json=res_json)
print response
As you can see when doing the request through Chrome or any other browser, the URL quickly changes from grailed.com to grailed.com/feed/uuid
So I tried making a GET request to this URL, but I just get Response 500.
What can I do to scrape the data shown on the uuid URL, given that it doesn't even appear in the browser's Network requests?
I hope I was clear enough; sorry for my English.