
How to get access to this API:

import requests

url = 'https://b2c-api-premiumlabel-production.azurewebsites.net/api/b2c/page/menu?id_loja=2691'
print(requests.get(url))

I'm trying to retrieve data from this site via its API. I found the URL above and I can see its data in the browser, but I can't seem to get it right because I'm running into a 403 status code. This is the website URL: https://www.nagumo.com.br/osasco-lj46-osasco-ayrosa-rua-avestruz/departamentos

I'm trying to retrieve the item categories; they are visible to me in the browser, but I'm unable to fetch them. Later I'll use these categories to iterate over the products API.

[Image: API Category]

Note: please be gentle, it's my first post here =]

  • What are you trying to get from that website? Is there a product(s)? – QHarr Jun 15 '22 at 04:36
  • @QHarr I just edited my post, please check the image "API Category", which shows the product category data that I can use to iterate over products later. The call to access this endpoint for categories will be the same as the call to access the products list, because their request headers are the same, or at least very similar. – data_creator Jun 15 '22 at 20:57

3 Answers

1

To get the data shown in your image, the following headers and endpoint are needed:

import requests

headers = {
    'sm-token': '{"IdLoja":2691,"IdRede":884}',
    'User-Agent': 'Mozilla/5.0',
    'Referer': 'https://www.nagumo.com.br/osasco-lj46-osasco-ayrosa-rua-avestruz/departamentos',
}

params = {
    'id_loja': '2691',
}

r = requests.get('https://www.nagumo.com.br/api/b2c/page/menu', params=params, headers=headers)
print(r.json())
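
Since the goal is to iterate over the categories afterwards, a minimal sketch of that next step, continuing from the code above (the JSON key names below are assumptions, not from the answer; inspect the actual response first):

menu = r.json()

# 'Menu', 'Nome' and 'Id' are hypothetical keys; replace them with whatever
# the printed JSON actually contains before building the products loop.
for category in menu.get('Menu', []):
    print(category.get('Nome'), category.get('Id'))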
QHarr
  • Thank you very much!! I tried out some headers myself, but I didn't succeed. How did you manage to find the right request headers, the ones that get through? I'm just asking to know whether you tried them out one by one manually, or whether you have some kind of trick to find the right ones. – data_creator Jun 15 '22 at 22:42
  • I removed the ones that, from experience, I knew were unlikely to be needed. Then I commented out the others one by one. When I had the needed set, I then tested removing parameters within headers as well. You could also use a tool like Postman, Wireshark or Insomnia. – QHarr Jun 16 '22 at 02:13
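
A rough sketch of that elimination approach, assuming you start from a header set that already works (the loop is illustrative, not part of the answer above):

import requests

url = 'https://www.nagumo.com.br/api/b2c/page/menu'
params = {'id_loja': '2691'}

# The full header set known to work; trim it down from here.
candidate_headers = {
    'sm-token': '{"IdLoja":2691,"IdRede":884}',
    'User-Agent': 'Mozilla/5.0',
    'Referer': 'https://www.nagumo.com.br/osasco-lj46-osasco-ayrosa-rua-avestruz/departamentos',
}

# Drop one header at a time; if the request still succeeds without it,
# that header was not actually required.
for name in list(candidate_headers):
    trimmed = {k: v for k, v in candidate_headers.items() if k != name}
    status = requests.get(url, params=params, headers=trimmed).status_code
    print(f'without {name}: {status}')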
0

Not sure exactly what your issue is here. But if you want to see the content of the response, and not just the 200/400 status codes, you need to add '.content' to your print.

Eg.

import requests

# Create a session
s = requests.Session()

# Example connection variables, probably not required for your use case.
setCookieUrl = 'https://www...'
HeadersJson = {'Accept-Language': 'en-us'}
bodyJson = {"__type": "xxx", "applicationName": "xxx", "userID": "User01", "password": "password2021"}

# GET request
p = s.get(setCookieUrl, json=bodyJson, headers=HeadersJson)
print(p)            # Print the response status (200 etc.)
#print(p.headers)   # Print the response headers
#print(p.content)   # Print the content of the response
#print(s.cookies)   # Print cookies stored on the session
  • This is the problem, the code is 403 – data_creator Jun 15 '22 at 01:36
  • I see. If you check the response headers/content etc., can you see why it is responding that way? Usually it will say something like 'malformed header'. Then you will need to find out why. – max_settings Jun 15 '22 at 01:38
  • That is why I put in the 'setCookieUrl' etc. Each site you interact with may handle requests differently, and expect certain headers/cookies etc. to be set when making a request. Or it may respond with a cookie which you will need for future requests. It can be a pain, but if you persist you will get there. – max_settings Jun 15 '22 at 01:40
  • The HTTP 403 is an HTTP status code meaning access to the requested resource is forbidden. The server understood the request, but will not fulfill it. This means your request is malformed in some way. – max_settings Jun 15 '22 at 01:43
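
Putting that advice into code, a quick diagnostic along these lines may show why the server refuses the request (a sketch using the URL from the question):

import requests

url = 'https://b2c-api-premiumlabel-production.azurewebsites.net/api/b2c/page/menu?id_loja=2691'
r = requests.get(url)

print(r.status_code)  # e.g. 403
print(r.headers)      # response headers may hint at what the server expects
print(r.text[:500])   # the start of the body often contains an error message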
0

I'm also new here haha, but besides the requests library, you'll also need another one like Beautiful Soup for what you're trying to do.

bs4 installation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-beautiful-soup

Once you install it and import it, it's just continuing what you were doing to actively get your data.

response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

This gets the entire HTML content of the page, so you can extract your data from the page based on CSS selectors like this:

site_data = soup.select('selector')

site_data is a list of elements matching that selector, so a simple for loop and a list to collect your items would suffice (as an example, getting the links for each book on a bookstore site).

For example, if I were trying to get links from a site:

import requests
from bs4 import BeautifulSoup

sites = []
url = 'https://b2c-api-premiumlabel-production.azurewebsites.net/api/b2c/page/menu?id_loja=2691'
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

links = soup.select("a")  # list of all items with this selector

for link in links:
    sites.append(link)

Also, a helpful tip: when you inspect the page (right-click and at the bottom press 'Inspect'), you can see the code for the page. Go to the HTML, find the data you want, right-click it and select Copy -> Copy selector. This will make it really easy for you to get the data you want on that site.
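
For instance, a copied selector could be plugged in like this (the selector below is hypothetical; use whatever DevTools gives you):

# Hypothetical selector copied via Copy -> Copy selector in DevTools.
category_links = soup.select('#menu > ul > li > a')
for a in category_links:
    print(a.get('href'), a.get_text(strip=True))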

Helpful sites: https://oxylabs.io/blog/python-web-scraping and https://realpython.com/beautiful-soup-web-scraper-python/

kayak
  • Oh thanks, but it's still showing the 403 error: print(soup) outputs "Error 403 - Forbidden" in the terminal. – data_creator Jun 15 '22 at 01:40
  • Ah, I thought you meant something else, that's strange. I tried the requests code and it managed to work for me. Your error means that it's rejecting your GET request. Try visiting this, I think it'll help: https://stackoverflow.com/questions/38489386/python-requests-403-forbidden – kayak Jun 15 '22 at 01:45
  • There is no need for BeautifulSoup, the API response is in JSON. – Hanna Jun 15 '22 at 23:22