4

I'm trying to get some data from a page, but it returns a 403 Forbidden error.

I thought it was the user agent, but I tried several user agents and it still returns the error.

I also tried using the fake-useragent library, but without success.

import requests
from fake_useragent import UserAgent  # pip install fake-useragent

with requests.Session() as c:
    url = '...'
    #headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2224.3 Safari/537.36'}
    ua = UserAgent()
    header = {'User-Agent': str(ua.chrome)}
    page = c.get(url, headers=header)
    print page.content  # Python 2 print statement

When I access the page manually, everything works.

I'm using Python 2.7.14 and the requests library. Any ideas?

ivan_pozdeev
Mathiasfc

2 Answers

4

The site could be using anything in the request to trigger the rejection.

So, copy all headers from the request that your browser makes. Then delete them one by one¹ to find out which are essential.

As per "Python requests. 403 Forbidden", to add custom headers to the request, do:

result = requests.get(url, headers={'header':'value', <etc>})

¹ A faster way would be to delete half of them each time instead, but that's more complicated since there are probably multiple essential headers.
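The elimination loop described above can be sketched as follows. Here `find_essential_headers` is a hypothetical helper name, and `is_accepted` is a stand-in for "does the request still succeed?" (in practice it would call `requests.get(url, headers=headers)` and check that the status code is not 403):

```python
def find_essential_headers(headers, is_accepted):
    """Return the subset of `headers` the server actually requires.

    Starts from the full browser header set and drops headers one at
    a time, keeping a header only if removing it breaks the request.
    """
    essential = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in essential.items() if k != name}
        if is_accepted(trial):   # still works without this header?
            essential = trial    # then it wasn't essential
    return essential
```

With a real `is_accepted` this makes one request per header, so it is slow but simple; the halving approach in the footnote would cut the number of requests at the cost of more bookkeeping.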

ivan_pozdeev
1

These are all the headers I can see included by the browser for a generic GET request:

Host: <URL>
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Upgrade-Insecure-Requests: 1

Try including all of those in your request incrementally (one by one) to identify which one(s) are required for a successful request.
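The incremental approach above can be sketched like this. `find_working_header_set` is a hypothetical helper name, and `is_accepted` is a placeholder for the actual check (in practice, `requests.get(url, headers=trial).status_code != 403`):

```python
def find_working_header_set(candidate_headers, is_accepted):
    """Add headers one at a time, in order, until the request succeeds.

    Returns the first accumulated header set that is accepted,
    or None if even the full set is rejected.
    """
    trial = {}
    for name, value in candidate_headers.items():
        trial[name] = value          # add the next browser header
        if is_accepted(dict(trial)): # re-test after each addition
            return trial
    return None
```

Note this finds the shortest working prefix of the list, so the order you add headers in matters; combining it with the deletion approach from the other answer narrows things down further.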

On the other hand, take a look at the Cookies and/or Security tabs available under the Network option in your browser console / developer tools.

emecas