4

I'm trying to get some data from a page, but it returns a 403 Forbidden error.

I thought it was the user agent, but I tried several user agents and it still returns the error.

I also tried using the fake-useragent library, but without success.

import requests
from fake_useragent import UserAgent  # pip install fake-useragent

with requests.Session() as c:
    url = '...'
    #headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2224.3 Safari/537.36'}
    ua = UserAgent()
    header = {'User-Agent': str(ua.chrome)}
    page = c.get(url, headers=header)
    print page.content  # Python 2 print statement

When I access the page manually, everything works.

I'm using Python 2.7.14 and the requests library. Any ideas?

ivan_pozdeev
Mathiasfc

2 Answers

4

The site could be using anything in the request to trigger the rejection.

So, copy all headers from the request that your browser makes. Then delete them one by one¹ to find out which are essential.

As per "Python requests. 403 Forbidden", to add custom headers to the request, do:

result = requests.get(url, headers={'header':'value', <etc>})

¹ A faster way would be to delete half of them each time instead, but that's more complicated since there are probably multiple essential headers.
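The elimination loop described above can be sketched as follows. Here `find_essential_headers` is a hypothetical helper name, and `is_accepted` is a stand-in for "does the request still succeed?" (in practice it would call `requests.get(url, headers=headers)` and check that the status code is not 403):

```python
def find_essential_headers(headers, is_accepted):
    """Return the subset of `headers` the server actually requires.

    Starts from the full browser header set and drops headers one at
    a time, keeping a header only if removing it breaks the request.
    """
    essential = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in essential.items() if k != name}
        if is_accepted(trial):   # still works without this header?
            essential = trial    # then it wasn't essential
    return essential
```

With a real `is_accepted` this makes one request per header, so it is slow but simple; the halving approach in the footnote would cut the number of requests at the cost of more bookkeeping.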

ivan_pozdeev
1

These are all the headers I can see included by the browser for a generic GET request:

Host: <URL>
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Upgrade-Insecure-Requests: 1

Try including all of those in your request incrementally (one by one) to identify which one(s) are required for a successful request.
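The incremental approach above can be sketched like this. `find_working_header_set` is a hypothetical helper name, and `is_accepted` is a placeholder for the actual check (in practice, `requests.get(url, headers=trial).status_code != 403`):

```python
def find_working_header_set(candidate_headers, is_accepted):
    """Add headers one at a time, in order, until the request succeeds.

    Returns the first accumulated header set that is accepted,
    or None if even the full set is rejected.
    """
    trial = {}
    for name, value in candidate_headers.items():
        trial[name] = value          # add the next browser header
        if is_accepted(dict(trial)): # re-test after each addition
            return trial
    return None
```

Note this finds the shortest working prefix of the list, so the order you add headers in matters; combining it with the deletion approach from the other answer narrows things down further.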

On the other hand, take a look at the Cookies and/or Security tabs available under the Network option in your browser console / developer tools.

emecas