import requests
import webbrowser
from bs4 import BeautifulSoup

url = 'https://www.gamefaqs.com'
#headers={'User-Agent': 'Mozilla/5.0'}    
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}


response = requests.get(url, headers)

response.status_code is returning 403. I can browse the website using Firefox/Chrome, so it seems to be a coding error.

I can't figure out what mistake I'm making.

Thank you.


3 Answers

This works if you make the request through a Session object.

import requests

session = requests.Session()
response = session.get('https://www.gamefaqs.com', headers={'User-Agent': 'Mozilla/5.0'})

print(response.status_code)

Output:

200
  • Thanks. What exactly is going on with the `Session` object that is making the difference? I've never had to make a `Session` object to scrape a site. – Moondra Jul 13 '17 at 16:49
  • @Moondra The main thing about Session objects is their compatibility with cookies. For all you know, it's possible the site is setting cookies and requiring them to be echoed back as a defence against scraping, which is probably against its policy. – cs95 Jul 13 '17 at 16:51 (see the cookie sketch below)
  • Cookies. I see. Thank you. – Moondra Jul 13 '17 at 19:39
  • I've tried this for another website and it doesn't fix the issue; I still get a 403. – SarahJessica Sep 06 '20 at 14:59
  • Same here, I'd like to know if you've found a solution? @SarahJessica – talha06 Apr 14 '21 at 19:37
  • It was a while ago, I can't remember @talha06. Sorry – SarahJessica Apr 14 '21 at 20:50
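
To illustrate the cookie point from the comments, here is a minimal sketch (using httpbin.org as an assumed test endpoint, not part of the original answer) of how a Session echoes back cookies it received earlier, while bare requests.get calls do not:

import requests

session = requests.Session()

# This endpoint sets a cookie and redirects to /cookies, which reports back
# the cookies it received; the session stores the cookie automatically...
session.get('https://httpbin.org/cookies/set/demo/1')
print(session.cookies.get('demo'))  # -> '1'

# ...and sends it back on later requests made through the same session.
print(session.get('https://httpbin.org/cookies').json())   # {'cookies': {'demo': '1'}}

# A bare requests.get() carries no cookie jar over, so nothing is echoed back.
print(requests.get('https://httpbin.org/cookies').json())  # {'cookies': {}}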

Using the `headers` keyword argument works for me. In the question's code, `requests.get(url, headers)` passes the dict as the second positional argument, which is `params`, so it is sent as query-string parameters and no custom User-Agent header ever reaches the server:

import requests
headers={'User-Agent': 'Mozilla/5.0'}
response = requests.get('https://www.gamefaqs.com', headers=headers)
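
A minimal sketch, using requests' Request/PreparedRequest objects (no network call needed), of what the positional call from the question actually builds:

import requests

# Passing the dict positionally makes it requests.get's params argument,
# so it is URL-encoded into the query string instead of sent as headers.
req = requests.Request('GET', 'https://www.gamefaqs.com',
                       params={'User-Agent': 'Mozilla/5.0'}).prepare()
print(req.url)      # https://www.gamefaqs.com/?User-Agent=Mozilla%2F5.0
print(req.headers)  # {} -- no User-Agent header was set at all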

Try using a Session.

import requests
session = requests.Session()
url = 'https://www.gamefaqs.com'  # the URL from the question
response = session.get(url, headers={'user-agent': 'Mozilla/5.0'})
print(response.status_code)

If the request still returns 403 Forbidden (even with a session object and a User-Agent header), you may need to add more headers:

headers = {
    'user-agent':"Mozilla/5.0 ...",
    'accept': '"text/html,application...',
    'referer': 'https://...',
}
r = session.get(url, headers=headers)

In Chrome, the request headers can be found under Network > Headers > Request Headers in the Developer Tools (press F12 to open them).

The reason is that some websites check for a User-Agent, or for the presence of other specific headers, before accepting a request.
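
As a concrete sketch of that advice (the header values below are illustrative browser-like defaults, not values captured from gamefaqs.com, and the referer is just an example):

import requests

session = requests.Session()

# Illustrative headers; in practice, copy the real values for the target
# site from your own browser's Network tab.
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'accept-language': 'en-US,en;q=0.5',
    'referer': 'https://www.google.com/',
}

response = session.get('https://www.gamefaqs.com', headers=headers)
print(response.status_code)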
