Can't access certain sites with requests.get in Python 3

Question

from requests import get
get('http://www.fb.com')
<Response [200]>
get('http://www.subscene.com')
<Response [403]>

I'm trying to build a web scraper that finds and downloads subtitles, but I can't request any subtitle pages: they all return response code 403.

score 7 · Answer 1 · answered Dec 22 '16 at 22:42

HTTP Status Code 403 Forbidden means:

the server understood the request, but is refusing to fulfill it. Source

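The server most likely recognized that your script is not a regular browser (Chrome, Firefox, etc.) and is refusing to "speak" with it. By default, requests identifies itself with a User-Agent of the form python-requests/<version>, which is easy to filter on. Sites commonly do this to block scrapers, which is exactly what you're building...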
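To see what the server sees, you can check which User-Agent requests sends when you don't set one. This is only a small sketch; the exact version string depends on the release you have installed.

```python
import requests

# The User-Agent string requests uses when you don't provide your own.
default_ua = requests.utils.default_user_agent()
print(default_ua)  # something like python-requests/<installed version>

# A fresh Session carries the same value in its default headers.
session = requests.Session()
print(session.headers["User-Agent"])
```

Nothing here touches the network, so it is safe to run as a quick check.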

A workaround is to set a user-agent in your headers, like so:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import requests

url = "http://www.subscene.com"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'}

response = requests.get(url, headers=headers)
print(response)  # <Response [200]>

But I advise you to look for a site that provides some sort of API. Relying on scraping isn't the best approach, since it can break whenever the site changes its pages or its blocking rules.
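If you do stick with scraping, one way to avoid repeating the header on every call is to put it on a Session. The sketch below assumes the same User-Agent string as above; `make_session` and `fetch` are hypothetical helper names, not part of requests.

```python
import requests

# Same browser-like User-Agent as in the snippet above.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"
)


def make_session(user_agent=BROWSER_UA):
    """Hypothetical helper: a Session that sends user_agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch(session, url, timeout=10):
    """Hypothetical helper: GET url, raising requests.HTTPError on 4xx/5xx (e.g. 403)."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response


session = make_session()

# Build the request without sending it, to check which headers would go out.
prepared = session.prepare_request(requests.Request("GET", "http://www.subscene.com"))
print(prepared.headers["User-Agent"])
```

Calling `fetch(session, "http://www.subscene.com")` performs the actual request. A site may still answer 403 if it checks more than the User-Agent, in which case `raise_for_status()` raises instead of silently returning the error page.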

Then it is something on your end. Can you access the site directly using a web browser? — JChris, Dec 23 '16 at 08:24
This worked for me. I find it funny that we use `agents` in order to fool... code! — AtilioA, Nov 03 '19 at 22:13
