from requests import get
get('http://www.fb.com')
<Response [200]>
get('http://www.subscene.com')
<Response [403]>

I'm trying to build a web scraper to download subtitles, but I can't request any subtitle pages: every request returns a 403 response code.

Alexandre Neukirchen
citizenfour

1 Answer

HTTP Status Code 403 Forbidden means:

"The server understood the request, but is refusing to fulfill it."

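The server identified your script as something other than a regular browser (Chrome, Firefox, etc.) and is refusing to "speak" with it. By default, requests announces itself with a User-Agent of the form python-requests/<version>, which is easy to filter on. Sites commonly do this to block scrapers, which is exactly what you're building.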
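
To see what the server sees, you can inspect the headers requests would send without contacting the site at all. This is only a minimal sketch: default_headers_for is a hypothetical helper name used for illustration, not part of the requests library.

```python
import requests


def default_headers_for(url):
    """Return the headers requests would send for a plain GET to `url`.

    Hypothetical helper for illustration: the request is only prepared
    locally, so nothing is sent over the network.
    """
    session = requests.Session()
    prepared = session.prepare_request(requests.Request("GET", url))
    return dict(prepared.headers)


headers = default_headers_for("http://www.subscene.com")
# Prints the library's own identifier, of the form python-requests/<version>
print(headers["User-Agent"])
```

That identifier is presumably what the site is filtering on, although a server may reject a request for other reasons as well.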

A workaround is to set a user-agent in your headers, like so:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import requests

url = "http://www.subscene.com"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'}

response = requests.get(url, headers=headers)
print(response)  # <Response [200]>
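
If you are going to request many pages, one possible variation is to set the header once on a Session and to fail loudly on error statuses. This is a sketch under assumptions: make_session, fetch, and BROWSER_UA are hypothetical names, and the timeout value is arbitrary.

```python
import requests

# Assumed value: any current browser's User-Agent string should work here.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"
)


def make_session(user_agent=BROWSER_UA):
    """Create a session that sends `user_agent` with every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch(session, url, timeout=10):
    """GET `url` and return the body text; raise on 4xx/5xx responses."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError for e.g. 403
    return response.text


# Example usage (performs a real network request):
# session = make_session()
# html = fetch(session, "http://www.subscene.com")
```

Some sites check more than the User-Agent, so a 403 may persist even with this header set.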

But I advise you to look for a site that provides some sort of API; relying on scraping isn't the best approach.

JChris