403 Forbidden on site with urllib3

Question

So I am working on a project that crawls different sites. Every site works except caesarscasino.com: no matter what I try, I get a 403 Forbidden error. I have searched here and elsewhere, to no avail.

Here is my code:

```python
import urllib3
import urllib.request, urllib.error
from urllib.request import Request
import ssl

try:
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen

# Replace the default HTTPS context factory so certificates are not verified.
ssl._create_default_https_context = ssl._create_unverified_context
urllib3.disable_warnings()

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
           'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
           'Accept-Encoding': 'none',
           'Accept-Language': 'en-US,en;q=0.8',
           'Connection': 'keep-alive'}
url = 'https://www.caesarscasino.com/'
req = Request(url, headers=headers)  # builds the request; urlopen() sends it
result = urllib.request.urlopen(req).read()

print(result)
```

This fails with the following error:

```
Traceback (most recent call last):
  File "C:\Users\sp\Desktop\untitled0.py", line 30, in <module>
    result = urllib.request.urlopen(req).read()
  File "C:\Users\sp\anaconda3\envs\spyder\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\sp\anaconda3\envs\spyder\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Users\sp\anaconda3\envs\spyder\lib\urllib\request.py", line 640, in http_response
    response = self.parent.error(
  File "C:\Users\sp\anaconda3\envs\spyder\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\Users\sp\anaconda3\envs\spyder\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "C:\Users\sp\anaconda3\envs\spyder\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
HTTPError: Forbidden
```
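One way to see what the server is actually saying is to catch the exception instead of letting it end the script: `urllib.error.HTTPError` doubles as a response object, so its status code, headers and body can still be read. This is a sketch only; `describe_http_error` is a hypothetical helper name and the 500-byte body preview is an arbitrary choice. The `Server` header and the start of the body sometimes hint at which bot-protection layer rejected the request, but nothing guarantees that.

```python
import urllib.error
import urllib.request


def describe_http_error(url, headers=None, timeout=10):
    """Fetch `url`. On an HTTP error status, return a dict describing the
    response instead of raising; return None if the request succeeds."""
    req = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            resp.read()
            return None
    except urllib.error.HTTPError as err:
        # The exception wraps the server's response, so it can still be read.
        body = err.read(500)  # arbitrary preview size (assumption)
        return {
            "code": err.code,                   # e.g. 403
            "reason": err.reason,               # e.g. "Forbidden"
            "server": err.headers.get("Server"),
            "body_start": body.decode("utf-8", errors="replace"),
        }
```

Calling `describe_http_error(url, headers=headers)` with the question's `url` and `headers` would show what the 403 response contains.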

In a browser, I can reach it only over `http`, not `https`. – Andrew Allaire, Jun 19 '20 at 17:08

score 0 · Accepted Answer · answered Jun 19 '20 at 17:14

The thing with scraping the web is that many site owners do not like being scraped, so they block requests they identify as coming from a machine (which your scraper is). That is the error you are getting: 403 Forbidden means the server understood the request but refuses to serve it, in this case most likely because it has decided you are a program rather than a browser. However, there are ways around that, such as changing the IP address your requests appear to come from (for example by going through proxies) and rotating headers while your program visits the site. I have already answered a question on how to do so here. Check it out and let me know in the comments whether or not that works for you.
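A minimal sketch of the header-rotation and proxy idea using only the standard library; it is not the code from the linked answer. The `USER_AGENTS` list (the question's Chrome string plus one Firefox string), the helper names `build_request` and `make_opener`, and any proxy URL you pass are placeholders. A proxy changes the IP address the site sees rather than spoofing it, and neither technique is guaranteed to get past this particular site's protection.

```python
import random
import urllib.request

# Placeholder pool of browser-like User-Agent strings (assumption: swap in your own).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0",
]


def build_request(url, rng=random):
    """Return a Request whose User-Agent is picked at random from USER_AGENTS."""
    headers = {
        "User-Agent": rng.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.8",
    }
    return urllib.request.Request(url, headers=headers)


def make_opener(proxy=None):
    """Return an opener that sends HTTP and HTTPS traffic through `proxy`, if given.

    `proxy` is a hypothetical proxy URL such as "http://host:port". Without it,
    the opener uses urllib's defaults (including any proxy environment variables).
    """
    handlers = []
    if proxy:
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    return urllib.request.build_opener(*handlers)
```

Usage might look like `make_opener("http://proxy.example:8080").open(build_request(url), timeout=10)`, where the proxy address stands in for one you actually control or rent; calling `build_request` again for each visit is what rotates the header.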

Yannik Suhre

I'm trying to understand your code in that solution. Why do I need this loop: `for n in range(1, 20): req = Request('http://icanhazip.com')`? Is that request where I'd put mine? – Pittsie Jun 22 '20 at 18:45
Also, my main goal is just to get the site's status code. – Pittsie Jun 22 '20 at 19:08
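If only the status code is needed, the 403 does not have to end the script: for 4xx and 5xx responses `urlopen` raises `urllib.error.HTTPError`, and that exception carries the status in its `code` attribute. A minimal sketch, with `get_status` as a hypothetical helper name:

```python
import urllib.error
import urllib.request


def get_status(url, headers=None, timeout=10):
    """Return the HTTP status code for `url`, including error codes such as 403."""
    req = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for error statuses, but the code is still available.
        return err.code
```

Redirects are followed automatically, so the code returned is that of the final response, and network-level failures (DNS, refused connections, TLS errors) still raise `urllib.error.URLError`. Calling it on both the `http://` and `https://` form of the URL is one way to check the observation in the first comment.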

score 0 · Answer 2 · answered Jun 19 '20 at 17:20

I believe your issue is related to the site using HTTPS. See here for information on how to fix that.
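A genuine HTTPS problem would normally surface as a `urllib.error.URLError` wrapping an SSL error, not as an `HTTPError` with status 403: a 403 means the TLS handshake completed and the server answered (and the question's code already disables certificate verification). Still, rather than patching `ssl._create_default_https_context` globally, a context can be passed per request. A sketch, with `make_context` and `fetch` as hypothetical helper names; turning verification off is best kept as a temporary diagnostic.

```python
import ssl
import urllib.request


def make_context(verify=True):
    """Return an SSL context; verification is on unless explicitly disabled."""
    ctx = ssl.create_default_context()
    if not verify:
        ctx.check_hostname = False  # must be turned off before verify_mode
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch(url, headers=None, verify=True, timeout=10):
    """Return (status, body) for `url` using a per-request SSL context.

    Error statuses such as 403 still raise urllib.error.HTTPError.
    """
    req = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(req, context=make_context(verify), timeout=timeout) as resp:
        return resp.status, resp.read()
```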

Cz_

I appreciate the help, but as the comments above show, that was not the issue. – Pittsie Jun 22 '20 at 19:03
