503 Error When Trying To Crawl One Single Website Page | Python | Requests

Question

Goal: I am trying to scrape the HTML from this page: https://www.doherty.jobs/jobs/search?q=&l=&lat=&long=&d=.

(note - I will eventually want to paginate and scrape all job listings from this page)

My issue: I get a 503 error when I try to scrape the page using Python and Requests. I am working in Google Colab.

Initial Code:

import requests

url = 'https://www.doherty.jobs/jobs/search?q=&l=&lat=&long=&d='

response = requests.get(url)

print(response)
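A 503 status alone does not say who refused the request. One heuristic way to check whether it came from a Cloudflare front end, as the accepted answer below suggests, is to look at the response headers: Cloudflare normally adds a `Server: cloudflare` header and a `CF-RAY` header. The helper name `looks_like_cloudflare_block` and the set of status codes are assumptions for illustration, and the header checks are a rule of thumb rather than a guaranteed signature.

```python
from typing import Mapping

# Assumption: status codes Cloudflare commonly uses for challenge or block pages.
BLOCK_STATUSES = {403, 503}


def looks_like_cloudflare_block(status_code: int, headers: Mapping[str, str]) -> bool:
    """Heuristically decide whether a response is a Cloudflare challenge/block page.

    Hypothetical helper: checks the status code plus the "Server" and "CF-RAY"
    headers that Cloudflare usually adds. Not a guaranteed detection method.
    """
    # HTTP header names are case-insensitive, so compare them in lower case.
    lowered = {name.lower(): value for name, value in headers.items()}
    served_by_cloudflare = (
        "cloudflare" in lowered.get("server", "").lower() or "cf-ray" in lowered
    )
    return status_code in BLOCK_STATUSES and served_by_cloudflare


if __name__ == "__main__":
    print(looks_like_cloudflare_block(503, {"Server": "cloudflare", "CF-RAY": "abc123"}))  # True
    print(looks_like_cloudflare_block(503, {"Server": "nginx"}))  # False
```

With Requests you would call `looks_like_cloudflare_block(response.status_code, response.headers)` on the response object from the snippet above; `response.headers` is a case-insensitive mapping, so it can be passed directly.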

Attempted solutions:

Sending a browser 'User-Agent' header: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'
Using this code, which I found in another thread:

import requests

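def getUrl(url):
    # Pretend to be a desktop browser by sending a browser User-Agent string
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36',
    }
    res = requests.get(url, headers=headers)
    # Raises requests.exceptions.HTTPError for 4xx/5xx responses, such as this 503
    res.raise_for_status()
    return res

getUrl('https://www.doherty.jobs/jobs/search?q=&l=&lat=&long=&d=')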

I am able to access the website via my browser.

Is there anything else I can try?

Thank you

score 7 · Accepted Answer · answered Aug 19 '21 at 17:50


That page is protected by Cloudflare. There are a few options for trying to get past it; using cloudscraper seems to work:

import cloudscraper

scraper = cloudscraper.create_scraper()
url = 'https://www.doherty.jobs/jobs/search?q=&l=&lat=&long=&d='

response = scraper.get(url).text

print(response)

In order to use it, you'll need to install it:

pip install cloudscraper
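Once `scraper.get(url).text` returns the page HTML, the next step toward collecting the job listings is pulling out the links to individual postings. Below is a minimal sketch using only the standard library's `html.parser`; the class and function names are hypothetical, and the `marker` substring that identifies a job link is an assumption you would have to confirm by inspecting the real page source.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class JobLinkParser(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags whose href contains `marker`."""

    def __init__(self, base_url: str, marker: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.marker = marker  # hypothetical substring identifying a job link
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.marker in href:
            # Resolve relative hrefs such as "/jobs/123" against the page URL.
            absolute = urljoin(self.base_url, href)
            # Keep first-seen order and drop duplicate links.
            if absolute not in self.links:
                self.links.append(absolute)


def extract_job_links(html: str, base_url: str, marker: str) -> List[str]:
    """Return the de-duplicated links in `html` whose href contains `marker`."""
    parser = JobLinkParser(base_url, marker)
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, `extract_job_links(scraper.get(url).text, url, '/jobs/')` would collect candidate links, though a broad marker like `'/jobs/'` would also match search and pagination URLs such as the `/jobs/search` page itself, so a more specific marker is likely needed.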


Joaquin



Just a note that this doesn't always work: `cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 challenge, This feature is not available in the opensource (free) version.` – evandrix Sep 17 '22 at 07:45
