
Goal: I am trying to scrape the HTML from this page: https://www.doherty.jobs/jobs/search?q=&l=&lat=&long=&d=.

(note - I will eventually want to paginate and scrape all job listings from this page)

My issue: I get a 503 error when I try to scrape the page using Python and Requests. I am working out of Google Colab.

Initial Code:

import requests

url = 'https://www.doherty.jobs/jobs/search?q=&l=&lat=&long=&d='

response = requests.get(url)

print(response)

Attempted solutions:

  1. Using 'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'
  2. Implementing this code I found in another thread:
import requests

def getUrl(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36',
    }
    res = requests.get(url, headers=headers)
    res.raise_for_status()

getUrl('https://www.doherty.jobs/jobs/search?q=&l=&lat=&long=&d=')

I am able to access the website via my browser.

Is there anything else I can try?

Thank you

ruscias

1 Answer


That page is protected by Cloudflare, which is why a plain Requests call gets a 503 even with a browser user-agent. There are a few options for getting past it; using cloudscraper seems to work:

import cloudscraper

scraper = cloudscraper.create_scraper()
url = 'https://www.doherty.jobs/jobs/search?q=&l=&lat=&long=&d='

response = scraper.get(url).text

print(response)

In order to use it, you'll need to install it:

pip install cloudscraper
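
If you want to confirm that Cloudflare, rather than the site itself, is what returns the 503, a rough check is to look at the response headers. The sketch below is only a heuristic: the helper names `looks_like_cloudflare_block` and `diagnose` are made up for illustration, and it assumes a blocked response carries a 403 or 503 status together with a `Server: cloudflare` or `CF-RAY` header.

```python
def looks_like_cloudflare_block(status_code, headers):
    """Heuristic guess: was this response produced by Cloudflare instead of the site?"""
    # Header names are case-insensitive, so normalise them before any lookup.
    lowered = {str(k).lower(): str(v).lower() for k, v in headers.items()}
    served_by_cloudflare = "cloudflare" in lowered.get("server", "") or "cf-ray" in lowered
    # Assumption: challenge/block pages usually come back as 403 or 503.
    return status_code in (403, 503) and served_by_cloudflare


def diagnose(url):
    """Fetch url with plain Requests and report whether the reply looks like a Cloudflare block."""
    import requests  # imported lazily so the helper above has no third-party dependency

    response = requests.get(url)
    return looks_like_cloudflare_block(response.status_code, response.headers)
```

Calling `diagnose('https://www.doherty.jobs/jobs/search?q=&l=&lat=&long=&d=')` from Colab should tell you whether the failing request matches that pattern. If it does, changing the user-agent alone is unlikely to help.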
Joaquin

Just a note that this doesn't always work: `cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 challenge, This feature is not available in the opensource (free) version.` – evandrix Sep 17 '22 at 07:45