I am trying to scrape https://www.vitals.com/locations/primary-care-doctors/ny. I have been able to scrape other sites by editing my headers, but I keep getting a 403 error with this one.

from bs4 import BeautifulSoup
import requests

with requests.Session() as se:
    se.headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
        "Accept-Language": "en-US,en;q=0.9",
    }


test_sites = [
 'http://fashiontoast.com/',
 'https://www.vitals.com/locations/primary-care-doctors/ny',
 'http://www.seaofshoes.com/',
 ]

for site in test_sites:
    print(site)
    # get page source
    response = se.get(site)
    print(response)
    #print(response.text)

1 Answer

Try moving the rest of the code inside the with statement, so the requests are made while the session (and the headers you set on it) is still in scope:

from bs4 import BeautifulSoup
import requests

with requests.Session() as se:
    se.headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
        "Accept-Language": "en-US,en;q=0.9",
    }

    test_sites = [
     'http://fashiontoast.com/',
     'https://www.vitals.com/locations/primary-care-doctors/ny',
     'http://www.seaofshoes.com/',
     ]

    for site in test_sites:
        print(site)
        # get page source
        response = se.get(site)
        print(response)
        #print(response.text)
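
As an aside, assigning a dict to se.headers replaces requests' default session headers entirely (Connection, Accept-Encoding, the default Accept, and so on). If you only want to override some of them, se.headers.update() merges into the defaults instead; a minimal sketch:

import requests

with requests.Session() as se:
    # update() merges these into requests' default header set
    # instead of replacing the whole mapping
    se.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/70.0.3538.110 Safari/537.36",
    })
    print(se.headers)  # defaults plus the overridden User-Agent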
  • I still receive the same responses. – Jdef Nov 22 '19 at 23:58
  • I'm not sure why you're still getting this issue; this question has been asked multiple times, and adding a user agent tends to solve it: https://stackoverflow.com/questions/38489386/python-requests-403-forbidden https://stackoverflow.com/questions/41946166/requests-get-returns-403-while-the-same-url-works-in-browser https://stackoverflow.com/questions/45086383/python-requests-403-forbidden-despite-setting-user-agent-headers https://stackoverflow.com/questions/41361444/python-requests-error-403 https://stackoverflow.com/questions/45930720/python-requests-html-403-response – Marsilinou Zaky Nov 23 '19 at 01:00
  • I followed the advice in some of the links and tried adding additional headers such as Referer, but I still get an error page in response: "Access denied | www.vitals.com used Cloudflare to restrict access. What happened? This website is using a security service to protect itself from online attacks." – Jdef Nov 23 '19 at 13:23
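
Judging by that last comment, the block comes from Cloudflare rather than from missing headers, so header tweaks alone may not be enough. A minimal sketch of one commonly suggested workaround, assuming the third-party cloudscraper package (pip install cloudscraper), which is not part of this thread; it provides a requests-compatible session that attempts to solve Cloudflare's browser challenge before making the real request:

import cloudscraper

# create_scraper() returns a requests.Session subclass that tries to
# pass Cloudflare's anti-bot check before issuing the actual request
scraper = cloudscraper.create_scraper()

response = scraper.get('https://www.vitals.com/locations/primary-care-doctors/ny')
print(response.status_code)  # 200 if the challenge was solved, 403 otherwise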