
I'm still learning web scraping, and I'm hitting an error when I try to scrape an HTTPS site. It seems to have something to do with SSL certificates, or with the site rejecting my connection. This is my code:

import csv

import requests
from bs4 import BeautifulSoup

with open('UrlsList.csv', newline='') as f_urls, open('Output.csv', 'w', newline='') as f_output:
    csv_urls = csv.reader(f_urls)
    csv_output = csv.writer(f_output)

    for line in csv_urls:
        # Each row of UrlsList.csv holds one URL in its first column.
        # verify= points requests at a local certificate file; the raw string keeps the backslash literal.
        page = requests.get(line[0], verify=r'.\Cert.cer').text
        soup = BeautifulSoup(page, 'html.parser')
        # Match <td class="alpha"> cells (no leading space inside the class name).
        for cell in soup.find_all('td', class_='alpha'):
            csv_output.writerow([cell.text])
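One way to narrow the problem down is to test the parsing step on its own, with no network access at all. The sketch below swaps BeautifulSoup for the standard library's `html.parser.HTMLParser`, purely so it runs without third-party installs; `AlphaCellParser` and `extract_alpha_cells` are hypothetical names. It collects the text of every `<td>` whose `class` attribute contains the whole token `alpha`, which are the cells the script is after.

```python
from html.parser import HTMLParser


class AlphaCellParser(HTMLParser):
    """Hypothetical helper: collect text of <td> cells whose class list contains 'alpha'."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_alpha_td = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            # class="a b" is a space-separated list, so compare whole tokens.
            classes = (dict(attrs).get('class') or '').split()
            if 'alpha' in classes:
                self._in_alpha_td = True
                self._buffer = []

    def handle_data(self, data):
        # Text inside nested tags such as <b> is kept as part of the cell.
        if self._in_alpha_td:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._in_alpha_td:
            self.cells.append(''.join(self._buffer).strip())
            self._in_alpha_td = False


def extract_alpha_cells(html):
    parser = AlphaCellParser()
    parser.feed(html)
    parser.close()
    return parser.cells


sample = """
<table>
  <tr><td class="alpha">Cell A</td><td class="beta">skip me</td></tr>
  <tr><td class="alpha highlight">Cell <b>B</b></td></tr>
</table>
"""
print(extract_alpha_cells(sample))
```

If this returns the expected cells for a saved copy of the page, the remaining problem is in fetching the page, not in parsing it.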

This prints a long traceback that ends with the following error:

raise exception_type(errors)
OpenSSL.SSL.Error: []
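One possibility worth ruling out (a guess, not a confirmed diagnosis): requests hands the `verify` path to OpenSSL as a CA bundle, and OpenSSL expects PEM text there. Certificates exported on Windows as `.cer` are often "DER encoded binary", which OpenSSL cannot load as a bundle. The sketch below converts such a file with the standard library's `ssl.DER_cert_to_PEM_cert`; the helper name `cer_to_pem` and the `Cert.pem` output name are hypothetical. Note also that `verify` normally expects the certificate authority chain, not only the site's own certificate.

```python
import ssl
from pathlib import Path


def cer_to_pem(cer_path, pem_path):
    """Hypothetical helper: write a PEM copy of a single-certificate .cer file.

    Assumes the input holds one certificate, either DER (binary) or already PEM.
    """
    raw = Path(cer_path).read_bytes()
    if raw.lstrip().startswith(b'-----BEGIN'):
        # Already PEM ("Base-64 encoded X.509" in the Windows export wizard): copy as-is.
        pem_bytes = raw
    else:
        # DER: base64-encode and wrap in BEGIN/END CERTIFICATE lines.
        pem_bytes = ssl.DER_cert_to_PEM_cert(raw).encode('ascii')
    Path(pem_path).write_bytes(pem_bytes)
    return pem_path


# Usage with requests (assumes Cert.cer sits next to the script):
#     cer_to_pem(r'.\Cert.cer', r'.\Cert.pem')
#     requests.get(url, verify=r'.\Cert.pem')
```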

I have also tried passing verify=False instead, which gives me this error:

raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

I've tried to research this on my own, but I can't make sense of any of the solutions I've found so far. I also recently updated pyOpenSSL to version 18. It seems the site I'm trying to scrape just doesn't accept my connection, yet the URL is real and I can view the site in Chrome without any problem. Why would that be?
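A plausible explanation for "works in Chrome, fails in Python" (an assumption, not something the site documents) is that the server checks the `User-Agent` header and silently drops clients that identify as scripts. requests sends `python-requests/<version>` by default and the standard library's `urllib` sends `Python-urllib/<version>`, while Chrome sends a browser string. requests wraps the low-level `http.client.RemoteDisconnected` in its own `ConnectionError`, which matches the second traceback above. The self-contained sketch below reproduces the symptom locally with a hypothetical toy server, `ua_filtering_server`, that hangs up without responding whenever the User-Agent mentions Python. It uses plain HTTP and `urllib`, so no certificates are involved; the browser-like User-Agent string is an arbitrary example.

```python
import http.client
import socket
import threading
import urllib.request


def ua_filtering_server(listener):
    """Hypothetical toy server: hang up on Python clients, answer everyone else."""
    while True:
        conn, _ = listener.accept()
        with conn:
            request = conn.recv(65536).lower()  # request line and headers, lowercased
            if b'user-agent: python' in request:
                continue  # close the socket without sending any response
            body = b'<td class="alpha">ok</td>'
            conn.sendall(b'HTTP/1.1 200 OK\r\nContent-Length: %d\r\n'
                         b'Connection: close\r\n\r\n' % len(body) + body)


def fetch(url, headers=None):
    """GET url with urllib, ignoring any proxy settings from the environment."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, headers=headers or {})
    with opener.open(request, timeout=5) as response:
        return response.read().decode()


listener = socket.socket()
listener.bind(('127.0.0.1', 0))  # port 0 lets the OS pick a free port
listener.listen()
threading.Thread(target=ua_filtering_server, args=(listener,), daemon=True).start()
url = 'http://127.0.0.1:%d/' % listener.getsockname()[1]

try:
    fetch(url)  # urllib's default User-Agent is Python-urllib/3.x
except http.client.RemoteDisconnected as exc:
    print('default User-Agent:', exc)

browser_ua = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
print('browser User-Agent:', fetch(url, browser_ua))
```

The first call should fail with the same "Remote end closed connection without response" message as in the question, while the second succeeds. With requests, the equivalent is passing `headers={'User-Agent': ...}` to `requests.get`.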

Thanks a lot!

wildcat89
  • Try this solution: https://stackoverflow.com/questions/15445981/how-do-i-disable-the-security-certificate-check-in-python-requests. Basically set the `verify` parameter to `False`. – Andrej Kesely Aug 11 '18 at 06:40
  • Are you on Mac? – jlaur Aug 11 '18 at 11:39
  • If so, it's a well-known Mac issue. Remove the verify argument from requests.get and run pip install certifi. You can read about this Mac issue here: http://www.cdotson.com/2017/01/sslerror-with-python-3-6-x-on-macos-sierra/ – jlaur Aug 11 '18 at 11:47
  • Thanks @AndrejKesely but like I said above, I've tried setting verify=False and I just get another error message? – wildcat89 Aug 11 '18 at 18:03
  • @jlaur No, I'm on Windows 10 – wildcat89 Aug 11 '18 at 18:04
  • What happens if you remove the verify-part? It shouldn't be there in the first place... – jlaur Aug 11 '18 at 18:35
  • And what url causes this? If you print line[0]... – jlaur Aug 11 '18 at 18:37
  • @jlaur I get the same error message as having the verify=False in there. The URL I'm trying to extract from is: https://www.zacks.com/stocks/industry-rank/aerospace-defense-equipment-3/ ...but this is just one of a list in the URLs CSV file. (there should be an H T T P S : // W W W . there) – wildcat89 Aug 11 '18 at 19:14
  • Is it missing https://? If so, that's your bug. – jlaur Aug 11 '18 at 19:17
  • No, the https:// is there. Stack Overflow just hides it and automatically turns it into a link for easier viewing. – wildcat89 Aug 11 '18 at 19:17
  • What happens if you take that url out of the csv and run the program? What happens if you create a new script that just tries to request that one hardcoded url and print the html? – jlaur Aug 11 '18 at 20:47
  • Could you paste in the entire stacktrace given when hitting that url? – jlaur Aug 11 '18 at 20:53
  • Just tried your url and this solution worked for me: https://stackoverflow.com/questions/43165341/python3-requests-connectionerror-connection-aborted-oserror104-econnr#43167631 – Paula Thomas Aug 12 '18 at 09:29
  • @PaulaThomas Worked! Thank you so much!! – wildcat89 Aug 14 '18 at 01:33
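Following the suggestion to find out which URL triggers the error, the loop can record failures per URL instead of stopping at the first exception. This is a sketch under assumptions: `scrape_urls`, the injected `fetch` and `extract` callables, and `extract_cells` are hypothetical names, and the browser-style `HEADERS` value is an arbitrary example of the User-Agent workaround that the resolving comment appears to point to (the linked answer is not reproduced here, so treat that as an assumption). Catching `OSError` covers both `http.client.RemoteDisconnected` and requests' `ConnectionError`, because requests' exception classes derive from `IOError`, an alias of `OSError`.

```python
# Example browser-style User-Agent (assumption: the exact string is arbitrary).
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}


def scrape_urls(rows, fetch, extract):
    """Fetch every URL in rows, recording failures instead of stopping.

    rows    -- iterable of CSV rows whose first column is a URL
    fetch   -- callable(url, headers) returning the page HTML (hypothetical)
    extract -- callable(html) returning a list of cell strings (hypothetical)
    Returns (cells, failures); failures maps each failing URL to its error.
    """
    cells, failures = [], {}
    for row in rows:
        if not row:
            continue  # csv.reader yields [] for blank lines
        url = row[0].strip()
        try:
            html = fetch(url, HEADERS)
        except OSError as exc:
            # Covers http.client.RemoteDisconnected, socket timeouts and
            # requests' ConnectionError (requests exceptions subclass IOError).
            failures[url] = repr(exc)
            continue
        cells.extend(extract(html))
    return cells, failures


# Wiring it to requests (not executed here; extract_cells is hypothetical):
#
#     import csv
#     import requests
#
#     def fetch(url, headers):
#         return requests.get(url, headers=headers, timeout=10).text
#
#     with open('UrlsList.csv', newline='') as f:
#         cells, failures = scrape_urls(csv.reader(f), fetch, extract_cells)
#     for url, error in failures.items():
#         print(url, error)
```

Keeping fetching, parsing, and error bookkeeping in separate functions makes it possible to test each piece with a hardcoded URL or a saved HTML file, as suggested in the comments.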
