Still figuring out this web scraping thing. I'm running into an error when trying to scrape an HTTPS site, something to do with SSL certificates and the server rejecting my connection? This is my code:
from bs4 import BeautifulSoup
import requests
import csv
with open('UrlsList.csv', newline='') as f_urls, open('Output.csv', 'w', newline='') as f_output:
    csv_urls = csv.reader(f_urls)
    csv_output = csv.writer(f_output)
    for line in csv_urls:
        page = requests.get(line[0], verify=r'.\Cert.cer').text
        soup = BeautifulSoup(page, 'html.parser')
        results = soup.find_all('td', {'class': 'alpha'})
        for result in results:
            csv_output.writerow([result.text])
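In case it helps narrow things down: the BeautifulSoup part of the script seems fine when I feed it some hand-written HTML (the cell contents below are just placeholders I made up), so I'm fairly sure the problem is only in the HTTPS request itself:

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the real page, just to check the parsing step
html = '<table><tr><td class="alpha">cell one</td><td class="alpha">cell two</td></tr></table>'
soup = BeautifulSoup(html, 'html.parser')

# Same selector as in my script above
results = soup.find_all('td', {'class': 'alpha'})
print([r.text for r in results])  # ['cell one', 'cell two']
```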
...which gives me a big screen of output, with the following error at the bottom:
raise exception_type(errors)
OpenSSL.SSL.Error: []
I have also tried just passing verify=False, which gives me this error instead:
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
I've tried to research the answer on my own, but I can't make sense of any of the solutions I've found so far. I recently updated pyOpenSSL to version 18 as well. It just seems like the site I'm trying to scrape won't accept my connection, but the URL is valid and I can view the site with no problem in Chrome?
Thanks a lot!