I have some software that uses web scraping, but for some reason it doesn't work. It's bizarre: when I run it in Google Colab, the code works fine and the URLs can be opened and scraped, but when I run it as part of my web application (started from my console with python3 run.py), it fails.
Here is the code that raises the error:
    b = searchgoogle(query, num)
    c = []
    print(b)
    for i in b:
        extractor = extractors.ArticleExtractor()
        headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) '
                          'Chrome/50.0.2661.102 Safari/537.36'
        }
        extractor = extractors.ArticleExtractor()
        req = Request(url=i, headers=headers)
        d = urlopen(req)
        try:
            if d.info()['content-type'].startswith('text/html'):
                print('its html')
                resp = requests.get(i, headers=headers)
                if resp.ok:
                    doc = extractor.get_content(resp.text)
                    c.append(comparetexts(text, doc, i))
                else:
                    print(f'Failed to get URL: {resp.status_code}')
            else:
                print('its not html')
        except KeyError:
            print('its not html')
        print(i)
    return c
The line that raises the error is "d = urlopen(req)". Note that it sits outside the try block, so any exception it raises is not caught.
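
Since the exact exception is what matters here, one way to surface it is to wrap the urlopen call and report what went wrong. The following is only a minimal sketch, not part of the original program: the helper name try_open and its opener parameter are hypothetical, and it relies only on the standard urllib module.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def try_open(url, headers=None, opener=urlopen, timeout=10):
    """Hypothetical helper: return (response, None) or (None, description).

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    helper can be exercised without network access.
    """
    req = Request(url=url, headers=headers or {})
    try:
        return opener(req, timeout=timeout), None
    except HTTPError as exc:
        # The server answered, but with an error status (403, 404, ...).
        # HTTPError must be caught first because it subclasses URLError.
        return None, f"HTTP {exc.code}: {exc.reason}"
    except URLError as exc:
        # No usable answer: DNS failure, refused connection, SSL problem...
        return None, f"URL error: {exc.reason}"
```

Inside the loop this could be used as "d, err = try_open(i, headers)", printing err and skipping the URL when d is None. The printed message should show whether the failure is an HTTP status, a certificate problem, or something else.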
There is more code above the section shown here, but it has nothing to do with the error. Anyway, thanks for your time!
(By the way, I checked the OpenSSL version used by python3 and it reports 'OpenSSL 1.1.1m 14 Dec 2021', so I think it's up to date.)
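
For anyone comparing the two environments, a check along these lines can be run both in Colab and locally. It is a sketch using only the standard ssl module, not necessarily the exact command used above; the function name ssl_environment is hypothetical. It also prints the default CA certificate locations, on the assumption (not confirmed here) that certificate lookup could be one of the things that differs between the two setups.

```python
import ssl


def ssl_environment():
    """Hypothetical helper: collect basic SSL details of this interpreter."""
    paths = ssl.get_default_verify_paths()
    return {
        # Version string of the OpenSSL library Python was linked against.
        "openssl_version": ssl.OPENSSL_VERSION,
        # Default CA bundle file and directory; either may be None.
        "cafile": paths.cafile,
        "capath": paths.capath,
    }


if __name__ == "__main__":
    for key, value in ssl_environment().items():
        print(f"{key}: {value}")
```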