Web crawler error

Question

I'm completely new to Python. I'm starting an internship in January, and they want me to get up to speed in Python as much as possible before I start. So I made this web crawler just for practice, and I'm pretty sure my code is OK. The code is below. I could post the errors it produces, but it's literally pages' worth of errors, all from the requests package. Can I fix this? Is it my code, or is there something deeper going on?

(I'm having trouble formatting the code in this window, but it is formatted correctly in my actual editor and there are no syntax errors.)

Code

import requests
from bs4 import BeautifulSoup


def trade_spider(max_pages):
    page = 1
    while page <= max_pages:
        url = 'https://www.thenewboston.com/forum/recent_activity.php?page=' + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text)
        for link in soup.findAll('a', {'class': 'title'}):
            href = link.get('href')
            print(href)
        page += 1

trade_spider(3)

Errors

All of the errors are from this file: C:\Python34\lib\site-packages\requests\packages\urllib3\connectionpool.py

A bunch of small errors within that file

Also this error

requests.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:600)

Possible Conclusion

Since none of the errors are in my code and they're all in the requests package, I'm guessing that the package is broken, or that something is outdated in my Python installation or in the requests package?

Any help is appreciated. I'm just trying to learn some Python and I would be so happy if I could create a functioning web crawler.

Let me get this straight. You're completely new to Python, this is your first ever program, and you assume that it's *requests*, one of the most popular of all Python libraries, that is broken? — Daniel Roseman, Dec 24 '15 at 22:47
You might be looking for the `title text-semibold` class attribute instead of `title`. Each video has the former on its link. — cwahls, Dec 24 '15 at 22:48
@DanielRoseman He means that his installation might have had errors or something. — cwahls, Dec 24 '15 at 22:49
Duplicate of https://stackoverflow.com/questions/15445981/how-do-i-disable-the-security-certificate-check-in-python-requests. Accepted answer has the solution (use verify=False with requests.get(...)) — Anand Bhat, Dec 25 '15 at 03:04

score 0 · Answer 1 · answered Apr 11 '18 at 08:58

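Use source_code = requests.get(url, verify=False) to disable SSL certificate verification. This makes the CERTIFICATE_VERIFY_FAILED error go away, but requests will then no longer check that the server's certificate is valid, so it is best kept to practice scripts like this one.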
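If you would rather leave verification switched on, one option is to catch the SSL error so the crawler reports it instead of dumping pages of traceback. The following is only a minimal sketch: fetch_page and its ca_bundle parameter are hypothetical names that do not appear in the question, and ca_bundle is assumed to be the path to a CA bundle file that can validate the site's certificate, if you have one.

```python
import requests


def fetch_page(url, ca_bundle=None, timeout=10):
    """Return the page text, or None if certificate verification fails.

    ca_bundle is a hypothetical path to a CA bundle (.pem) file. When it is
    None, requests verifies against its default bundle.
    """
    # verify accepts either a boolean or a path to a CA bundle.
    verify = ca_bundle if ca_bundle is not None else True
    try:
        response = requests.get(url, verify=verify, timeout=timeout)
    except requests.exceptions.SSLError as exc:
        # Report the failure in one line instead of a long traceback.
        print("Certificate verification failed for", url, "-", exc)
        return None
    # Raise an HTTPError for 4xx/5xx responses.
    response.raise_for_status()
    return response.text


if __name__ == "__main__":
    page = fetch_page("https://www.thenewboston.com/forum/recent_activity.php?page=1")
    print("fetched" if page is not None else "not fetched")
```

Inside trade_spider, plain_text could then come from fetch_page(url), with the loop skipping any page for which it returns None.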

juwi