
I have a little issue: when I want to crawl a site, I get an error like "HTTP Error 404: Not Found". I tried several ways to fix it, but none of them worked. I can't connect to the site to get the data.

from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
import urllib.request

my_url = "https://tabletennis.setkacup.com/en/schedule?date=2021-08-29&hall=4&period=1"
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0'
headers = {'User-Agent': user_agent}

# Build the request with a browser User-Agent, then open it
request = urllib.request.Request(my_url, headers=headers)
uClient = uReq(request)

1 Answer


It seems like an SSL error; if so, look here.
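If it really is an SSL verification problem, one common workaround with urllib is to pass an explicit ssl.SSLContext to urlopen. A minimal sketch (note: disabling certificate verification is for debugging only, and the actual fetch is left commented out):

```python
import ssl
import urllib.request

my_url = "https://tabletennis.setkacup.com/en/schedule?date=2021-08-29&hall=4&period=1"
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0'
headers = {'User-Agent': user_agent}

# Context that skips certificate verification -- debugging only,
# do not use this in production code
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

request = urllib.request.Request(my_url, headers=headers)
# urlopen accepts the context via its `context` keyword:
# html = urllib.request.urlopen(request, context=ctx).read()
```

If this makes the error go away, the real fix is usually updating the system's certificate store (or the certifi package) rather than keeping verification disabled.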

Alternatively, you can try the requests library:

pip install requests

import requests

my_url = "https://tabletennis.setkacup.com/en/schedule?date=2021-08-29&hall=4&period=1"
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0'
headers = {'User-Agent': user_agent}

# requests.get is shorthand for requests.request(method="GET", ...)
response = requests.get(my_url, headers=headers)
print(response.content)

  • That definitely works, but the body is just a reference to a JavaScript file. This SO question seems relevant: https://stackoverflow.com/questions/16157719/how-to-follow-a-redirect-with-urllib – Marcello Romani Aug 29 '21 at 12:43