I'm trying to adapt the code in https://stackoverflow.com/a/46135607/9637147 to scrape all URL links for games on the Cyberix3D website. But it fails to do so when I run my code, giving me a 403 Forbidden error. How do I fix my code?
This is so I can archive all of the games on the Cyberix3D website onto the Wayback Machine (http://web.archive.org/) more quickly. I've tried adding the line `useragent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) Gecko/20170101 Firefox/67.0".encode("utf-8")` before the first line of the `for` loop, then replacing `html=urlopen(url)` with `html=urlopen(url,useragent)` so that the code uses that user agent, but even then I still get a 403 Forbidden error.
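
One thing worth checking about that attempt: the second positional argument of `urlopen` is `data` (a request body), not a header. The sketch below builds two `Request` objects without contacting the server to show the difference. The constant names `URL` and `USER_AGENT` are only illustrative, and whether the site accepts the request once the header is set is an assumption this sketch cannot confirm.

```python
from urllib.request import Request

URL = "http://www.gamemaker3d.com/games"
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Gecko/20170101 Firefox/67.0"

# What urlopen(url, useragent) amounts to: the bytes become the request body,
# so the request turns into a POST and no User-Agent header is set on it.
as_data = Request(URL, data=USER_AGENT.encode("utf-8"))

# Passing the string through the headers mapping sets the header instead.
as_header = Request(URL, headers={"User-Agent": USER_AGENT})

print(as_data.get_method(), as_data.has_header("User-agent"))
print(as_header.get_method(), as_header.get_header("User-agent"))
```

Calling `urlopen(as_header)` would then send the request with that User-Agent header.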

    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    file="Cyberix3D games.csv"
    f=open(file,"w")
    Headers="Link\n"
    f.write(Headers)
    for page in range(1,410):
        url="http://www.gamemaker3d.com/games#page={}&orderBy=Recent".format(page)
        html=urlopen(url)
        soup=BeautifulSoup(html,"html.parser")
        Title=soup.find_all("a",{"href":"views-field-nothing"})
        for i in Title:
            try:
                link=i.find("a",{"href":"/player?pid="}).get_text()
                print(link)
                f.write("{}".format(link))
            except:AttributeError
    f.close()

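
A side note on the URL used in the loop: everything after `#` is a fragment, which a client keeps to itself and does not send to the server. The sketch below strips the fragment from each generated URL to show what would actually be requested. The names `template` and `targets` are illustrative, and the idea that the site does its paging in JavaScript is an assumption, not something confirmed here.

```python
from urllib.parse import urldefrag

template = "http://www.gamemaker3d.com/games#page={}&orderBy=Recent"

# urldefrag() splits off the "#..." part; .url is what goes over the wire.
targets = {urldefrag(template.format(page)).url for page in range(1, 410)}

print(len(targets))  # number of distinct URLs the loop would request
print(targets)
```

If the set holds a single URL, all 409 iterations would fetch the same document, regardless of the 403 error.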
I expect the aforementioned links to be printed in the Python 3.7.4 Shell and also added to a CSV file called Cyberix3D games.csv. Instead, I get `urllib.error.HTTPError: HTTP Error 403: Forbidden`, following a bunch of `File "C:\Users\Niall Ward\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line x, in y` lines, in the Python 3.7.4 Shell, as well as an empty CSV file called Cyberix3D games.csv.
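
The empty CSV file is probably a separate effect: the header is written to a buffer, and the exception stops the script before `f.close()` runs, so the buffer may never reach the disk. A minimal sketch of one way around that uses a `with` block, which closes the file even when an error occurs. The helper names `write_links` and `failing_links`, the temporary path, and the sample link are all hypothetical.

```python
import csv
import os
import tempfile

def write_links(path, links):
    """Write a header row, then one link per row; the file is closed on exit."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Link"])
        for link in links:
            writer.writerow([link])

def failing_links():
    """Hypothetical source that yields one link, then fails like the 403 does."""
    yield "/player?pid=1"
    raise RuntimeError("simulated HTTP error")

demo_path = os.path.join(tempfile.mkdtemp(), "Cyberix3D games.csv")
try:
    write_links(demo_path, failing_links())
except RuntimeError:
    pass  # the error still propagates, but the file has been closed

with open(demo_path, newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```

Everything written before the failure is kept on disk, which makes a partial run easier to inspect.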