I am trying to crawl links from a website and store them in a text file. There are about 1000 links to crawl, but the script fails on about 24 of them. I am very new to web crawling and would appreciate some help.
If I remove the try/except block, I get this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 34: ordinal not in range(128)

The error is raised by this line:

fobj.write(link.text + "\n")

I have looked through the similar questions, and this is not a duplicate.
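From what I can tell, the character in the message (u'\u201c' is a left curly double quote) suggests Python 2 is implicitly encoding a unicode string with the ASCII codec when writing to a plain file. A minimal reproduction of what I think is happening:

# Reproduces the error under Python 2: writing a unicode string
# containing a non-ASCII character to a byte-mode file forces an
# implicit ASCII encode, which fails.
text = u"\u201cEconomics\u201d"
with open("test.txt", "w") as f:
    f.write(text)  # UnicodeEncodeError: 'ascii' codec can't encode character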
url = "https://tools.wmflabs.org/enwp10/cgi-bin/list2.fcgi?run=yes&projecta=Economics&namespace=&pagename=&quality=&importance=&score=&limit=1000&offset=1&sorta=Importance&sortb=Quality"
r = requests.get(url)
soup = BeautifulSoup(r.content, "lxml")
links = soup.findAll('a', href=True)
b = "https://en.wikipedia.org/"
fobj = open("file.txt", 'a')
fobj2 = open("links.txt", 'a')
for link in links:
try:
a = link["href"].encode('utf8')
if b in a:
fobj.write(link.text + "\n")
fobj2.write(a + "\n")
except:
print("error")
fobj.close()
fobj2.close()
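For reference, here is a sketch of two workarounds I am considering, assuming the problem really is the implicit ASCII encode of the unicode link.text (I have not confirmed that either is the right fix):

import io

# Option 1: encode the unicode title to UTF-8 bytes before writing,
# mirroring what is already done for link["href"].
fobj.write(link.text.encode('utf8') + "\n")

# Option 2: open the output file with an explicit encoding, so
# unicode strings can be written directly.
fobj = io.open("file.txt", 'a', encoding='utf-8')
fobj.write(link.text + u"\n")

Would one of these be the correct approach, or is there a better way?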