I have been writing a web crawler in Python 3, and everything seemed to be working.
I then decided to use urllib to fetch the source of the pages I am going to crawl.
But now I get a NameError that says:
name 'urllib' is not defined
Here is my Python code:
def get_url(url):
    from urllib.request import urlopen
    source = urllib.request.urlopen(url)
    return source

def getNextTarget(page):
    startLink = page.find("<a href=")
    if startLink == -1:
        return None, 0
    startQuote = page.find('"', startLink)
    endQuote = page.find('"', startQuote + 1)
    url = page[startQuote + 1 : endQuote]
    return url, endQuote

def findAllLinks(page):
    while True:
        url, endpos = getNextTarget(page)
        if url:
            print(url)
            page = page[endpos:]
        else:
            break

findAllLinks(get_url("https://xkcd.com/"))
Sorry if this question has already been asked.
Thank you for your help in advance.
P.S. The main problem is with the get_url() function.
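
For reference, here is a minimal sketch of how get_url() might be written so the NameError goes away. The statement `from urllib.request import urlopen` binds only the name `urlopen`, so `urllib` itself is never defined inside the function. Importing `urllib.request` (or calling `urlopen(url)` directly) avoids that. The sketch also assumes the link-finding functions expect a str rather than a response object, so it reads and decodes the body. The UTF-8 fallback and `errors="replace"` are assumptions, not something the site guarantees.

```python
import urllib.request  # binds the name "urllib", so urllib.request.urlopen resolves


def get_url(url):
    """Fetch a page and return its body as a str (not a response object)."""
    with urllib.request.urlopen(url) as response:
        # Assumption: fall back to UTF-8 when no charset is declared.
        charset = response.headers.get_content_charset() or "utf-8"
        # Decode the raw bytes so that str methods such as find() work on the result.
        return response.read().decode(charset, errors="replace")
```

With a version along these lines, `findAllLinks(get_url("https://xkcd.com/"))` should receive a string, which is what `page.find(...)` in getNextTarget() needs.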