Using urllib in Python 3

Question

I have been coding a web crawler in Python 3, and everything seems to be working.
So I decided to use urllib to get the source code of the pages I am going to crawl.
But I get a NameError that says:

    name 'urlib' is not defined

Here is my Python code:

def get_url(url):
    from urllib.request import urlopen
    source = urllib.request.urlopen(url)
    return source

def getNextTarget(page):
    startLink = page.find("<a href=")
    if startLink == -1:
        return None, 0
    startQuote = page.find('"', startLink)
    endQuote = page.find('"', startQuote + 1)
    url = page[startQuote + 1 : endQuote]
    return url, endQuote

def findAllLinks(page):
    while True:
        url, endpos = getNextTarget(page)
        if url:
            print(url)
            page = page[endpos:]
        else:
            break

findAllLinks(get_url("https://xkcd.com/"))

Sorry if this question has already been asked.
Thank you for your help in advance.
P.S. The main problem is with the get_url() function.

`urllib` has two `l`s. But since you used a `from ... import`, you just need to do `source = urlopen(url)`. – Aran-Fey, Aug 03 '18 at 08:11
Same error when I fix the single l. And changing the source line to just `urlopen(url)` gives an error saying: AttributeError: 'HTTPResponse' object has no attribute 'find' – James Deal, Aug 03 '18 at 08:15
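
To illustrate the import point from the comment above, here is a minimal sketch of the two import styles. The helper names `fetch_with_name` and `fetch_with_module` are hypothetical and exist only for this illustration. Mixing the styles, that is importing only `urlopen` but then calling `urllib.request.urlopen`, is what raises the NameError.

```python
# Style 1: import the name directly, then call it unqualified.
from urllib.request import urlopen

# Style 2: import the submodule, then call it through the dotted path.
import urllib.request


def fetch_with_name(url):
    # Works because the name `urlopen` was imported directly (style 1).
    return urlopen(url)


def fetch_with_module(url):
    # Works only because `import urllib.request` bound the name `urllib`.
    return urllib.request.urlopen(url)
```

Both helpers return the response object rather than the page text, so string methods such as `find` are not available on the result.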

score 0 · Accepted Answer · answered Aug 03 '18 at 08:29

Your get_url function returns an HTTPResponse (connection) object, not a string, so you cannot call page.find() on it in getNextTarget. Call .read() on the response to get the page body as bytes, then .decode() those bytes (for example as UTF-8) to get a string.
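
A minimal sketch of how `get_url` could be rewritten along these lines. It assumes the page is text and falls back to UTF-8 when the server does not declare a charset. The fallback and the `errors="replace"` policy are my assumptions, not part of the original code.

```python
from urllib.request import urlopen


def get_url(url):
    """Fetch a page and return its body as a str (not an HTTPResponse)."""
    with urlopen(url) as response:
        raw = response.read()  # bytes in Python 3, not str
        # Assumption: use UTF-8 if the Content-Type header names no charset.
        charset = response.headers.get_content_charset() or "utf-8"
    # Assumption: replace undecodable bytes instead of raising an error.
    return raw.decode(charset, errors="replace")
```

With a string coming back from `get_url`, calls such as `page.find("<a href=")` in `getNextTarget` operate on text, so `findAllLinks(get_url("https://xkcd.com/"))` should no longer raise the AttributeError.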

Refer:

AttributeError: 'HTTPResponse' object has no attribute 'split'
https://docs.python.org/3/library/urllib.request.html

answered Aug 03 '18 at 08:29

Viral Modi


sorry, bit of a noob. can you please clarify. – James Deal Aug 03 '18 at 09:02
Oh, I got it. Thank you so much for your help! :) – James Deal Aug 03 '18 at 09:21
