How do I use a for loop to get multiple links from an HTML page?

Question

This is what I have at the moment:

import bs4
import requests

def getXkcdComic(comicUrl):
    for i in range(0,20):
        res = requests.get(comicUrl + str(1882 - i))
        res.raise_for_status()

        soup = bs4.BeautifulSoup(res.text, 'html.parser')
        img = soup.select_one("div#comic > img")
        return str(img['src'])


link = getXkcdComic('https://xkcd.com/')

print(link)

It parses the HTML and gets one link, the first one. Since I know the URL ends in 1882 and the next one I want is 1881, I wrote this for loop to get the rest. It only prints one result, as if there were no loop at all. Strangely, if I reduce the indentation of the return statement, it returns a different URL.

I don't quite get how for loops work yet. Also, this is my first post ever here, so forgive my English and ignorance.

score 3 · Accepted Answer · answered Aug 29 '17 at 17:35

The first time you hit a return statement, the function is going to return, regardless of whether you're in a loop. So your for loop is going to get to the end of the first iteration, see the return, and that's it. The other 19 iterations won't run.

The reason you get a different URL if you dedent the return is that your for loop can now run to completion. But since you didn't save the results of the previous iterations, it will return only the last one.
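
A network-free sketch of the difference, using a plain list of numbers in place of the comic pages (the two function names are made up for illustration and are not part of the original code):

```python
def return_inside_loop(numbers):
    for n in numbers:
        return n          # exits the function during the first iteration

def return_after_loop(numbers):
    for n in numbers:
        last = n          # overwritten on every iteration
    return last           # only the final value survives

numbers = [1882, 1881, 1880]
print(return_inside_loop(numbers))   # 1882
print(return_after_loop(numbers))    # 1880
```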

What it looks like you might want is to build a list of results, and return that.

def getXkcdComic(comicUrl):
    images = []                           # Create an empty list for results
    for i in range(0,20):
        res = requests.get(comicUrl + str(1882 - i))
        res.raise_for_status()
        soup = bs4.BeautifulSoup(res.text, 'html.parser')
        img = soup.select_one("div#comic > img")
        images.append(str(img['src']))    # Save the result by adding it to the list
    return images                         # Return the list

Just remember then that link in your outer scope will actually be a list of links, and handle it accordingly.

Perfect. I was really thinking I should put them in a list but I couldn't recall how (this is my second week on python). Thanks a lot! I used `pprint.pprint (link[0:20])` and it worked. — Gabriel Almeida, Aug 29 '17 at 17:44

Moses Koledoye · Answer 2 · 2017-08-29T17:38:07.020

0

Your function returns control to the caller as soon as it reaches the return statement, which here happens in the first iteration of the for loop.

You can yield instead of return in your function to produce image links successively from the function and keep the for loop running:

import bs4
import requests

def getXkcdComic(comicUrl):
    for i in range(0,20):
        ...
        yield img['src']  # <- here

# make a list of links yielded by function
links = list(getXkcdComic('https://xkcd.com/'))
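
To see how yield keeps the loop alive without touching the network, here is a minimal sketch. The countdown function is a hypothetical stand-in for getXkcdComic that yields numbers instead of image links:

```python
def countdown(start, count):
    # Hypothetical stand-in for getXkcdComic: yields numbers, not image links
    for i in range(count):
        yield start - i   # hands back one value, then resumes here on the next request

gen = countdown(1882, 3)
print(next(gen))      # 1882
print(list(gen))      # [1881, 1880]  (the values not yet consumed)
```

Calling the function only creates a generator object. The loop body runs when something asks for values, for example list(), next(), or a for loop.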


edited Aug 29 '17 at 17:38

answered Aug 29 '17 at 17:30

Moses Koledoye


This works perfectly too! So cool to see so many ways to get the same result. Thanks a lot! – Gabriel Almeida Aug 29 '17 at 18:19

KyleV · Answer 3 · 2017-09-12T16:56:58.997

0

When you call 'return' during the first iteration of the loop, the entire 'getXkcdComic' function exits and returns.

Something like this may work and print like the original code intended:

import bs4
import requests

def getXkcdComic(comicUrl, number):
    res = requests.get(comicUrl + str(number))
    res.raise_for_status()

    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    return str(soup.select_one("div#comic > img")['src'])

link = 'https://xkcd.com/'
for i in range(20):
    print(getXkcdComic(link, 1882-i))

edited Sep 12 '17 at 16:56

answered Aug 29 '17 at 17:35

KyleV


This didn't seem to work. But I don't know why. Thanks anyway =) – Gabriel Almeida Aug 29 '17 at 17:48
I've edited to fix the missing parentheses and the link value. – KyleV Sep 12 '17 at 16:57

score 0 · Answer 4 · answered Aug 29 '17 at 17:36

How do you expect to get multiple outputs (URLs here) from a single method call that returns only once? The for loop lets you iterate over a range and produce multiple results, but it's of no use if the method returns on the first pass. You can do one of the following:

1. Instead of writing a loop inside the method, call the method in a loop. That way the output is printed for each call.
2. Write the entire thing in the method, so that the print statement runs on every iteration.

Do the following:

def getXkcdComic(comicUrl):
    for i in range(0,20):
        res = requests.get(comicUrl + str(1882 - i))
        res.raise_for_status()
        soup = bs4.BeautifulSoup(res.text, 'html.parser')
        img = soup.select_one("div#comic > img")
        print(str(img['src']))
getXkcdComic('https://xkcd.com/')

This is what I was trying and it works perfectly. Thanks a lot! — Gabriel Almeida, Aug 29 '17 at 17:55

score 0 · Answer 5 · answered Aug 29 '17 at 17:40

This happens because you return inside the loop. Try this:

def getXkcdComic(comicUrl):
    links = list()    # Separate name, so the response below does not overwrite the list
    for i in range(0,20):
        res = requests.get(comicUrl + str(1882 - i))
        res.raise_for_status()

        soup = bs4.BeautifulSoup(res.text, 'html.parser')
        img = soup.select_one("div#comic > img")
        links.append(str(img['src']))
    return links

You can also replace this:

for i in range(0,20):
    res = requests.get(comicUrl + str(1882 - i))

with this:

for i in range(1882, 1862, -1):
    res = requests.get(comicUrl + str(i))
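
As a quick check of which comic numbers a given range produces (the stop value of range is exclusive), this sketch compares the numbers from the original loop with a descending and an ascending range. The variable names are only for illustration:

```python
original = [1882 - i for i in range(0, 20)]   # numbers requested by the original loop
descending = list(range(1882, 1862, -1))      # same numbers, same order
ascending = list(range(1863, 1883))           # same numbers, oldest first

print(original == descending)                     # True
print(sorted(original) == ascending)              # True
print(len(original), original[0], original[-1])   # 20 1882 1863
```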

score 0 · Answer 6 · answered Aug 29 '17 at 17:42


The other answers are good and general, but for this specific case there's an even better way. xkcd provides a JSON API, so you can use a list comprehension:

def getXkcdComic(comicUrl):
    return [requests.get(comicUrl + str(1882 - i) + '/info.0.json').json()['img']
            for i in range(0,20)]

This is also faster and more friendly to the xkcd servers.
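
To make the URL pattern visible without sending any requests, this sketch builds only the strings that the comprehension above would fetch. The helper name info_urls and its latest and count parameters are assumptions added for illustration:

```python
def info_urls(comicUrl, latest=1882, count=20):
    # Same string concatenation as the list comprehension, minus the HTTP request
    return [comicUrl + str(latest - i) + '/info.0.json' for i in range(count)]

urls = info_urls('https://xkcd.com/')
print(urls[0])     # https://xkcd.com/1882/info.0.json
print(urls[-1])    # https://xkcd.com/1863/info.0.json
print(len(urls))   # 20
```

Since getXkcdComic returns an ordinary list here, looping over it and calling print on each element should show one image link per line.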

answered Aug 29 '17 at 17:42

Alex Hall


I was indeed wondering about their servers. It takes a few seconds to run the code. I know absolutely nothing about JSON API, but it seems interesting! Thanks for the answer. Edit: How do I print the result in this case? – Gabriel Almeida Aug 29 '17 at 18:22
