Using "return" function instead of "print" in a scraper

Question

In my script below, if I replace the "return" statement with "print", I get all the results. However, if I run it as it is, I get only the first item. How can I get all the results using "return" in this case? What should the process be?

Here is the script:

import requests
from lxml import html

main_link = "http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1467-6281/issues"

def abacus_scraper(main_link):
    tree = html.fromstring(requests.get(main_link).text)
    for titles in tree.cssselect("a.issuesInYear"):
        title = titles.cssselect("span")[0].text
        title_link = titles.attrib['href']
        return title, title_link

print(abacus_scraper(main_link))

Result:

('2017 - Volume 53 Abacus', '/journal/10.1111/(ISSN)1467-6281/issues?activeYear=2017')

You could build a list of tuples inside the function and return the list, or yield the tuples and iterate over the function's results to get the values. – Alan Kavanagh, Sep 06 '17 at 21:10
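
A minimal sketch of the `yield` option mentioned in this comment, kept network-free so it runs anywhere. The `iter_issues` name and the `sample_anchors` data are made up for illustration; in the real scraper the loop would presumably run over `tree.cssselect("a.issuesInYear")` instead.

```python
def iter_issues(anchors):
    """Yield one (title, link) tuple per anchor instead of returning after the first."""
    for anchor in anchors:
        # yield hands back one value, then the loop resumes when the next value is requested
        yield anchor["text"], anchor["href"]


# Hypothetical stand-in for the scraped <a> elements.
sample_anchors = [
    {"text": "2017 - Volume 53 Abacus", "href": "/issues?activeYear=2017"},
    {"text": "2016 - Volume 52 Abacus", "href": "/issues?activeYear=2016"},
]

for title, link in iter_issues(sample_anchors):
    print(title, link)

# A generator can also be turned into a list when all results are needed at once.
all_issues = list(iter_issues(sample_anchors))
```

The caller decides whether to consume the results one at a time or collect them, which is the main difference from returning a finished list.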

score 4 · Accepted Answer · answered Sep 06 '17 at 21:09

A return statement ends the function immediately, so the for loop stops during its first iteration and only the first item comes back.

You should keep a list inside abacus_scraper and append to it on each iteration. After the loop has finished, return the list.

For example:

import requests
from lxml import html

main_link = "http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1467-6281/issues"

def abacus_scraper(main_link):
    results = []
    tree = html.fromstring(requests.get(main_link).text)
    for titles in tree.cssselect("a.issuesInYear"):
        title = titles.cssselect("span")[0].text
        title_link = titles.attrib['href']
        results.append([title, title_link])
    return results

print(abacus_scraper(main_link))
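
To see the difference without hitting the network, here is a stripped-down sketch of the same two patterns. The function names and the `years` data are hypothetical and only stand in for the scraped elements.

```python
def first_only(items):
    for item in items:
        return item  # the function ends here, on the very first pass


def collect_all(items):
    results = []
    for item in items:
        results.append(item)  # keep looping; nothing is returned yet
    return results  # runs once, after the loop has finished


years = ["2017", "2016", "2015"]  # hypothetical sample data
print(first_only(years))
print(collect_all(years))
```

Note that `first_only` returns `None` for an empty input, because the loop body never runs and the function falls off the end.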

answered Sep 06 '17 at 21:09

Solaxun


sure - just providing an example of any arbitrary collection. – Solaxun Sep 06 '17 at 21:12
I don't know why I have to wait 10 more minutes to accept your answer. However, I'm very thankful to you. I thought at first that this post would be flooded with downvotes because of my lack of knowledge, but in reality you people are so helpful. – SIM Sep 06 '17 at 21:15
wouldn't a list of tuples be more efficient than a list of lists in this scenario? – Bart Van Loon Sep 06 '17 at 21:16
@BartVanLoon marginally. It's the sort of thing I wouldn't worry either way. But maybe I would just use a `tuple`, or even better, a `namedtuple`. – juanpa.arrivillaga Sep 06 '17 at 21:21
@Solaxun, could you also provide the same script using tuples? Thanks in advance. – SIM Sep 06 '17 at 21:25
@Topto We all start somewhere - glad to help. You typically only get downvoted into oblivion if you ask something that has been answered already, or you don't provide any concrete examples demonstrating that you have at least attempted your problem. – Solaxun Sep 06 '17 at 21:26
if you want a list of tuples, you can just change [title, link] to remove the brackets: title,link. – Solaxun Sep 06 '17 at 21:27
No, since `append` only takes one argument and two would be given. You need to replace the []'s with ()'s. – Bart Van Loon Sep 06 '17 at 21:32
Oops - yeah, I missed the parentheses. Good catch. For appending, you would need to do (title,link) as @BartVanLoon mentioned. However, a tuple is created by the comma, not the parentheses (a common point of confusion). So, for example, if you created an intermediate variable `title_and_link = title,title_link` then you would only need to append `title_and_link`, no parens necessary. – Solaxun Sep 06 '17 at 21:36
Thanks a trillion to everyone for making such invaluable comments. – SIM Sep 06 '17 at 21:45
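
For anyone following the tuple discussion in the comments above, a short sketch of the variants that were mentioned. The sample values and the `Issue` name are invented for illustration and are not part of the original script.

```python
from collections import namedtuple

# Hypothetical record type, following the namedtuple suggestion in the comments.
Issue = namedtuple("Issue", ["title", "link"])

title = "2017 - Volume 53 Abacus"      # sample values, not scraped
title_link = "/issues?activeYear=2017"

results = []

# Parentheses are needed here so that append receives ONE argument (a tuple).
results.append((title, title_link))

# The comma is what builds the tuple, so an intermediate variable needs no parentheses.
title_and_link = title, title_link
results.append(title_and_link)

# A namedtuple behaves like a tuple but also allows access by field name.
results.append(Issue(title, title_link))

# Passing two arguments to append raises TypeError.
try:
    results.append(title, title_link)
except TypeError as exc:
    error_message = str(exc)

print(results[2].title, results[2].link)
```

All three appended entries compare equal as tuples; the namedtuple version just reads better at the call site (`issue.title` rather than `issue[0]`).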
