Web-scraping with recursion - where to put return function

Question

I am trying to scrape a website for text. Each page contains a link to the next page, e.g. the first page has the link "/chapter1/page2.html", which in turn has the link "/chapter1/page3.html". The last page has no link. I am trying to write a program that accesses a URL, prints the text of the page, searches the text for the URL of the next page, and loops until it reaches the very last page, which has no URL. I tried to use if, else and return, but I do not understand where I need to put them.

def scrapy(url):
    result = requests.get(url, timeout=30.0)
    result.encoding = 'cp1251'
    page = result.text
    link = re.findall(r"\bh(.+?)html", page) # finds link to next page between tags
    print("Continue parsing next page!")
    url = link
    print(url)
    return(url)

url = "http://mywebpage.com/chapter1/page1.html"
result = requests.get(url, timeout=30.0)
result.encoding = 'cp1251'
page = result.text
link = re.findall(r"\bh(.+?)html", page)
if link == -1:
   print("No url!")
else:
   scrapy(url)

Unfortunately it doesn't work; it makes only one loop. Could you kindly tell me what I am doing wrong?

[This answer](https://stackoverflow.com/questions/50118298/recursive-promises-not-returning/50121218#50121218) is in javascript but I think it's probably what you're looking for — Mulan, Jan 11 '19 at 04:23
Thank you, but it doesn't help! I am very inexperienced in programming. — user136555, Jan 11 '19 at 04:59

score 0 · Accepted Answer · answered Jan 11 '19 at 04:59

A couple of things. First, to be recursive, scrapy needs to call itself. Second, a recursive function needs branching logic for a base case and a recursive case. In other words, part of your function needs to look like this (pseudocode):

if allDone
    return
else
    recursiveFunction(argument)

For scrapy, you'd want this branching logic below the line where you find the link (the one where you call re.findall). If you don't find a link, then scrapy is done. If you do find a link, then you call scrapy again, passing it your newly found link. Your scrapy function will probably need a few more small fixes, but hopefully that will get you unstuck with recursion.
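To make that concrete, here is a minimal sketch of how the branching could look inside scrapy. It is not the only way to write it, and it rests on a few assumptions: the helper names download and find_next_link and the fetch_page parameter are made up for illustration, and the href pattern is a guess at how the site marks up its links. One thing worth knowing is that re.findall returns a list (empty when nothing matches), so comparing its result to -1 never succeeds; the sketch uses re.search and checks for None instead.

```python
import re
from urllib.parse import urljoin


def download(url):
    """Fetch a page over HTTP as in the question (needs the requests package)."""
    import requests  # imported here so the rest of the sketch runs without it

    result = requests.get(url, timeout=30.0)
    result.encoding = 'cp1251'
    return result.text


def find_next_link(page):
    """Return the first href ending in .html, or None if there is none.

    The pattern is an assumption about how the site marks up its links.
    """
    match = re.search(r'href="([^"]+\.html)"', page)
    return match.group(1) if match else None


def scrapy(url, fetch_page=download):
    """Print each page in the chain and return the list of URLs visited."""
    page = fetch_page(url)
    print(page)
    link = find_next_link(page)
    if link is None:
        # Base case: the last page has no link, so stop here.
        print("No url!")
        return [url]
    # Recursive case: resolve the (possibly relative) link and call scrapy again.
    print("Continue parsing next page!")
    return [url] + scrapy(urljoin(url, link), fetch_page)
```

With this in place, a single call such as scrapy("http://mywebpage.com/chapter1/page1.html") should follow the chain until a page without a link is reached. Passing a different fetch_page (for example, a dictionary lookup) lets you try the logic without touching the network. Keep in mind that Python limits recursion depth (1000 frames by default), so for a very long chain of pages a while loop may be the safer choice.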

If you want to get really good at thinking in terms of recursion, this book is a good one: https://www.amazon.com/Little-Schemer-Daniel-P-Friedman/dp/0262560992

Thank you very much, Jordan! Your comment gave me an idea - I managed to make it work. — user136555, Jan 12 '19 at 02:37
