In create_jobs you call crawl, which would be fine on its own. But since you also call create_jobs from crawl, the two functions keep calling each other and can recurse indefinitely. Without the condition len(queued_links) > 0, it would be an outright infinite loop. To prevent such problems (a stack overflow), Python enforces a recursion limit (see: What is the maximum recursion depth in Python, and how to increase it?).
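For illustration, your two functions presumably have roughly this mutually recursive shape (the bodies are my guess, and fetch_links is a made-up stand-in for whatever extracts the links from a page):

def fetch_links(url):
    # stand-in for your real link extraction; always finds two new links
    return [url + '/a', url + '/b']

def create_jobs(queued_links):
    if len(queued_links) > 0:          # your stop condition
        crawl(queued_links.pop())      # one more stack frame...

def crawl(url):
    create_jobs(fetch_links(url))      # ...which calls crawl again, and so on

crawl('http://example.com')            # keeps recursing until Python raises RecursionError

Each page crawled makes the call stack deeper, and it only unwinds once a page yields no new links.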
The thing here is that a webpage is quite likely to contain links to other webpages, so your stopping condition will rarely be met. That's why you are hitting the recursion limit. You can increase this limit as follows (snippet taken from Python: Maximum recursion depth exceeded), but I would not advise you to do that:
import sys
sys.setrecursionlimit(10000) # 10000 is an example, try with different values
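If you want to see how close you are to the limit, you can check the current value first (it is 1000 by default in CPython):

import sys
print(sys.getrecursionlimit())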
The better approach is to change the design of your algorithm to something iterative: you keep a list of links to visit and populate it while crawling, instead of making recursive calls. For example:
def crawl(url):
    return [url + 'a', url + 'b']   # placeholder: fetch the page, return its links

visited = set()
to_visit = ['foo', 'bar']
while to_visit:
    url = to_visit.pop()
    if url not in visited:          # don't crawl the same page twice
        visited.add(url)
        to_visit.extend(crawl(url))
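For completeness, here is a slightly fuller sketch of the same iterative design against real pages. The start URL, the 100-page cap and the regex-based link extraction are placeholder choices, not something from your code; use a proper HTML parser (e.g. BeautifulSoup) for real crawling:

import re
from urllib.parse import urljoin
from urllib.request import urlopen

def crawl(url):
    # naive link extraction with a regex, just to keep the sketch short
    html = urlopen(url).read().decode('utf-8', errors='ignore')
    return [urljoin(url, href) for href in re.findall(r'href="([^"]+)"', html)]

visited = set()
to_visit = ['http://example.com']        # your start page(s)
while to_visit and len(visited) < 100:   # hard cap so the crawl always ends
    url = to_visit.pop()
    if url in visited:
        continue
    visited.add(url)
    try:
        to_visit.extend(crawl(url))
    except (OSError, ValueError):
        pass                             # skip pages that fail to download

No matter how many links are found, the call stack never gets deeper, so the recursion limit is no longer an issue.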
Regarding the fact that your algorithm sometimes works and sometimes doesn't: the pages quite likely change over time, and if you are close to the recursion limit, whether you hit it or not may depend on which pages happened to be generated.
Finally, having only 100 links does not mean you cannot hit a recursion limit of 1000, for instance. Your crawl function calls other functions, which call other functions, and so on, so some of the recursion depth is hidden, as the small example below shows.
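Here is a toy illustration of that effect: each "page" only costs two stack frames here, yet 600 pages are enough to exceed a limit of 1000 (the numbers are made up, the effect is the point):

import sys

def crawl(url, remaining):
    if remaining == 0:                 # pretend we ran out of links
        return
    create_jobs(url + 'x', remaining - 1)

def create_jobs(url, remaining):
    crawl(url, remaining)              # two frames (crawl + create_jobs) per page

sys.setrecursionlimit(1000)
crawl('foo', 600)   # only 600 "pages", but ~1200 frames needed -> RecursionError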