I have this method that, when supplied with a list of links, gets the child links of each one, then the child links of those, and so on:
    def crawlSite(self, linksList):
        finalList = []
        for link in list(linksList):
            if link not in finalList:
                print link
                finalList.append(link)
                childLinks = self.getAllUniqueLinks(link)
                length = len(childLinks)
                print 'Total links for this page: ' + str(length)
            self.crawlSite(childLinks)
        return finalList
It eventually repeats itself with the same set of links, and I can't figure out why. When I move the self.crawlSite(childLinks) call inside the if statement, I get the first item in the list repeated over and over.
Background: the self.getAllUniqueLinks(link) method gets a list of links from a given page, filtered to the clickable links within a given domain. Basically, what I am trying to do is get all clickable links from a website. If this isn't the right approach, could you recommend a better method that does the same thing? Please also consider that I am fairly new to Python and might not understand more complex approaches, so please explain your thought process, if you don't mind :)
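To make the goal concrete, here is a minimal sketch of the behaviour I am after: every link reachable from the starting list should be visited exactly once, even when pages link back to each other. It is only an illustration under assumptions. FAKE_SITE and get_all_unique_links are hypothetical stand-ins for my real getAllUniqueLinks method, and the loop uses a queue with a single shared "visited" set instead of recursion, which may or may not be the best approach.

```python
from collections import deque

# Hypothetical stand-in for a real website: each page maps to the links
# found on it. Note the cycles (pages link back to each other).
FAKE_SITE = {
    "/": ["/about", "/blog"],
    "/about": ["/", "/blog"],
    "/blog": ["/", "/blog/post-1"],
    "/blog/post-1": ["/blog", "/about"],
}


def get_all_unique_links(link):
    """Hypothetical replacement for self.getAllUniqueLinks(link)."""
    return FAKE_SITE.get(link, [])


def crawl_site(start_links, get_links=get_all_unique_links):
    """Return every link reachable from start_links, each listed once."""
    visited = set()             # one set shared by the whole crawl
    ordered = []                # links in the order they were first seen
    queue = deque(start_links)  # links still waiting to be processed

    while queue:
        link = queue.popleft()
        if link in visited:
            continue            # already handled, so skip it
        visited.add(link)
        ordered.append(link)
        # Child links are queued; duplicates are skipped when popped.
        queue.extend(get_links(link))

    return ordered


result = crawl_site(["/"])
print(result)
```

The important difference from my method is that the record of seen links lives in one place for the entire crawl, rather than being recreated as an empty list on every call.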