I hope you can help me. I am trying to crawl a website containing about 4500 links to information pages. The structure is like this:
Tier 1 (just different categories)
Tier 2 (containing different topics)
Tier 3 (containing the topic information)
My script opens each category in a loop, then opens the topics one by one and extracts all the information from Tier 3. But since there are about 4500 topics, I sometimes get a timeout error and then have to start over from the beginning (sometimes after 200 topics, other times after 2200). My question is: how can I do this the right way, so that if the script crashes I can resume from the topic where it crashed instead of starting over? I am new to Ruby and crawling and would appreciate any advice.
Thanks!