What is a responsible / ethical time delay to put in a web crawler that only crawls one root page?
I'm using time.sleep(#) between the following calls
requests.get(url)
I'm looking for a rough idea on what timescales are: 1. Way too conservative 2. Standard 3. Going to cause problems / get you noticed
I want to touch every page (at least 20,000, probably a lot more) meeting certain criteria. Is this feasible within a reasonable timeframe?
EDIT
This question is less about avoiding being blocked (though any relevant info. would be appreciated) and rather what time delays do not cause issues to the host website / servers.
I've tested with 10 second time delays and around 50 pages. I just don't have a clue if I'm being over cautious.