I'm trying to automatically crawl a given site by following all internal links. To do this I've been playing with Python's mechanize library, but it doesn't let me work with JavaScript and AJAX content.
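For context, this is roughly the kind of crawl I mean (a simplified sketch, not my exact code; example.com stands in for the real site):

```python
import mechanize
from urllib.parse import urlparse

START_URL = "http://example.com/"  # hypothetical starting page
domain = urlparse(START_URL).netloc

br = mechanize.Browser()
br.set_handle_robots(False)  # for testing only

seen = set()
queue = [START_URL]

while queue:
    url = queue.pop(0)
    if url in seen:
        continue
    seen.add(url)
    try:
        br.open(url)
        links = list(br.links())  # fails on non-HTML responses
    except Exception:
        continue
    # only follow links that stay on the same host
    for link in links:
        target = link.absolute_url
        if urlparse(target).netloc == domain and target not in seen:
            queue.append(target)

print("crawled %d pages" % len(seen))
```

This works fine for plain HTML links, but anything loaded via JavaScript or AJAX is invisible to it.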
How do Googlebot and other major search engine spiders/bots handle this? Is there another tool out there that can complement mechanize in this scenario?
I'm aware I could reverse engineer the JavaScript to work out what it's doing and then mimic that, but I want to automate the crawl, so it wouldn't be practical if I first had to comb through each site's JavaScript.