MSDN is a huge hierarchical doc site.
To be more precise, the content is organized in a hierarchical manner, but the URLs are not. The URL space is flat, making it look like everything is in the same directory. (In reality, there probably isn't a directory; I guess things are coming out of some other database; but that's not relevant here.)
So if you want to download part of MSDN, say, the NMake manual, you can't just recursively download everything below a given directory, because that would fetch all of MSDN: too much for your hard drive and bandwidth.
But you could write a script that looks at the DOM (HTML) and then follows and downloads only those links contained in certain navigational sections of the document, such as elements whose class attribute is toc_children or toc_siblings, but not toc_parent.
What you'd need would be some downloader that allows you to say:
$webclient->add_links( $xpath_expression ); # or
$webclient->add_links( $css_selector );
It shouldn't be too difficult to cobble something together using Perl, LWP and XML::LibXML (HTML parser), but maybe you know of a tool that allows you to do just that so I don't need to reinvent it.
It doesn't have to be Perl; any other language is fine, too, and so is a ready-made program that has the flexibility required for this job.
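To make the idea concrete, here is a minimal sketch in Python using only the standard library (since any language will do). It shows just the link-selection half of the job: collect hrefs only from links nested inside an element whose class is toc_children or toc_siblings, and never from inside toc_parent. The class names are the ones from the question; the sample HTML and the nesting heuristics are my own simplifying assumptions (real MSDN markup would need the usual tolerance for messy HTML), and the fetch-and-recurse loop around it is left out.

```python
from html.parser import HTMLParser

FOLLOW_CLASSES = {"toc_children", "toc_siblings"}  # navigate down/sideways
SKIP_CLASSES = {"toc_parent"}                      # never navigate back up

class NavLinkExtractor(HTMLParser):
    """Collect href values, but only from <a> tags nested inside an
    element whose class matches FOLLOW_CLASSES, and not inside one
    matching SKIP_CLASSES. (Sketch: assumes reasonably well-nested
    HTML; end tags are popped without matching them to start tags.)"""

    VOID = {"br", "img", "hr", "meta", "link", "input"}  # never get end tags

    def __init__(self):
        super().__init__()
        self.links = []
        self._stack = []  # per open element: "follow", "skip", or None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = set((a.get("class") or "").split())
        state = None
        if classes & SKIP_CLASSES:
            state = "skip"
        elif classes & FOLLOW_CLASSES:
            state = "follow"
        if tag == "a" and self._mode() == "follow" and a.get("href"):
            self.links.append(a["href"])
        if tag not in self.VOID:
            self._stack.append(state)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def _mode(self):
        # the innermost element with an explicit state decides
        for state in reversed(self._stack):
            if state:
                return state
        return None

# Hypothetical navigation markup, just for illustration:
sample = '''
<div class="toc_parent"><a href="/up.aspx">Up</a></div>
<div class="toc_siblings"><a href="/sib1.aspx">S1</a></div>
<div class="toc_children"><ul>
  <li><a href="/child1.aspx">C1</a></li>
  <li><a href="/child2.aspx">C2</a></li>
</ul></div>
<p><a href="/unrelated.aspx">body link</a></p>
'''

parser = NavLinkExtractor()
parser.feed(sample)
print(parser.links)  # ['/sib1.aspx', '/child1.aspx', '/child2.aspx']
```

A real downloader would then fetch each collected URL (urllib.request, or LWP in Perl), run the same extraction on the result, and keep a "seen" set so pages aren't downloaded twice.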