I'm tasked with a web scraping project. We are pulling a bunch of our static content into a CMS.
HtmlAgilityPack lets me grab dependent resources by looking for anything with a src or href attribute, but what about CSS files and the background images they reference? Is there a good utility for parsing CSS files to extract these?
My current solution is a bit of the Cthulhu way of doing it, using a regex:
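using System.Text.RegularExpressions;

// Non-greedy, so two url(...) tokens on one line are matched separately;
// group "path" holds the URL with any surrounding quotes removed.
Regex r = new Regex(@"url\(\s*(?<q>['""]?)(?<path>.*?)\k<q>\s*\)", RegexOptions.IgnoreCase);
foreach (Match item in r.Matches(cssText))
{
    string url = item.Groups["path"].Value;
    // scrub url and
    // mark img for download
}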
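A slightly fuller sketch of the same regex idea, offered as a starting point under stated assumptions rather than a real CSS parser. FindCssResources is a hypothetical helper name, and the sample stylesheet and example.com URLs are made up for illustration. It strips /* ... */ comments first, matches each url(...) lazily so several on one line stay separate, also picks up @import "file.css", skips data: URIs (nothing to download), and resolves every path against the stylesheet's own address, because relative URLs inside a .css file are relative to that file rather than to the HTML page. It is written as a top-level program, so it assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sample stylesheet, assumed to live at http://example.com/styles/site.css.
string sampleCss = @"
/* body { background: url(old.png); } */
@import ""print.css"";
.hero { background: url('../img/hero.jpg') no-repeat; }
.icon { background-image: url(icons/star.png), url( ""http://cdn.example.com/bg.gif"" ); }
.inline { background: url(data:image/png;base64,iVBORw0KGgo=); }
";

foreach (Uri found in FindCssResources(sampleCss, new Uri("http://example.com/styles/site.css")))
    Console.WriteLine(found.AbsoluteUri);

// Hypothetical helper: returns every downloadable resource referenced by
// url(...) or @import "...", resolved against the stylesheet's own URI.
static List<Uri> FindCssResources(string cssText, Uri stylesheetUri)
{
    var results = new List<Uri>();

    // Remove /* ... */ comments so commented-out rules are not downloaded.
    string stripped = Regex.Replace(cssText, @"/\*.*?\*/", "", RegexOptions.Singleline);

    // url(a.png), url('a.png'), url( "a.png" ): the lazy match keeps several per line apart.
    var urlPattern = new Regex(@"url\(\s*(?<q>['""]?)(?<path>.*?)\k<q>\s*\)", RegexOptions.IgnoreCase);
    // @import "x.css"; the @import url(...) form is already caught by urlPattern.
    var importPattern = new Regex(@"@import\s+(?<q>['""])(?<path>.*?)\k<q>", RegexOptions.IgnoreCase);

    foreach (Regex pattern in new[] { urlPattern, importPattern })
    {
        foreach (Match m in pattern.Matches(stripped))
        {
            string path = m.Groups["path"].Value.Trim();
            // data: URIs are embedded in the stylesheet, so there is nothing to fetch.
            if (path.Length == 0 || path.StartsWith("data:", StringComparison.OrdinalIgnoreCase))
                continue;
            // Relative paths resolve against the .css file; absolute URLs pass through unchanged.
            if (Uri.TryCreate(stylesheetUri, path, out var resolved))
                results.Add(resolved);
        }
    }
    return results;
}
```

The returned Uri list can then be fed into whatever download step already handles the src/href resources. This is still pattern matching, so it can be fooled by a url( that appears inside a quoted string or by escaped characters in the URL; if that matters, a proper CSS tokenizer is the safer route.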