I need to get all URLs (url() expressions) from CSS files. For example:
b { background: url(img0) }
b { background: url("img1") }
b { background: url('img2') }
b { background: url( img3 ) }
b { background: url( "img4" ) }
b { background: url( 'img5' ) }
b { background: url (img6) }
b { background: url ("img7") }
b { background: url ('img8') }
b { background: url('noimg0) }
b { background: url(noimg1') }
/*b { background: url(noimg2) }*/
b { color: url(noimg3) }
b { content: 'url(noimg4)' }
@media screen and (max-width: 1280px) { b { background: url(img9) } }
b { background: url(img10) }
I need to get all img* URLs, but not the noimg* URLs (invalid syntax, invalid property, or inside comments).
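To make the requirement concrete, here is a minimal sketch of how any candidate extractor could be checked against the sample above. The class `UrlExtractionCheck` and its `Passes` method are hypothetical names used only for illustration. It assumes the expected result is exactly img0 through img10 in document order, and that surrounding whitespace in the returned values does not matter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UrlExtractionCheck
{
    // Expected result for the sample above: img0 .. img10, in document order.
    private static readonly string[] Expected =
        Enumerable.Range(0, 11).Select(i => "img" + i).ToArray();

    // Hypothetical helper (not part of any library): runs a candidate
    // extractor over the sample CSS and reports how it differs from Expected.
    public static bool Passes(Func<string, IEnumerable<string>> extract, string sampleCss)
    {
        // Trim so that "url( img3 )" counts as "img3" even if an extractor
        // keeps the surrounding whitespace.
        var actual = extract(sampleCss).Select(url => url.Trim()).ToList();

        foreach (var missing in Expected.Except(actual))
            Console.WriteLine("missing:    " + missing);
        foreach (var unexpected in actual.Except(Expected))
            Console.WriteLine("unexpected: " + unexpected);

        return actual.SequenceEqual(Expected);
    }
}
```

A call such as `UrlExtractionCheck.Passes(ParseUrlsRegex, css)`, with `css` holding the sample text, would then return true only for an extractor that lists every img* URL and no noimg* URL.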
I've tried using good old regular expressions. After some trial and error I got this:
private static IEnumerable<string> ParseUrlsRegex (string source)
{
    var reUrls = new Regex(@"(?nx)
        url \s* \( \s*
            (
                (?! ['""] )
                (?<Url> [^\)]+ )
                (?<! ['""] )
                |
                (?<Quote> ['""] )
                (?<Url> .+? )
                \k<Quote>
            )
        \s* \)");
    return reUrls.Matches(source)
        .Cast<Match>()
        .Select(match => match.Groups["Url"].Value);
}
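That's one crazy regex, but it still doesn't work: it matches three of the invalid URLs (namely noimg2, noimg3 and noimg4), because it knows nothing about comments, property names or string literals. Furthermore, everyone will say that using a regex to parse a complex grammar is wrong.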
Let's try another approach. According to this question, the only viable option is ExCSS (others are either too simple or outdated). With ExCSS I got this:
private static IEnumerable<string> ParseUrlsExCss (string source)
{
    var parser = new StylesheetParser();
    parser.Parse(source);
    return parser.Stylesheet.RuleSets
        .SelectMany(i => i.Declarations)
        .SelectMany(i => i.Expression.Terms)
        .Where(i => i.Type == TermType.Url)
        .Select(i => i.Value);
}
Unlike the regex solution, this one doesn't list invalid URLs. But it doesn't list some valid ones either, namely img9 and img10. This looks like a known issue with some CSS syntax, and apparently it can't be fixed without rewriting the whole library from scratch. The ANTLR rewrite seems to be abandoned.
Question: How can I extract all URLs from CSS files? (I need to parse any CSS file, not only the one provided as an example above. Please don't check for "noimg" or assume one-line declarations.)
N.B. This is not a "tool recommendation" question, as any solution will be fine, be it a piece of code, a fix to one of the above solutions, a library or anything else; and I've clearly defined the function I need.