Parse CSS classes but ignore @media queries

Question

I have a CSS parser utility written in C#. I am able to parse and extract all CSS classes using the following regex. This works as intended.

[C#]

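// Group 1: everything up to the next "{" (the selector); group 2: the declarations up to the next "}".
const string expression = "(.*?)\\{(.*?)\\}";
var regEx = new Regex(expression, RegexOptions.Singleline | RegexOptions.IgnoreCase);
var matches = regEx.Matches(styleSheet);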

[CSS]

body 
{
    font-family: Helvetica Neue,Helvetica,Arial,sans-serif;
    font-size: 13px;
    color: #666666;
}

img 
{
    border: 0;
    display: block;
}

@media only screen and (max-width: 600px)
{
    table[class=bodyTable] 
    {
        width: 100% !important;
    }

    table[class=headerlinks]
    {
        display:none !important;
    }
}

a 
{
    text-decoration: none;
}

However, our software now supports media queries, and we want to skip entire @media blocks during CSS parsing. The regex should therefore match only body, img and a.

I would appreciate help writing a new regex :)

[Workaround] Once I have all the matches, I filter them in a foreach loop:

foreach(Match match in matches)
{
    var selectorString = match.Groups[1].ToString();

    if (selectorString.IndexOf("@media", StringComparison.InvariantCulture) > -1)
        continue;

    // processing...
}

I will defer this to the regex experts, but I've been playing around with solving this problem. My first guess would be to use a lookaround to determine whether the matched set begins with `@media`. When dealing with the arbitrarily nested `@media`, though, every subgroup was matched. An explanation might be here: http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns I could be wrong, but I'm inclined to be inefficient and use `substring` to remove the `@media` parts before feeding your regex matcher — Daniel Kotin, Apr 04 '14 at 12:22
So far I have used a workaround to first get all the matches and eliminate the one that has @media. I also tried playing with negative lookaround [link](http://www.regular-expressions.info/lookaround.html) with no luck. — user3493914, Apr 04 '14 at 13:16
However, the above workaround does not feel right to me, and I would like to use a cleaner regex instead. — user3493914, Apr 04 '14 at 13:17
Yeah, I also didn't have luck with the negative lookahead. You could post your workaround as an edit if you think that will help. Perhaps in codereview? — Daniel Kotin, Apr 04 '14 at 15:04
It's not possible to have a foolproof regex to parse CSS. You'll run into issues with string values and comments. It's better to use an actual parser than a regex. — zzzzBov, Apr 08 '14 at 18:20
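
A minimal sketch of the pre-processing idea from the comments: remove each `@media` block by counting braces, then run the question's original regex on what is left. `CssMediaStripper`, `StripMediaBlocks` and `ExtractSelectors` are hypothetical names, and the sketch assumes braces never appear inside string values or comments (the caveat raised in the last comment).

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class CssMediaStripper
{
    // Hypothetical helper: drops every "@media ... { ... }" block, including
    // the rules nested inside it, by tracking brace depth.
    public static string StripMediaBlocks(string css)
    {
        var result = new StringBuilder(css.Length);
        int i = 0;
        while (i < css.Length)
        {
            int start = css.IndexOf("@media", i, StringComparison.OrdinalIgnoreCase);
            if (start < 0)
            {
                // No further @media: keep the rest of the style sheet.
                result.Append(css, i, css.Length - i);
                break;
            }

            // Keep the text before the @media keyword.
            result.Append(css, i, start - i);

            int open = css.IndexOf('{', start);
            if (open < 0)
                break; // Malformed input: no block follows, so drop the tail.

            // Advance until the brace that closes the @media block.
            int depth = 1;
            int j = open + 1;
            while (j < css.Length && depth > 0)
            {
                if (css[j] == '{') depth++;
                else if (css[j] == '}') depth--;
                j++;
            }
            i = j;
        }
        return result.ToString();
    }

    // Applies the regex from the question to the stripped style sheet.
    public static List<string> ExtractSelectors(string css)
    {
        var regEx = new Regex("(.*?)\\{(.*?)\\}", RegexOptions.Singleline | RegexOptions.IgnoreCase);
        var selectors = new List<string>();
        foreach (Match match in regEx.Matches(StripMediaBlocks(css)))
            selectors.Add(match.Groups[1].Value.Trim());
        return selectors;
    }
}
```

With this in place the `@media` check inside the foreach loop should no longer be needed, since the nested rules never reach the regex.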

Roberto Reale · Answer 1 · 2014-04-10T11:15:18.873

By using negative look-behind we obtain a more elegant solution. I'd write something of the form:

((?:(?<!@media).)*?){(.*?)}

Or, expanding:

(                // start 1st group
  (?:            // start non-capturing group (complex expression)
    (?<!@media)  // match if not preceded by @media
    .            // now match any character
  )*?            // any number of times
)                // end of 1st group
{                // match literal {
(                // start 2nd group
  .              // any character
  *?             // any number of times
)                // end of 2nd group
}                // match literal }

Have a look at https://www.debuggex.com/r/QgjgoymphZ1Ska25.

Note: feel free to add escaping as needed...

Thanks for the reply. Unfortunately, it only ignores the word media... not the whole media block :( — user3493914, Apr 10 '14 at 10:51
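
The lookbehind in the answer only rejects the single character that directly follows `@media`, so the engine can still begin a match just after that word, which would explain why only the word itself is skipped. As one possible alternative, here is a sketch that uses .NET balancing groups to match a whole `@media` block, nested braces included, and deletes it before the original regex runs. The names `MediaBlockRegex`, `RemoveMediaBlocks` and the group name `depth` are made up for illustration, and braces inside string values or comments are assumed not to occur.

```csharp
using System.Text.RegularExpressions;

public static class MediaBlockRegex
{
    // "@media", its condition, then a brace-balanced block:
    //   \{(?<depth>)    pushes onto the "depth" stack for each nested "{"
    //   \}(?<-depth>)   pops for each nested "}" (fails when nothing is left to pop)
    //   (?(depth)(?!))  rejects the match if a nested "{" was never closed
    private const string MediaBlock =
        @"@media[^{]*\{(?>[^{}]+|\{(?<depth>)|\}(?<-depth>))*(?(depth)(?!))\}";

    // Returns the style sheet with every @media block removed.
    public static string RemoveMediaBlocks(string styleSheet)
    {
        return Regex.Replace(styleSheet, MediaBlock, string.Empty, RegexOptions.IgnoreCase);
    }
}
```

Passing the result of `RemoveMediaBlocks(styleSheet)` to the original `regEx.Matches(...)` call should then leave only the top-level rules to be matched.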
