1

I found a bug in this syntax-highlighting editor: http://cshe.ds4a.com/

The following ASP.NET code isn't highlighted correctly:

<%@ Page Title="<%$ Resources: XXX %>" Language="C#" ContentType="text/html" ResponseEncoding="utf-8" %>

The problem lies in the regular expression. How can I match this whole line with a regular expression?

I am using the RegExp class from ActionScript 3.

The main challenges are:

  1. The <%@ %> instruction may contain another <%$ %> instruction in one of its attributes, as in the line above.

  2. The <%@ %> instruction may contain a line break, as in the following:

<%@ Page Title="<%$ Resources: XXX %>"
Language="C#" ContentType="text/html" ResponseEncoding="utf-8"
 %>

  3. The <%@ %> instruction may be followed by another <%@ %> with no space or line break in between:

<%@ Page Title="<%$ Resources: XXX %>"
Language="C#" ContentType="text/html" ResponseEncoding="utf-8"
 %><%@ Import Namespace="System" %>

Thank you

Alan Moore
Jerry
  • What language are you using? RegEx flavors behave differently and have different features, so knowing this will help. Please edit and tag the question with the language you are using. – Oded Sep 05 '10 at 09:32
  • This is a classic example of where RegExs fall down - they're not designed to handle nested expressions like this. Really you'd need a proper ASP.Net parser to do this properly. See also http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – John Carter Sep 05 '10 at 09:42
  • Can you try and post the original regex? Otherwise this is just guess work. – FK82 Sep 05 '10 at 09:44

3 Answers

2

I'm not sure all these escapes are necessary, but I kept them for good measure. This found your line using Find in Notepad++:

^<\%\@.*\%>$

EDIT

For multiple lines, set the multiline and dotall flags. These tell the engine that the expression may span several lines and that the . wildcard should also match newlines (\n).

/<\%\@.*\%>/sm

or

<\%\@.*\%>

With s and m flags.
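
To see what each flag actually contributes, here is a minimal sketch in Python's `re` module. Python is an assumption made for illustration only; the question targets ActionScript 3, whose RegExp flavor may differ in details. The sample text is the third case from the question, and the variable names are hypothetical.

```python
import re

# Third sample from the question: a multi-line directive followed directly
# by a second directive.
text = (
    '<%@ Page Title="<%$ Resources: XXX %>"\n'
    'Language="C#" ContentType="text/html" ResponseEncoding="utf-8"\n'
    ' %><%@ Import Namespace="System" %>'
)

pattern = r'<%@.*%>'

# No flags: "." stops at the newline, so the match ends at the nested "%>"
# on the first line.
no_flags = re.search(pattern, text).group(0)

# MULTILINE only changes what ^ and $ match; this pattern uses neither,
# so the result is the same as with no flags.
multiline_only = re.search(pattern, text, re.MULTILINE).group(0)

# DOTALL lets "." cross newlines; the greedy ".*" then runs to the last "%>",
# so both directives end up in a single match.
dotall = re.search(pattern, text, re.DOTALL).group(0)

print(repr(no_flags))
print(dotall == text)
```

In this sketch only the dotall flag changes the result, and the greedy `.*` then merges the two adjacent directives into one match, so the flags alone do not address the third challenge in the question.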

Hubro
  • Thanks, but actually this line can have a line break in it, just like <%@ Page Title="<%$ Resources: XXX %>" Language="C#" ContentType="text/html" ResponseEncoding="utf-8" %><%@ Import Namespace="System.Data" %>. I don't think your expression will work :) – Jerry Sep 05 '10 at 09:46
  • Then you have to set the multi line flag. Depending on your software, usually that's done by adding "n" behind the end delimiter. – Hubro Sep 05 '10 at 09:51
  • It's not just about the multiline switch. I am using the RegExp from ActionScript 3. The <%@ %> instruction may contain a line break; an instruction attribute may contain another <%$ %>; and a <%@ %> may be followed by another <%@ %> without any space or line break. – Jerry Sep 05 '10 at 09:57
  • Then you should consider trying a programmatic solution, using more than one regex if necessary. – Hubro Sep 05 '10 at 09:59
  • @Codemonkey: The multiline flag is irrelevant; all it does is change the behavior of the `^` and `$` anchors, which aren't being used here. – Alan Moore Sep 05 '10 at 11:43
1

Try this:

/<%@[^%"']++(?:(?:%(?!>)|"[^"]*+"|'[^']*+')[^%"']++)*+%>/

Anything that's enclosed in double-quotes or single-quotes is treated as generic string content, so a %> in an attribute value won't prematurely close the tag for matching purposes.
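
As a rough check of this approach against the samples in the question, here is a sketch in Python's `re` module. Python is an assumption for illustration (the question targets ActionScript 3), and the possessive quantifiers (`++`, `*+`) are replaced with plain greedy ones so the pattern runs on Python versions that lack them. That is an adaptation, not the pattern as posted; the names `DIRECTIVE`, `sample`, and `tight` are hypothetical.

```python
import re

# Adapted pattern: possessive quantifiers replaced by greedy ones.
# Quoted attribute values are consumed whole, so a "%>" inside quotes
# cannot end the directive early.
DIRECTIVE = re.compile(r'''<%@[^%"']+(?:(?:%(?!>)|"[^"]*"|'[^']*')[^%"']+)*%>''')

# Third sample from the question: nested <%$ %>, line breaks, and a second
# directive with no separator.
sample = (
    '<%@ Page Title="<%$ Resources: XXX %>"\n'
    'Language="C#" ContentType="text/html" ResponseEncoding="utf-8"\n'
    ' %><%@ Import Namespace="System" %>'
)

# No capturing groups, so findall returns the full text of each match.
directives = DIRECTIVE.findall(sample)
for d in directives:
    print(repr(d))

# Edge case: a closing "%>" placed directly after a quoted value.
tight = DIRECTIVE.search('<%@ Import Namespace="System"%>')
print(tight)
```

In this sketch the two directives come back as separate matches, and no dotall flag is needed because a negated character class matches newlines. Note that the pattern expects at least one character, such as a space, between the last quoted value and the closing `%>`, so the `tight` case is not matched.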

Alan Moore
0

Based on the headline I created a little regex that also takes care of whitespace at the start or end of the file. However, I can't guarantee that it fits your project.

^\s*<%@.*>\s*$

I tested this with the PHP function preg_match_all()

Update:

Use this pattern to match across multiple lines. Your regex library has to support the "s" modifier (which lets . match newlines), though.

/\s*<%@.*>\s*/s
Alex