
I am not very well-versed in regular expressions, but I am trying to accomplish something in ASP.NET which I think requires them.

I am pulling in an HTML file, doing some processing, and outputting new "merged" html. The portion I am struggling with is grabbing a chunk of code located between two predefined "tags" of my own creation.

Here is an example of the relevant input html:

<table style="width: 500px; font-family: Trebuchet MS, sans-serif; font-size: 13px; background-color: #fff; border: 0; border-collapse: collapse;" align="center" cellspacing="0">
<thead>
<tr>
<th colspan="3" style="text-align: left;border-bottom: 1px solid #DDDDDD;">
Add-ons
</th>
</tr>
</thead>
<tbody>
[AddonsListSTART]
<tr style="border-bottom: 1px dashed #DDDDDD;">
<td>[AddonName]</td>
<td>[AddonQty]</td>
<td align="right">[AddOnPrice]</td>
</tr>
[AddonsListEND]
</tbody>
</table>
<br />

This is my C# code:

//Find Add-ons HTML : between [AddonsListSTART] & [AddonsListEND]
Regex rgxAddonSE = new Regex(@"\[AddonsListSTART\](?<MyHtml>.*)\[AddonsListEND\]");

Match matchAddonSE  = rgxAddonSE.Match(htmlEmail);

string htmlAddons = matchAddonSE.ToString();

What I want to happen is for "htmlAddons" to be equal to the string:

<tr style="border-bottom: 1px dashed #DDDDDD;">
<td>[AddonName]</td>
<td>[AddonQty]</td>
<td align="right">[AddOnPrice]</td>
</tr>

The problem is that it is always blank, and "matchAddonSE.Success" is always FALSE. I know there is something wrong with my regex, but I can't figure out what.

Thank you in advance for any help.

Heather

  • Aha! A helpful link displayed in the sidebar led me to the answer: http://stackoverflow.com/questions/4000508/regex-expression-that-will-capture-everything-between-two-characters-including-m - this regex now works: `new Regex(@"\[AddonsListSTART\](?<MyHtml>[\s\S]*)\[AddonsListEND\]")` – H Floyd Feb 09 '12 at 22:04

2 Answers


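I think it is related to single-line vs. multi-line processing: by default "." does not match a newline, so the pattern cannot span the lines between your tags. See the Singleline option: http://msdn.microsoft.com/en-us/library/yd1hzczs.aspx#Singleline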
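A minimal sketch of the difference between the options, assuming a small stand-in string in place of the real htmlEmail (the class name and sample are illustrative only). By default "." stops at "\n"; RegexOptions.Multiline only changes what "^" and "$" match; RegexOptions.Singleline is the option that lets "." cross line breaks.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Same pattern as in the question.
    public const string Pattern = @"\[AddonsListSTART\](?<MyHtml>.*)\[AddonsListEND\]";

    // Illustrative stand-in for the real template: the tags sit on separate lines.
    public const string Sample = "[AddonsListSTART]\n<tr><td>[AddonName]</td></tr>\n[AddonsListEND]";

    public static bool Matches(RegexOptions options)
    {
        return Regex.IsMatch(Sample, Pattern, options);
    }

    public static void Main()
    {
        // Default: '.' does not match '\n', so the pattern cannot reach the end tag.
        Console.WriteLine(Matches(RegexOptions.None));       // False
        // Multiline only affects '^' and '$'; '.' still stops at '\n'.
        Console.WriteLine(Matches(RegexOptions.Multiline));  // False
        // Singleline lets '.' match '\n' too.
        Console.WriteLine(Matches(RegexOptions.Singleline)); // True
    }
}
```

This is why the question's pattern reports Success == false on the multi-line template even though the tags are present.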

Lotfi

The problem is that "." does not match newline characters by default, so ".*" cannot get past the end of the first line. For predefined labels that appear only once in the text (a single expected match), a regex might not be the best way to go; why not just locate the tags with IndexOf and use Substring?

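If you still want to use a regex, the part between the tags must be allowed to match line breaks. Note that [.\r\n]* does not do this: inside a character class the dot is a literal period, so that class only matches dots, carriage returns and line feeds. Use [\s\S]* instead, which matches any character, because every character is either whitespace or not:

\s matches whitespace, roughly [ \f\n\r\t\v] (in .NET it also covers other Unicode whitespace).

\S matches any non-whitespace character, i.e. the complement of \s.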
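A short sketch of the character-class pitfall, assuming an illustrative sample with Windows line endings (the class name, the Capture helper and the sample are hypothetical). It plugs different "body" sub-patterns between the two tags and reports what the MyHtml group captures.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // Illustrative template fragment with \r\n line endings.
    public const string Sample = "[AddonsListSTART]\r\n<tr><td>[AddonName]</td></tr>\r\n[AddonsListEND]";

    // Returns the MyHtml capture for the given body sub-pattern, or null if there is no match.
    public static string Capture(string body)
    {
        string pattern = @"\[AddonsListSTART\](?<MyHtml>" + body + @")\[AddonsListEND\]";
        Match m = Regex.Match(Sample, pattern);
        return m.Success ? m.Groups["MyHtml"].Value : null;
    }

    public static void Main()
    {
        // Inside [...] the dot is a literal '.', so the class cannot consume "<tr>..." and the match fails.
        Console.WriteLine(Capture(@"[.\r\n]*") == null); // True
        // [\s\S] is "whitespace or non-whitespace", i.e. any character, newlines included.
        Console.WriteLine(Capture(@"[\s\S]*"));
    }
}
```

Note that the captured text keeps the line breaks right after the start tag and right before the end tag; trim it if the merged output should not contain them.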

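Another option is to switch the regex to single-line mode (RegexOptions.Singleline). The name is confusing, but it actually means that the dot "." also matches newlines; it is unrelated to RegexOptions.Multiline, which changes the meaning of "^" and "$".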

Below is a Substring usage example:

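// requires: using System;
string startTag = "[AddonsListSTART]";
string endTag = "[AddonsListEND]";
int start = htmlEmail.IndexOf(startTag, StringComparison.Ordinal);
// Search for the end tag only after the start tag
int end = start >= 0 ? htmlEmail.IndexOf(endTag, start + startTag.Length, StringComparison.Ordinal) : -1;
string res = "";
if (start >= 0 && end >= 0)
{
    int contentStart = start + startTag.Length;
    res = htmlEmail.Substring(contentStart, end - contentStart);
}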

Here is the single-line mode version (note RegexOptions.Singleline):

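//Find Add-ons HTML : between [AddonsListSTART] & [AddonsListEND]
// requires: using System.Text.RegularExpressions;
Regex rgxAddonSE = new Regex(@"\[AddonsListSTART\](?<MyHtml>.*)\[AddonsListEND\]", RegexOptions.Singleline);

Match matchAddonSE = rgxAddonSE.Match(htmlEmail);

// ToString() would return the whole match, including both tags;
// the named group holds only the HTML in between.
string htmlAddons = matchAddonSE.Success ? matchAddonSE.Groups["MyHtml"].Value : "";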

The same thing, but enabling single-line mode from within the pattern with the inline (?s) option:

Regex rgxAddonSE = new Regex(@"(?s)\[AddonsListSTART\](?<MyHtml>.*)\[AddonsListEND\]");
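Both patterns above use the greedy ".*", which is fine while the template contains exactly one [AddonsListSTART]/[AddonsListEND] pair. If a template could ever contain several such blocks (an assumption; the question shows only one), greedy ".*" would run from the first start tag to the last end tag. A minimal sketch with the lazy ".*?" and Regex.Matches; the class name, Blocks helper and sample are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LazyBlocksDemo
{
    // Hypothetical template with two blocks (the question's template has only one).
    public const string Sample =
        "[AddonsListSTART]\n<tr>A</tr>\n[AddonsListEND]\n<hr />\n[AddonsListSTART]\n<tr>B</tr>\n[AddonsListEND]";

    // Collects the MyHtml group of every match of the given pattern.
    public static List<string> Blocks(string pattern)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(Sample, pattern))
        {
            result.Add(m.Groups["MyHtml"].Value);
        }
        return result;
    }

    public static void Main()
    {
        // Greedy: a single match spanning from the first START to the last END.
        List<string> greedy = Blocks(@"(?s)\[AddonsListSTART\](?<MyHtml>.*)\[AddonsListEND\]");
        // Lazy: '.*?' stops at the nearest END, giving one match per block.
        List<string> lazy = Blocks(@"(?s)\[AddonsListSTART\](?<MyHtml>.*?)\[AddonsListEND\]");
        Console.WriteLine(greedy.Count); // 1
        Console.WriteLine(lazy.Count);   // 2
    }
}
```

With a single block per template, greedy and lazy give the same result, so this only matters if templates grow more than one list.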
james