I have a large string of ASP code and I want to modify parts of it using regex. I have a pattern and want to wrap every match of it in an HTML comment. I have this so far:

foreach (Match controlMatch in Regex.Matches(bodyText, "<asp:Image.*?\\/>", RegexOptions.IgnoreCase | RegexOptions.Singleline))
{
  bodyText = bodyText.Replace(controlMatch.Groups[0].Value, "<!--" + controlMatch.Groups[0].Value + "-->");
}

But the problem is that each call to Replace also replaces the instances I have already wrapped in HTML comments, so the result ends up looking like

<!--<!--<!--<!--<asp:Image ... /> -->-->-->-->

Does anyone know how to fix this? In this case the matches happen to be the exact same string, which is why this happens, but in general they can differ.

Sean Bright
omega
  • any time you are using `.*` you can't know what you're going to capture. you should use an html parser instead of using regular expressions – DLeh Jul 23 '15 at 15:10
  • For now it's ok, I saw what it captures and it's ok, but the issue here is how to replace each instance only once. – omega Jul 23 '15 at 15:12
  • why not use `Regex.Replace()` instead of `string`'s `Replace()`? – DLeh Jul 23 '15 at 15:13
  • I tried, but I'm not sure what to put in the replacement string parameter since it depends on what was matched. – omega Jul 23 '15 at 15:13

2 Answers

Instead of string's Replace() method, use Regex.Replace(). It makes a single pass over the input and replaces each match exactly once. In the replacement string, $1 refers to the first capture group; to create a capture group, wrap that part of the pattern in parentheses.

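// Requires: using System.Text.RegularExpressions;
var bodyText = @"
    <asp:Image asdflk;jasd;lkfjas />

    <asp:Image something else runat=""server"" />
    ";
// The parentheses make the whole tag capture group 1.
var pattern = @"(<asp:Image.*?/>)";
// $1 in the replacement string is substituted with capture group 1.
var replacementPattern = "<!-- $1 -->";
// Same options as in the question: case-insensitive, and . also matches newlines.
var options = RegexOptions.IgnoreCase | RegexOptions.Singleline;
bodyText = Regex.Replace(bodyText, pattern, replacementPattern, options);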

In this example, bodyText now contains

<!-- <asp:Image asdflk;jasd;lkfjas /> -->

<!-- <asp:Image something else runat="server" /> -->

With this, you don't need to loop through the matches, so each match is replaced exactly once. Your current code calls string's Replace() once per match, and each call replaces every occurrence of the matched text, including the ones already wrapped in comments.
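If the replacement text has to be computed from each match (the concern raised in the comments on the question), two standard options may help. The sketch below is only an illustration: the class and method names (ImageCommenter, WithSubstitution, WithEvaluator) and the sample input are made up for this example. It shows the `$&` substitution, which stands for the entire match so no capture group is needed, and the Regex.Replace overload that takes a MatchEvaluator, which builds the replacement in code.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the names are not from the original post.
public static class ImageCommenter
{
    private const string Pattern = @"<asp:Image.*?/>";
    private const RegexOptions Options =
        RegexOptions.IgnoreCase | RegexOptions.Singleline;

    // "$&" in a replacement string stands for the entire match.
    public static string WithSubstitution(string bodyText) =>
        Regex.Replace(bodyText, Pattern, "<!--$&-->", Options);

    // A MatchEvaluator receives each Match and returns its replacement text.
    public static string WithEvaluator(string bodyText) =>
        Regex.Replace(bodyText, Pattern, m => "<!--" + m.Value + "-->", Options);
}

public static class Program
{
    public static void Main()
    {
        // Two identical tags: the case that broke the loop in the question.
        string bodyText = "<asp:Image ID=\"a\" />\n<asp:Image ID=\"a\" />";

        Console.WriteLine(ImageCommenter.WithSubstitution(bodyText));
        Console.WriteLine(ImageCommenter.WithEvaluator(bodyText));
    }
}
```

Both methods scan the input once, so identical tags are each wrapped a single time. The evaluator form is the more flexible of the two, since the lambda can inspect the match before deciding what to return.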

If you need smarter replacements, such as ignoring tags that are already commented out, regular expressions are the wrong tool; use a proper HTML parser instead.

DLeh
  • But this doesn't fix the issue of commenting already commented asp image tags. – juharr Jul 23 '15 at 15:23
  • yes it does, because you won't be running the `Regex.Replace()` on the entire string many times like OP is doing. – DLeh Jul 23 '15 at 15:24
  • I guess I didn't read the question carefully enough. I was thinking the problem was occurring from running the code on a file that already had asp image tags that were commented out previously. – juharr Jul 23 '15 at 15:27
  • Yeah. If there are already comments in the code he's parsing (which is very likely) then he would be much better off using an HTML parser. But this solves the problem he stated. – DLeh Jul 23 '15 at 15:29

Don't use regex to parse HTML. Regex does not care whether parts of the HTML are already commented out. Use a parser that understands HTML at least a little bit. Check out this epic post:

RegEx match open tags except XHTML self-contained tags

Nikolay