
How can I escape HTML codes in Regex?

I need to find the string

&amp;

in a string like

this is my string &amp; this is another string

I cannot use HtmlEncode/Decode for this purpose because I need to work with tags. I just want to find the literal string.

I use this, and it works with, for example, "another" or "my", but it doesn't work with "&amp;".

            Regex regularextest = new Regex(@"\b&amp;\b", options);
            string RSTest = "char $& morechar";
            string lalala = regularextest.Replace("foo &amp; bar", RSTest);

It's very frustrating because Google replaces the search string with an "&" or the word "AND".

Thanks in advance

Leandro Bardelli
  • Why do you need to use regex as opposed to the normal `.Replace()` on the string? – recursive Nov 23 '11 at 18:54
  • In addition, MSDN says that & and ; are not special characters for Regex in C#. – Leandro Bardelli Nov 23 '11 at 18:54
  • See http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 I know it's not an exact duplicate, but it answers your question. Parsing escaped html faces the same limitations. – David Nov 23 '11 at 18:54
  • I don't understand why you say "I can not use HtmlEncode/Decode ... cause i need work with tags" - could you elaborate? – jwd Nov 23 '11 at 18:55
  • Because of the logic of the code. The real regular expression is much more complicated; I reduced it to this example to find the solution. – Leandro Bardelli Nov 23 '11 at 18:56
  • @jwd - Perhaps ***I'm*** the one who misread the question. ;-) – David Nov 23 '11 at 18:57

2 Answers


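The pattern \b&amp;\b will not match because & and ; are not word characters: \b only matches between a word character and a non-word character, and in "foo &amp; bar" the entity has a space on each side.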

You could try this:

Regex regularextest = new Regex(@"(?<=^|\s+)&amp;(?=\s+|$)", options);
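
To see the difference between the two patterns side by side, here is a minimal sketch. It assumes default regex options (the question does not show what `options` holds), and the class and method names are hypothetical, chosen only for this illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class AmpEntityDemo
{
    // \b needs a word character on exactly one side, so it cannot match
    // between a space and '&', or between ';' and a space.
    private static readonly Regex WordBoundary = new Regex(@"\b&amp;\b");

    // The lookarounds instead require whitespace (or the start/end of the
    // string) around the entity, without consuming it.
    private static readonly Regex Lookarounds = new Regex(@"(?<=^|\s+)&amp;(?=\s+|$)");

    // Same replacement string as in the question; $& is the whole match.
    private const string Replacement = "char $& morechar";

    public static string ReplaceWithWordBoundary(string input)
    {
        return WordBoundary.Replace(input, Replacement);
    }

    public static string ReplaceWithLookarounds(string input)
    {
        return Lookarounds.Replace(input, Replacement);
    }
}
```

With the input "foo &amp; bar", `ReplaceWithWordBoundary` leaves the string unchanged, while `ReplaceWithLookarounds` should return "foo char &amp; morechar bar". Note the verbatim string (`@"..."`): without it, C# treats `\b` as a backspace character and rejects `\s` as an unknown escape.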
FailedDev

If you need to identify or convert valid (non-Unicode) entities, you could use this regex
(?:&(?:[A-Za-z_:][\w:.-]*|\#(?:[0-9]+|x[0-9a-fA-F]+));)
to find each candidate, then pass it to a callback function that further processes the entity you wish to replace. This way it can all be done in a single global regex replace (with callback logic).
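
In C#, the callback approach could look roughly like the sketch below, using `Regex.Replace` with a `MatchEvaluator`. The named capture groups are an addition to the pattern above, and the class name, method names, and the tiny lookup table are hypothetical placeholders, not a complete entity list:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class EntityReplacer
{
    // Same pattern as above, with named groups added (an assumption of this
    // sketch) so the callback can tell named, decimal and hex entities apart.
    private static readonly Regex Entity = new Regex(
        @"&(?:(?<name>[A-Za-z_:][\w:.-]*)|\#(?:(?<dec>[0-9]+)|x(?<hex>[0-9a-fA-F]+)));");

    // Hypothetical, deliberately small table; extend it as needed.
    private static readonly Dictionary<string, string> Named =
        new Dictionary<string, string>
        {
            { "amp", "&" }, { "lt", "<" }, { "gt", ">" }, { "quot", "\"" }
        };

    public static string DecodeEntities(string input)
    {
        // The callback runs once per match; its return value replaces the match.
        return Entity.Replace(input, Decode);
    }

    private static string Decode(Match m)
    {
        if (m.Groups["name"].Success)
        {
            string value;
            // Unknown named entities are left exactly as they were.
            return Named.TryGetValue(m.Groups["name"].Value, out value) ? value : m.Value;
        }

        bool isHex = m.Groups["hex"].Success;
        string digits = isHex ? m.Groups["hex"].Value : m.Groups["dec"].Value;
        NumberStyles style = isHex ? NumberStyles.AllowHexSpecifier : NumberStyles.None;

        int code;
        if (!int.TryParse(digits, style, CultureInfo.InvariantCulture, out code))
            return m.Value;
        // Leave negative, out-of-range and surrogate code points untouched.
        if (code < 0 || code > 0x10FFFF || (code >= 0xD800 && code <= 0xDFFF))
            return m.Value;
        return char.ConvertFromUtf32(code);
    }
}
```

For example, `EntityReplacer.DecodeEntities("a &amp; b &#65; &#x42; &bogus;")` should give "a & b A B &bogus;". Because only the matched entities pass through the callback, any tags in the surrounding text are not touched.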