I am trying to replace a string in HTML. I am only interested in the "true" text (textContent). That is, no attributes should be touched, just the text.
I came up with an expression that is not perfect yet:
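using System;
using System.Text.RegularExpressions;
var hayStack = @"<p class='33333'> 33333 <a href='33333'> 33333 </a> After 33333 </p> <div id='33333'></div>";
// (?x) ignores the spaces in the pattern, so only "33333" itself is matched and replaced
string pattern = @"(?x)(?<=>.*?) 33333 (?=.*?<)";
Console.WriteLine(Regex.Replace(hayStack, pattern, "Replaced"));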
That prints:
<p class='33333'> Replaced <a href='Replaced'> Replaced </a> After Replaced </p> <div id='Replaced'></div>
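The expression handles the text content, but it also replaces the href and id attribute values; only class='33333' survives, because no '>' appears before it. The lookbehind is satisfied by any earlier '>' and the lookahead by any later '<', even when those brackets belong to other tags.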
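To see which matches land inside a tag, a small diagnostic along these lines could help. It is a sketch using C# 9+ top-level statements; InsideTag and insideTagMatches are hypothetical names introduced here, and "inside a tag" is only approximated by comparing the nearest '<' and '>' before each match.

```csharp
using System;
using System.Text.RegularExpressions;

var hayStack = @"<p class='33333'> 33333 <a href='33333'> 33333 </a> After 33333 </p> <div id='33333'></div>";
string pattern = @"(?x)(?<=>.*?) 33333 (?=.*?<)";

// Hypothetical helper: a position counts as inside a tag when the last '<' before it
// comes after the last '>' before it, i.e. a tag has opened but not yet closed.
bool InsideTag(string s, int index) =>
    s.LastIndexOf('<', index) > s.LastIndexOf('>', index);

int insideTagMatches = 0;
foreach (Match m in Regex.Matches(hayStack, pattern))
{
    bool inTag = InsideTag(hayStack, m.Index);
    if (inTag) insideTagMatches++;
    string kind = inTag ? "inside a tag (attribute value)" : "text content";
    Console.WriteLine($"match at index {m.Index}: {kind}");
}
```

On the sample input the pattern produces five matches, and two of them (the href and id values) are reported as inside a tag.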
It should print:
<p class='33333'> Replaced <a href='33333'> Replaced </a> After Replaced </p> <div id='33333'></div>
What would the correct expression look like?
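One candidate, sketched under the assumption that the markup is well formed, attribute values never contain a literal '<' or '>', and there are no comments, script/style blocks or CDATA sections: drop both lookarounds and use a single negative lookahead that rejects a match whenever the next angle bracket after it is a '>'. This is a C# 9+ top-level sketch, not a general HTML solution.

```csharp
using System;
using System.Text.RegularExpressions;

var hayStack = @"<p class='33333'> 33333 <a href='33333'> 33333 </a> After 33333 </p> <div id='33333'></div>";

// Assumption: no '<' or '>' inside attribute values, no comments/scripts/CDATA.
// (?![^<>]*>) fails when a '>' is reached before any '<', which is the situation
// inside an open tag such as <a href='33333'>, so attribute values are skipped.
string pattern = @"33333(?![^<>]*>)";

string result = Regex.Replace(hayStack, pattern, "Replaced");
Console.WriteLine(result);
```

The surrounding spaces are not consumed, so they stay in place. Like the original, this would also match 33333 inside a longer run of digits such as 333333 unless word boundaries (\b) are added. For HTML you don't control, walking the text nodes with an HTML parser is the more robust route.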