The problem here is that you're trying to apply Regex.Unescape to something that wasn't entirely processed with Regex.Escape. You would hit the same problem with just about any encoding where part of the message is encoded and other parts are not. You can try to anticipate all the variations, but there will be cases where you cannot tell a backslash that was meant to be taken literally from one that starts an escape sequence. The only sure-fire way is to ensure the entire message is consistently encoded. This means completely decoding the message any time you manipulate the string, and then re-encoding the entire string.
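A minimal sketch of why partially escaped input is ambiguous. It feeds Regex.Unescape two strings in which regex syntax was never escaped: a `\b` intended as a word boundary is silently turned into a backspace character, and a `\w` makes Regex.Unescape throw. `MixedEscapeDemo` and `TryUnescape` are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MixedEscapeDemo
{
    // Wraps Regex.Unescape and returns null instead of throwing.
    public static string TryUnescape(string input)
    {
        try
        {
            return Regex.Unescape(input);
        }
        catch (ArgumentException)
        {
            // Regex.Unescape throws for escape sequences it does not recognize, such as \w.
            return null;
        }
    }

    public static void Main()
    {
        // \b was meant as a regex word boundary, but Unescape turns it into a backspace (U+0008).
        // In a regular C# string literal, "\b" is also a backspace, so this comparison is true.
        Console.WriteLine(TryUnescape(@"\bword\b") == "\bword\b");

        // \u003c and \u003e decode fine, but the unescaped \w makes Unescape throw.
        Console.WriteLine(TryUnescape(@"\u003cdiv\u003e \w+") == null);
    }
}
```

The first case is the worse one: nothing fails, the meaning just changes.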
Here is a demonstration I did in LINQPad, with the output for each corresponding .Dump() call listed afterward. It does the full encoding and then the complete decoding. You'll notice that halfway through, the \w gets escaped (to \\w) by Regex.Escape. So the crux of the issue you are having is that the "some message \w+ here" part of the message was not regex-escaped, so applying Regex.Unescape to it fails: you can't unescape something that was never escaped, and for an unrecognized sequence such as \w, Regex.Unescape throws an ArgumentException.
string ori = @"<div>some message \w+ here</div>"; //verbatim string: no C# escaping, so \w is a literal backslash followed by w
ori.Dump(); // Verify that real string is "<div>some message \w+ here</div>"
string regexEscaped = System.Text.RegularExpressions.Regex.Escape(ori);
regexEscaped.Dump();
//Regex.Escape does not replace "<" or ">" with Unicode escapes, since they are not regex metacharacters. You can force them into the regex-escaped string,
//but this step is unnecessary and can stay commented out.
//regexEscaped = regexEscaped.Replace(">", @"\u003e").Replace("<",@"\u003c");
//regexEscaped.Dump();
string htmlEscaped_regexEscaped = System.Web.HttpUtility.HtmlEncode(regexEscaped).Dump();
System.Text.RegularExpressions.Regex.Unescape( System.Web.HttpUtility.HtmlDecode(htmlEscaped_regexEscaped)).Dump();
// Since we encoded the entire string we were able to successfully decode it.
Output:
Original: <div>some message \w+ here</div>
Regex escaped: <div>some\ message\ \\w\+\ here</div>
HTML encoded: &lt;div&gt;some\ message\ \\w\+\ here&lt;/div&gt;
HTML decoded & regex unescaped: <div>some message \w+ here</div>
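Outside LINQPad, .Dump() isn't available. A plain console sketch of the same full round trip might look like this. It swaps in System.Net.WebUtility for System.Web.HttpUtility (an assumption, so the example doesn't need a reference to System.Web; both encode `<` and `>` the same way for this input). `RoundTripDemo`, `Encode`, and `Decode` are hypothetical names.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class RoundTripDemo
{
    // Encode the whole message: regex-escape first, then HTML-encode the result.
    public static string Encode(string original) =>
        WebUtility.HtmlEncode(Regex.Escape(original));

    // Decode in the reverse order: HTML-decode first, then regex-unescape.
    public static string Decode(string encoded) =>
        Regex.Unescape(WebUtility.HtmlDecode(encoded));

    public static void Main()
    {
        string ori = @"<div>some message \w+ here</div>";
        string encoded = Encode(ori);
        Console.WriteLine(encoded);
        // Because every part of the string went through both encoders, decoding restores it exactly.
        Console.WriteLine(Decode(encoded) == ori);
    }
}
```

The key design point is symmetry: the decode steps run in the exact reverse order of the encode steps, over the entire string.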
Are you using this for matching?
If your intent is to use the string "\u003cdiv\u003esome message \w+ comes here\u003c/div\u003e" as a regex pattern for matching, there is no need to do anything to it. Any matcher that implements the full regex feature set, including .NET's, understands "\u003c", so there is no need to convert it to "<":
http://www.regular-expressions.info/unicode.html
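A minimal sketch, assuming the client's string is used directly as a .NET pattern: the `\u003c` and `\u003e` escapes are interpreted by the regex parser itself, so the pattern matches real `<div>` markup without any preprocessing. `UnicodeEscapeMatchDemo` and `Matches` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeEscapeMatchDemo
{
    // The pattern keeps the \uXXXX escapes exactly as the client sent them.
    public const string Pattern = @"\u003cdiv\u003esome message \w+ comes here\u003c/div\u003e";

    public static bool Matches(string input) => Regex.IsMatch(input, Pattern);

    public static void Main()
    {
        // \u003c and \u003e match the literal < and > characters; \w+ matches "hello".
        Console.WriteLine(Matches("<div>some message hello comes here</div>"));
        // A different tag name does not match.
        Console.WriteLine(Matches("<span>some message hello comes here</span>"));
    }
}
```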
The client isn't really doing a Regex Escape?
It seems more likely that the client isn't really doing a regex escape, in which case Regex.Unescape is bound to fail on the parts that were never escaped. Is it doing some sort of HTML encoding, but replacing the characters with Unicode escape codes instead of HTML character references? Maybe. Without documentation of the client's behavior, that is an educated guess, and you have to hope it doesn't produce other inconsistent encodings later down the line.
In this case, I would target only the Unicode escape sequences. Here is a question that covers replacing Unicode escape sequences without using Regex.Unescape:
How do convert unicode escape sequences to unicode characters in a .NET string
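Targeting only the `\uXXXX` sequences can be sketched as follows. This is one common approach (a Regex.Replace with a match evaluator), not necessarily the one given in the linked question, and `UnicodeEscapeDecoder` is a hypothetical name. It assumes every `\u` followed by four hex digits in the client's output is an escape sequence.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class UnicodeEscapeDecoder
{
    // Matches a backslash, a 'u', and exactly four hex digits, e.g. \u003c.
    private static readonly Regex UnicodeEscape = new Regex(@"\\u([0-9A-Fa-f]{4})");

    // Replaces only \uXXXX sequences; every other backslash (such as the one in \w+) is left alone.
    public static string Decode(string input) =>
        UnicodeEscape.Replace(input, m =>
            ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());

    public static void Main()
    {
        string fromClient = @"\u003cdiv\u003esome message \w+ comes here\u003c/div\u003e";
        // Prints: <div>some message \w+ comes here</div>
        Console.WriteLine(Decode(fromClient));
    }
}
```

Note that this sketch has the same blind spot described at the top: if the client ever sends a literal backslash followed by `u003c` (for example `\\u003c`), it will be decoded anyway, because there is no consistent encoding to tell the two apart.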