
I get a string from HTML like this:

var htmlStr = " \\x26lt;span\\x26gt; \\x26lt;/span\\x26gt;";

I can't decode it to a C# string like this:

 <span> </span>

If I modify the string to

var htmlStr = " \x26lt;span\x26gt; \x26lt;/span\x26gt;";

it works fine. But how can I get the same result from the original string, by string replacement or some other way?

BTW, I use Encoding.UTF8.

Heinzi
MichaelMao

2 Answers


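You can do it like this:

using System;
using System.Globalization;
using System.Net;
using System.Text.RegularExpressions;

var htmlStr = "\\x26lt;span\\x26gt; \\x26lt;/span\\x26gt;";
// Strip any **...** emphasis markers (a no-op for this input)
var result = Regex.Replace(htmlStr, @"\*\*([^*]*)\*\*", "$1");
// Unescape \xNN values; operate on result so the previous step is not discarded
result = Regex.Replace(result,
                @"\\x([a-fA-F0-9]{2})",
                match => char.ConvertFromUtf32(
                    Int32.Parse(match.Groups[1].Value,
                    NumberStyles.HexNumber)));
// Decode the HTML entities (&lt; and &gt;) exposed by the previous step
htmlStr = WebUtility.HtmlDecode(result);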

The output is:

<span> </span>
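
The two steps above (unescape the \xNN sequences, then decode the HTML entities) could be wrapped in a small helper so the regex is written only once. This is a minimal sketch, not a definitive implementation: the names `EscapedHtml`, `Decode` and `HexEscape` are hypothetical and do not come from the answer, and it assumes the input only uses two-digit \xNN escapes.

```csharp
using System;
using System.Globalization;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper; the class and member names are illustrative only.
public static class EscapedHtml
{
    // Matches a literal backslash, 'x', and exactly two hex digits, e.g. \x26.
    private static readonly Regex HexEscape = new Regex(@"\\x([0-9a-fA-F]{2})");

    public static string Decode(string input)
    {
        // Step 1: turn each \xNN escape into the character with that code point.
        string unescaped = HexEscape.Replace(
            input,
            m => char.ConvertFromUtf32(
                int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)));

        // Step 2: decode the HTML entities (&lt;, &gt;, ...) that step 1 exposed.
        return WebUtility.HtmlDecode(unescaped);
    }
}

public static class Program
{
    public static void Main()
    {
        // Verbatim string: the backslashes are literal characters, as in the question.
        string htmlStr = @"\x26lt;span\x26gt; \x26lt;/span\x26gt;";
        Console.WriteLine(EscapedHtml.Decode(htmlStr)); // prints "<span> </span>"
    }
}
```

The order matters: \x26 is the ampersand, so the entities only become visible to WebUtility.HtmlDecode after the hex escapes have been replaced.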
Mohit S
  • Hi @MohitShrivastava, this works fine, and I can solve the problem with a regex. But does anyone know why the plain decode fails? Writing a regex every time isn't a good solution, right? – MichaelMao Dec 30 '15 at 08:42

Already answered here: How can I decode HTML characters in C#?

In short, you can use HttpUtility.HtmlDecode or WebUtility.HtmlDecode.
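
As a quick check of how WebUtility.HtmlDecode treats the two forms of the string from the question, the sketch below compares an input that already contains real ampersands with one where the ampersand is still the literal text \x26. The class name `HtmlDecodeOnlyDemo` is a hypothetical name chosen for illustration.

```csharp
using System;
using System.Net;

// Hypothetical demo class; the name is illustrative only.
public static class HtmlDecodeOnlyDemo
{
    public static void Main()
    {
        // Real ampersands are present, so HtmlDecode resolves the entities.
        Console.WriteLine(WebUtility.HtmlDecode("&lt;span&gt; &lt;/span&gt;")); // prints "<span> </span>"

        // The ampersand is still the literal text \x26, so there is no entity
        // for HtmlDecode to find and the string comes back unchanged.
        string escaped = @"\x26lt;span\x26gt;";
        Console.WriteLine(WebUtility.HtmlDecode(escaped) == escaped); // prints "True"
    }
}
```

So for the double-backslash string in the question, the \x26 sequences need to be unescaped first; HtmlDecode handles the remaining entities after that.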

eocron