This is my Unicode string:

Désastres

The above string needs to be converted to HTML entities (hex), like this:

D&#x00E9;sastres

Below is my code. It converts the string to HTML entities, but in decimal rather than hex.

Can anyone help me to get the desired result?

static string EscapeAccentsToHtmlEntities(string source)
{
    int length = source.Length;
    var escaped = new StringBuilder();

    for (int i = 0; i < length; i++)
    {
        char ch = source[i];

        if ((ch >= '\x00a0') && (ch < 'Ā')) //U+{0:X4} 
        {
            escaped.AppendFormat("&#{0};", ((int)ch).ToString(NumberFormatInfo.InvariantInfo)); //"&#{0};"
        }
        else
        {
            escaped.Append(ch);
        }
    }

    return escaped.ToString();
}

Explanation: the possible duplicate of this question is for JavaScript / jQuery.

Karthick Gunasekaran
  • @mplungjan Except that it's a completely different programming language? – Nyerguds Apr 22 '16 at 09:54
  • Ah, Missed the C# since it was tagged HTML - I never see C# questions. - Seems there are however a few answers here for C# too http://stackoverflow.com/questions/3170523/converting-unicode-character-to-a-single-hexadecimal-value-in-c-sharp – mplungjan Apr 22 '16 at 09:55
  • @mplungjan Yep, true. Still pretty duplicate, I guess. – Nyerguds Apr 22 '16 at 10:01

2 Answers

Add a reference to System.Web to your project and use this method:

using System.Globalization;
using System.Text.RegularExpressions;
using System.Web;

private string HtmlEntityHex(string strToReplace)
{
    // HtmlEncode writes characters such as é as decimal entities (&#233;).
    string strReplaced = HttpUtility.HtmlEncode(strToReplace);

    // Rewrite every decimal entity as a four-digit hex entity, e.g. &#233; -> &#x00E9;.
    // Note the "x" after "&#": without it the browser reads the digits as decimal.
    return Regex.Replace(strReplaced, @"&#(\d+);",
        m => "&#x" + int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture).ToString("X4") + ";");
}
Taha Paksu

You just need to use the correct ToString() format for the integer:

escaped.AppendFormat("&#x{0};", ((int)ch).ToString("X4"));
Nyerguds
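
For anyone unsure which format string does what, here is a minimal sketch of the integer formats involved. The class and method names (HexFormatDemo, ToHexEntity) are made up for illustration and are not part of the question's code.

```csharp
using System;
using System.Globalization;

static class HexFormatDemo
{
    // Hypothetical helper: formats one UTF-16 code unit as an HTML hex entity.
    public static string ToHexEntity(char ch)
    {
        return "&#x" + ((int)ch).ToString("X4", CultureInfo.InvariantCulture) + ";";
    }

    static void Main()
    {
        int code = '\u00E9'; // é, decimal 233

        Console.WriteLine(code.ToString(CultureInfo.InvariantCulture));       // 233  (decimal)
        Console.WriteLine(code.ToString("X", CultureInfo.InvariantCulture));  // E9   (hex, no padding)
        Console.WriteLine(code.ToString("X4", CultureInfo.InvariantCulture)); // 00E9 (hex, padded to 4 digits)
        Console.WriteLine(ToHexEntity('\u00E9'));                             // &#x00E9;
    }
}
```

The "x" in the entity prefix matters as much as the "X4" format: "&#" alone introduces a decimal reference, while "&#x" introduces a hexadecimal one.
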
  • But it converts to &amp;#x00E9;. Why? – Karthick Gunasekaran Apr 22 '16 at 11:14
  • Probably because you do an additional HTML escaping on the final string somehow? You need to put that in this function too, in the `else` case, and then leave the string alone. I can't tell you how/where that happens without more of your code though. If you use an xml writer of some kind, be sure to make it write that content as already being HTML to avoid that kind of stuff. – Nyerguds Apr 22 '16 at 11:16
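
A rough sketch of what the last comment describes: doing the ordinary HTML escaping inside the same function, so the result can be written out as finished HTML and nothing escapes it a second time. The names HtmlHexEscaper and EscapeToHtmlHex are hypothetical, and the set of characters escaped by hand (&, <, >, ") is an assumption about what the surrounding code needs, not something taken from the question.

```csharp
using System;
using System.Globalization;
using System.Text;

static class HtmlHexEscaper
{
    // Escapes HTML-significant characters by name and Latin-1 accented
    // characters (U+00A0..U+00FF) as four-digit hex entities.
    public static string EscapeToHtmlHex(string source)
    {
        var escaped = new StringBuilder(source.Length);

        foreach (char ch in source)
        {
            switch (ch)
            {
                case '&': escaped.Append("&amp;"); break;
                case '<': escaped.Append("&lt;"); break;
                case '>': escaped.Append("&gt;"); break;
                case '"': escaped.Append("&quot;"); break;
                default:
                    if (ch >= '\u00A0' && ch < '\u0100')
                    {
                        // Same range as the question's code, but hex instead of decimal.
                        escaped.Append("&#x")
                               .Append(((int)ch).ToString("X4", CultureInfo.InvariantCulture))
                               .Append(';');
                    }
                    else
                    {
                        escaped.Append(ch);
                    }
                    break;
            }
        }

        return escaped.ToString();
    }

    static void Main()
    {
        // Prints: D&#x00E9;sastres &amp; &lt;b&gt;
        Console.WriteLine(EscapeToHtmlHex("D\u00E9sastres & <b>"));
    }
}
```

The returned string is already HTML, so it should not be passed through HtmlEncode or a text-escaping writer afterwards; doing so is what turns &#x00E9; into &amp;#x00E9;.
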