
This is the code:

Response.Write("asd1 X : " + HttpUtility.HtmlEncode("×"));
Response.Write("asd2 X : " + HttpUtility.HtmlEncode("✖"));

The first one outputs:

asd1 X : &#215; // OK, ENCODED AS AN HTML ENTITY

but the second one doesn't; it's just ✖:

asd2 X : ✖

What kind of character is that? Also, if I try it here, the result is:

asd1 X : ×
asd2 X : ✖

What? Why the difference?

markzzz
  • Is the character UTF-8, or Windows 1251? – Diodeus - James MacFarlane Jun 19 '12 at 15:53
  • Uhm... but entities should be universal regardless of the charset, am I wrong? – markzzz Jun 19 '12 at 15:55
  • Looks like Unicode character [2716](http://www.fileformat.info/info/unicode/char/2716/index.htm) – Oded Jun 19 '12 at 15:56
  • The OUTPUT could be universal, but how does the function know whether the input is UTF-8 or Win-1251? – Diodeus - James MacFarlane Jun 19 '12 at 16:00
  • That's a good question. In fact, how can I know? I copied/pasted it from a website... I think that copies the charset too... uhm... – markzzz Jun 19 '12 at 16:02
  • @markzzz, entities are universal, but not all utilities handle the full range of characters. Apparently `HttpUtility` doesn't. But if you try `Microsoft.Security.Application.AntiXss.HtmlEncode`, you'll get the wanted result. Here's an SO question that discusses the two: http://stackoverflow.com/questions/1608854/what-is-the-difference-between-antixss-htmlencode-and-httputility-htmlencode – Ray Cheng Jun 19 '12 at 16:44
  • Why do you need it to be encoded as a numeric entity reference? – Oded Jun 19 '12 at 16:56
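To make "what kind of char is that?" concrete, here is a minimal C# sketch that prints the UTF-16 code unit behind a character (the class name `CharInfo` is made up for illustration). A .NET `string` stores text as UTF-16, so once the literal has been compiled (assuming the compiler read the source file with the right encoding), the charset of the page it was copied from no longer plays a role; what `HtmlEncode` sees is the code point: × is U+00D7 and ✖ is U+2716, the character Oded linked to.

```csharp
using System.Globalization;

public static class CharInfo
{
    // Formats a single UTF-16 code unit as "U+XXXX (decimal N)".
    // Characters outside the BMP (surrogate pairs) are not handled here.
    public static string Describe(char c) =>
        string.Format(CultureInfo.InvariantCulture, "U+{0:X4} (decimal {1})", (int)c, (int)c);
}
```

For example, `CharInfo.Describe('×')` should give `U+00D7 (decimal 215)` and `CharInfo.Describe('✖')` should give `U+2716 (decimal 10006)`.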

2 Answers


In the MSDN page for HttpUtility.HtmlEncode(string), you will find this comment:

It encodes all character codes from decimal 160 to 255 (both inclusive) to their numerical entity (e.g. `&#160;`)

× (`&times;`) is `&#215;` / `&#xD7;`, which is inside that range, so it gets encoded. ✖ is `&#10006;` / `&#x2716;`, which is outside it, so it does not.
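The rule quoted from MSDN boils down to a range check on the code point. The sketch below puts that check next to the real call; it assumes `System.Web.HttpUtility` is available (a reference to System.Web.dll on .NET Framework, in-box on .NET Core 2.0 and later), and the class `Latin1RangeCheck` is hypothetical. × (215) falls inside 160 to 255 and ✖ (10006) does not, so only the first should come back as a numeric entity.

```csharp
using System.Web; // HttpUtility

public static class Latin1RangeCheck
{
    // The documented rule: only code points 160..255 (inclusive) become &#NNN;.
    public static bool GetsNumericEntity(char c) => c >= 160 && c <= 255;

    // Shows the rule side by side with what HttpUtility.HtmlEncode actually returns.
    public static string Report(char c) =>
        $"{c} = {(int)c}, in 160..255: {GetsNumericEntity(c)}, " +
        $"HtmlEncode -> {HttpUtility.HtmlEncode(c.ToString())}";
}
```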

You can use the overload of HtmlEncode that takes a TextWriter created with the Encoding you want.
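A minimal sketch of the TextWriter overload, writing into a `StringWriter` (the wrapper `TextWriterOverloadDemo` is hypothetical). It appears that the overload only changes where the encoded text goes; the writer's `Encoding` governs how that text is later turned into bytes, not which characters become entities. That is consistent with Ray Cheng's comment that the overload still writes the plain ✖.

```csharp
using System.IO;
using System.Web; // HttpUtility

public static class TextWriterOverloadDemo
{
    // Encodes into a StringWriter instead of returning a new string.
    public static string EncodeViaWriter(string input)
    {
        using (var writer = new StringWriter())
        {
            HttpUtility.HtmlEncode(input, writer);
            return writer.ToString();
        }
    }
}
```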

Oded
  • Using the overloaded method doesn't produce an HTML entity. It just outputs the big X: `using (TextWriter tw = new StreamWriter(@"c:\temp\test.txt")){HttpUtility.HtmlEncode("✖", tw);}` – Ray Cheng Jun 19 '12 at 16:38
  • @RayCheng - Why are you expecting a numeric entity reference? Why do you need it? – Oded Jun 19 '12 at 16:42
  • I think the OP's intent is to get the HTML entity. But with `HttpUtility.HtmlEncode` that's not possible for this particular character because of that limitation, so the overloaded method still doesn't give the wanted result. – Ray Cheng Jun 19 '12 at 16:49
  • @RayCheng - I wouldn't expect it to either. – Oded Jun 19 '12 at 16:51

My best guess is that HttpUtility.HtmlEncode doesn't give every character an entity representation. The Heavy Multiplication X is one of the many it leaves unencoded.

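To elaborate on Oded's link: apart from markup characters such as <, > and &, HttpUtility.HtmlEncode only encodes characters from the upper half of ISO 8859-1 (Latin-1), decimal 160 to 255. Since the Heavy Multiplication X (U+2716) is outside this range, the function doesn't handle it.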

If you try `Microsoft.Security.Application.AntiXss.HtmlEncode("✖")`, you'll get the HTML entity `&#10006;`.
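If pulling in the AntiXss library is not an option, similar output can be produced by hand. The following is a minimal sketch, not AntiXss's implementation: `NumericEntityEncoder.Encode` is a hypothetical helper that escapes the markup characters and writes every non-ASCII character as a decimal numeric reference, so ✖ becomes `&#10006;`. It assumes decimal references are acceptable; AntiXss's exact output for other characters may differ.

```csharp
using System.Text;

public static class NumericEntityEncoder
{
    // Hypothetical helper (not part of .NET or AntiXss): escapes markup characters
    // and writes every non-ASCII character as a decimal numeric reference.
    public static string Encode(string input)
    {
        if (string.IsNullOrEmpty(input)) return input;

        var sb = new StringBuilder(input.Length * 2);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            switch (c)
            {
                case '<': sb.Append("&lt;"); continue;
                case '>': sb.Append("&gt;"); continue;
                case '&': sb.Append("&amp;"); continue;
                case '"': sb.Append("&quot;"); continue;
                case '\'': sb.Append("&#39;"); continue;
            }

            if (c < 128)
            {
                sb.Append(c); // plain ASCII passes through unchanged
                continue;
            }

            int codePoint;
            if (char.IsHighSurrogate(c) && i + 1 < input.Length && char.IsLowSurrogate(input[i + 1]))
            {
                // Characters outside the BMP (e.g. emoji) span two UTF-16 code units.
                codePoint = char.ConvertToUtf32(c, input[i + 1]);
                i++; // skip the low surrogate just consumed
            }
            else if (char.IsSurrogate(c))
            {
                codePoint = 0xFFFD; // lone surrogate: emit U+FFFD REPLACEMENT CHARACTER
            }
            else
            {
                codePoint = c; // e.g. '✖' -> 10006
            }

            sb.Append("&#").Append(codePoint).Append(';');
        }
        return sb.ToString();
    }
}
```

Encoding everything above ASCII makes the markup independent of the page's declared charset, at the cost of somewhat larger output.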

Ray Cheng