I have a textarea where I type some Unicode characters, which turn into question marks by the time the string reaches the server. I typed the following into the input:

Don’t “quote” me on that.

On the server I checked Request.Form["fieldID"] in Page_Load() and I saw:

"Don�t �quote� me on that."

I checked my web.config file and it says <globalization requestEncoding="utf-8" responseEncoding="utf-8" />. Anything else I should check to ensure UTF-8 is enabled?

styfle
  • When you say that you checked on the server side do you mean that you used the debugger and got "Don�t �quote� me on that." in the watch window? Could you please copy and paste the content of the string into something like MS Word and set the font to, say, Arial? I just want to rule out the possibility that something funny is happening with your fonts in Visual Studio. – Miltos Kokkonidis Nov 28 '12 at 01:29
  • Yes I used the debugger. I know it is not the fonts because converting the character to an int shows up as 65533. [This post](http://stackoverflow.com/q/5798110/266535) is related but doesn't solve my problem. – styfle Nov 28 '12 at 01:36
  • OK. Let's try something else then. Have you used Fiddler to see what actually reaches your server? http://www.fiddler2.com/fiddler2/ if you haven't got it already. – Miltos Kokkonidis Nov 28 '12 at 01:43
  • This is odd. It looks like the question marks are in the request to the server. I clicked TextView and WebForms and both say `Don�t �quote� me on that.` but the question marks appear as boxes. – styfle Nov 28 '12 at 02:00
  • As they do in my browser when I look at your question, which is why my first thought was about fonts. Let me see what happens in one of my web apps when I feed it the same input... – Miltos Kokkonidis Nov 28 '12 at 02:02
  • Oh I think I figured out the problem. It only happens when a "feature" is turned on in the WebApp. This looks like @Pheonixblade9 might be right in that the text is being encoded twice. – styfle Nov 28 '12 at 02:11
  • In my test, the string is sent to the server (using POST) as it should be: `&ApplNotes=Don%E2%80%99t+%E2%80%9Cquote%E2%80%9D+me+on+that.` Is that what you are getting also? (Use the Raw view. The posted data are on the bottom line and are URL encoded.) – Miltos Kokkonidis Nov 28 '12 at 02:13
  • Great if you have figured it out. Just post a full answer, as I am curious :-) – Miltos Kokkonidis Nov 28 '12 at 02:15
  • Thanks @MiltosKokkonidis, I ended up answering my own question but your comments were really helpful. – styfle Jan 15 '13 at 22:11

3 Answers

Question marks like that (U+FFFD, the Unicode replacement character) generally show up when bytes that are not valid UTF-8 are decoded as UTF-8.

You need to HTML encode your strings.

Codeman
  • But this is coming straight from the client's request, not from the server. Can you provide an example? – styfle Nov 28 '12 at 01:24
  • So you receive a request with these invalid characters? You control the webpage these data fields are on, correct? – Codeman Nov 28 '12 at 01:24
  • I don't know what else to post. That's why my question is asking where to look next. Something is happening before Page_Load. – styfle Nov 28 '12 at 01:40
  • I think you might be right in that the text is being encoded twice. – styfle Nov 28 '12 at 02:11

Check the encoding of the Page where the form is, and/or the accept-charset of the form.

I can replicate what you are seeing with ISO-8859-1 - e.g.

<form action="foo" method="post" accept-charset="ISO-8859-1">
   ....
</form>

In VS watch window:


Inspecting Request.Form (before accessing the key itself):

message=Don%ufffdt+%ufffdquote%ufffd+me+on+that.

Inspecting Request.Form["message"] (accessing a key in the collection, at which point ASP.NET has already URL-decoded the value):

"Don�t �quote� me on that."

It seems something is overriding your web.config settings on that specific page (?)
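
To see why the characters end up as U+FFFD, here is a minimal console sketch. It is not the ASP.NET request pipeline itself, and the class and method names are made up for illustration. It assumes the browser sent the curly apostrophe as the single Windows-1252 byte 0x92 (which is how browsers typically handle a form labelled ISO-8859-1) and that the server then decodes those bytes as UTF-8, as requestEncoding="utf-8" asks it to.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class EncodingMismatchDemo
{
    // Decodes raw bytes as UTF-8. With the default Encoding.UTF8 instance,
    // byte sequences that are not valid UTF-8 are replaced by U+FFFD.
    public static string DecodeAsUtf8(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    public static void Main()
    {
        // "Don’t" with the apostrophe as one Windows-1252 byte (0x92).
        byte[] sentAs1252 = { 0x44, 0x6F, 0x6E, 0x92, 0x74 };

        // The same text with the apostrophe as three UTF-8 bytes (E2 80 99).
        byte[] sentAsUtf8 = { 0x44, 0x6F, 0x6E, 0xE2, 0x80, 0x99, 0x74 };

        string broken = DecodeAsUtf8(sentAs1252);
        string intact = DecodeAsUtf8(sentAsUtf8);

        // Print the code point of the fourth character in each string.
        Console.WriteLine((int)broken[3]); // 65533 (U+FFFD)
        Console.WriteLine((int)intact[3]); // 8217  (U+2019)
    }
}
```

The 65533 matches the value reported in the comments on the question, which is consistent with the bytes having been sent (or re-read) in a non-UTF-8 encoding somewhere before Page_Load.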

Hth...

EdSF

Once again I solved my own problem. It is quite simple. The short answer is to add the following before sending any response back to the client:

Response.ContentType = "text/html; charset=utf-8";

The long answer is that a "feature" called Cache Mode circumvented all other response data by writing a UTF-8 encoded file that is really just a cached response. Adding that line before it writes the file solved my problem.

if (cacheModeEnabled) {
    Response.ContentType = "text/html; charset=utf-8"; // WriteFile doesn't know the file encoding
    Response.WriteFile(Server.MapPath("CacheForm.aspx"), true);
    Response.End();
} else {
    // perform normal response here
}

Thanks for all the answers and comments. They definitely helped me solve this issue. Most notably, Fiddler2 let me see what the heck is really in the request and response.

styfle