The fundamental problem is the impact of Response.CodePage on form posts.
When you send a form to a client specifying that the content is encoded as UTF-8, the browser will assume that the content of form posts should be sent encoded as UTF-8.
Now the action page that receives the post will (somewhat counter-intuitively) use the value of Response.CodePage to decide how the characters in the post are encoded. This isn't obvious, because we tend to think it's the job of the sender to define the encoding of what it's sending. It also isn't a natural leap to think that a property concerned with the encoding of what we send in our response would have anything to do with how the incoming request is read. In this case it does.
What's happening is that your form is posting a UTF-8 encoded version of the character, but the page that receives it does not have its Response.CodePage set to 65001 (the UTF-8 codepage). It's probably set to the system's default ANSI codepage, such as 1252. Hence the two-byte UTF-8 encoding of the character gets interpreted as two individual characters.
My recommendations for good character handling in ASP are:-
- Save all pages as UTF-8
- Include <%@ CodePage=65001 %> at the top of all pages
- Include <% Response.CharSet = "UTF-8" %> at the top of all pages
- Store posted data in a Unicode field type such as SQL Server's NVARCHAR type.
The important thing here is that before you read form values in an ASP page, you need to make sure that Response.CodePage is set to a codepage that matches the sender's encoding. This doesn't happen automatically.
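
As a rough sketch of what that looks like on the receiving side, here is a minimal action page. The file name save.asp and the form field name "comment" are made up for illustration, and the file itself is assumed to be saved as UTF-8. Setting Response.CodePage again in script is redundant with the directive, but it makes the dependency explicit right where the form is read.

```asp
<%@ Language="VBScript" CodePage=65001 %>
<%
' Hypothetical action page (e.g. save.asp) that receives the form post.
' Make the codepage explicit BEFORE any Request.Form value is read,
' so the posted UTF-8 bytes are decoded as UTF-8.
Response.CodePage = 65001
Response.CharSet = "UTF-8"

Dim comment
comment = Request.Form("comment")   ' "comment" is an assumed field name

' Echo the value back, HTML-encoded, to check that it round-trips.
Response.Write Server.HTMLEncode(comment)
%>
```

If the character still comes back as two characters with this in place, the likely culprit is the page that served the form (not sent as UTF-8) or the storage layer (a non-Unicode column), rather than the action page.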