Classic ASP gremlins, getting a Â inserted into text whenever an HTML special character is used

Question

I'm working on an older classic ASP site with a form that lets the user enter some text (into a multiline textbox). If they add an HTML special character like ® (registered trademark), it is inserted correctly. But when they go to edit the data using the same form, the update adds a stray 'Â' (A with a circumflex accent) in front of the registered trademark symbol. The content type is UTF-8.

Any ideas?

Thanks for any time you give this. It's been driving me nuts. -m

AnthonyWJones · Answer 1 · 2008-12-09T11:08:48.230

The fundamental problem is the impact of Response.Codepage on form posts.

When you send a form to a client specifying that the content is encoded as UTF-8, the browser will assume that the content of form posts should be sent encoded as UTF-8.

Now the action page that receives the post will (somewhat counter-intuitively) use the value of Response.Codepage to decide how the characters in the post are encoded. This isn't obvious, because we tend to think it's the job of the sender to define the encoding of what it's sending. It also isn't a natural leap to think that a property concerned with the encoding of the response we want to send would have anything to do with how the incoming request is read. In this case it does.

What's happening is that your form is posting a UTF-8 encoded version of the character, but the page that receives it does not have its Response.Codepage set to 65001 (the UTF-8 codepage). It's probably set to the system's default codepage, such as 1252. Hence the two-byte UTF-8 encoding of the character gets interpreted as two individual characters.
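To see the mechanism outside ASP, here is a minimal Python sketch. Python is used only for illustration, and the function name is made up. It encodes the character as UTF-8, the way the browser posts it, then decodes those bytes as Windows-1252, the way a page running under codepage 1252 would read them.

```python
def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode the resulting bytes as Windows-1252."""
    utf8_bytes = text.encode("utf-8")   # what the browser sends for a UTF-8 page
    return utf8_bytes.decode("cp1252")  # what a codepage-1252 reader makes of it


if __name__ == "__main__":
    print("®".encode("utf-8"))      # b'\xc2\xae' (two bytes for one character)
    print(misread_as_cp1252("®"))   # Â®
```

Byte 0xC2 is 'Â' in Windows-1252 and byte 0xAE is '®', which is where the extra character comes from.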

My recommendations for good character handling in ASP are:-

Save all pages as UTF-8
Include <%@ codepage=65001 %> at the top of all pages
Include <% Response.CharSet = "UTF-8" %> at the top of all pages
Store posted data in a Unicode field type such as SQL Server's NVARCHAR type.

The important thing here is that before you read form values in an ASP page, you need to make sure that Response.Codepage is set to a codepage that matches the sender's encoding, and this doesn't happen automatically.
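The same point can be sketched in Python by treating the receiving page as a function that decodes a form post with whatever codepage it has been told to use. This is only an analogy for what ASP does with Response.Codepage, not ASP's actual implementation; the field name `notes` and the helper `read_field` are hypothetical.

```python
from urllib.parse import parse_qs, urlencode

# A form post body as a browser would send it from a UTF-8 page.
body = urlencode({"notes": "Acme®"}, encoding="utf-8")


def read_field(post_body: str, name: str, codepage: str) -> str:
    """Decode one form field, assuming the bytes are in the given codepage."""
    fields = parse_qs(post_body, encoding=codepage)
    return fields[name][0]


if __name__ == "__main__":
    print(body)                                 # notes=Acme%C2%AE
    print(read_field(body, "notes", "utf-8"))   # Acme®
    print(read_field(body, "notes", "cp1252"))  # AcmeÂ®
```

The posted bytes are identical in both calls. Only the receiver's assumption about them changes, which is why the setting has to be in place before the form values are read.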

I have to chime in too that the bulleted points above rescued me from who knows how many hours of going down the hare-brained path of stubborn self-disgust at not being able to work out why **™** won't stick when saved. My first search yielded this solution. Many thanks. – Jerry Of Perth, Jun 20 '13 at 07:30

James Curran · Answer 2 · 2009-02-06T12:40:16.047

score 2

I'm gonna guess that the editor you are using doesn't work with UTF-8, and is converting everything to ASCII.

The simple answer is to stop using special characters in HTML pages. The copyright symbol should be written as &copy; or &#169;.
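A numeric character reference is just the character's code point in decimal, so it can be generated mechanically. A small Python sketch of that conversion, for illustration only (the helper name is made up, and this is not something ASP does for you):

```python
def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML numeric character reference."""
    # The "xmlcharrefreplace" error handler emits &#NNN; for anything ASCII can't hold.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_numeric_entities("©"))      # &#169;
    print(to_numeric_entities("Acme®"))  # Acme&#174;
```

Because the output is pure ASCII, it survives any of the single-byte or UTF-8 codepages discussed here unchanged.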

edited Feb 06 '09 at 12:40

answered Dec 08 '08 at 18:45


Pardon the quibble, but it can't be reading the text as ASCII because ASCII doesn't support accented letters or the copyright symbol. It has to be using an eight-bit encoding like ISO-8859-1 or windows-1252. – Alan Moore Feb 05 '09 at 03:13

score 1 · Answer 3 · answered Dec 08 '08 at 18:48

From my experience with this exact problem, I found that these characters popped up a lot because 1) the user was using a non-English character set (and keyboard) when the content was entered (e.g. Spanish), and 2) the content was not converted to UTF-8. You're on the right track checking the content type in the header, but you really have to run the content through a converter as well if this keeps happening. This problem caused me hours of pain, many years ago, with Classic ASP (I wish I still had access to the code to be of further help).

score 0 · Answer 4 · edited May 23 '17 at 12:13

Â® is what ® looks like if it's stored as UTF-8 but displayed as ISO-8859-1 or Windows-1252. Using the meta tag is not enough to make sure your page is being served as UTF-8. You will also need to set the encoding in the Content-Type HTTP header. This header is typically set either with some server-wide setting or programmatically.

I don't know ASP, but this seems to be how you should set that header:

HtmlEncode UTF-8

And this might provide some more information:

http://technet.microsoft.com/en-us/library/bb742422.aspx#EBAA

If your data is stored in a database, you'll also need to make sure the data is either stored in UTF-8 as well, or converted when storing and retrieving it.
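If some rows were already saved with the stray Â, one possible cleanup is to reverse the mistake: re-encode the text as Windows-1252 and decode the bytes as UTF-8. A hedged Python sketch follows; the function name is hypothetical, and it assumes the damage came from exactly one UTF-8-read-as-1252 round trip.

```python
def repair_mojibake(text: str) -> str:
    """Undo one UTF-8-read-as-Windows-1252 round trip; return text unchanged if it doesn't apply."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        # Not representable in cp1252, or not valid UTF-8 afterwards:
        # the text was probably fine already, so leave it alone.
        return text


if __name__ == "__main__":
    print(repair_mojibake("Â®"))  # ®
    print(repair_mojibake("®"))   # ® (already correct, left unchanged)
```

Text that was saved through the broken form several times needs the function applied once per round trip. A few bytes have no Windows-1252 mapping, so some damaged strings cannot be recovered this way. Test on a copy of the data first.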
