8

So I was having an issue with French characters not being converted correctly. Basically, I have a form which sends data to an SQL database. Then, on another page, data from this DB is retrieved and displayed to the user. But the strings were being displayed with weird corrupt characters, because the input in the form was in French. I overcame this problem by using the following function, which converts a string to the correct charset. HOWEVER, obviously the better solution is to convert it FIRST and then send it to the database. Here's the code to convert a string retrieved from the DB to the appropriate charset:

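Function ConvertFromUTF8(sIn)

    ' Re-interpret a string whose UTF-8 bytes were wrongly decoded as Windows-1252.
    Dim oIn: Set oIn = CreateObject("ADODB.Stream")

    oIn.Open
    oIn.CharSet = "Windows-1252"
    oIn.WriteText sIn              ' write the text out as Windows-1252 bytes
    oIn.Position = 0               ' rewind before switching the charset
    oIn.CharSet = "UTF-8"
    ConvertFromUTF8 = oIn.ReadText ' read the same bytes back as UTF-8
    oIn.Close

End Function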

I got this function from here: Classic ASP - How to convert a UTF-8 string to UCS-2?

Now my question is, what function do I use to convert strings beforehand and then send them to the database, so that when I retrieve them they will be good-to-go?

Tried Paul's Method:

So there's Page 1 and Page 2. Page 1 contains a form which, when submitted, sends the string to the DB; the string is then retrieved in Page 2. I tried Paul's solution by removing the function ConvertFromUTF8 and leaving the output as it was before (it displayed weird garbled characters). After that, I added the following line at the top of Page 1 as well as Page 2.

<%@LANGUAGE="VBSCRIPT" CODEPAGE="65001"%>

I also have the following on both of the pages:

Response.CodePage = 65001 
Response.CharSet = "UTF-8" 

But it didn't work :(

Edit: It works! Thank you so much, everyone, for your help! All I needed to do was add "CodePage = 65001" at the top of Page 3 (which I didn't even mention), where the writing to the DB was happening.

user1744228
  • Do you actually need to do the conversion? The standard procedure these days is to just use utf-8 encoding for your input form and utf-8 for your output page. There are loads of questions on this site on this issue, and a very useful blog article here. http://www.hanselman.com/blog/InternationalizationAndClassicASP.aspx – John Feb 19 '14 at 00:53
  • @John Actually the `<meta>` tag and the `Charset` declaration in the `Content-Type` response header are superfluous. Browsers default to UTF-8 when no other information is given. However, setting `Session.CodePage = 65001` will be necessary, too. – Tomalak Feb 19 '14 at 08:32
  • if you really want to go this strange way then just use your function to convert the posted strings and then save them in your db – ulluoink Feb 20 '14 at 15:11
  • Your form needs to be processing `UTF-8` not `Windows-1252` in the first place, once you do this your characters will stay consistent from input to database and output again. Use @Paul [suggestion](http://stackoverflow.com/a/21873977/692942). **Keep in mind:** 1. Your asp page needs to be saved as `UTF-8` not just have the declarations. 2. You will need to specify at the top of your page(s) `<%@LANGUAGE="VBSCRIPT" CODEPAGE="65001"%>` 3. Use `Response.CodePage = 65001` and `Response.Charset = "UTF-8"` to tell the server to return strings as `UTF-8` and tell the browser to use `UTF-8` encoding. – user692942 Feb 20 '14 at 15:26
  • @Lankymart I tried the above but it didn't work (Question edit: see "Tried Paul's Method"). How do I make sure the asp page saves as UTF-8? – user1744228 Feb 20 '14 at 16:42
  • @user1744228 It depends on what you're using: `notepad`, `visual studio` etc. In the case of `Visual Studio` it has a hidden menu option that you have to go find and enable called `"Advanced Save Options"`. – user692942 Feb 20 '14 at 16:51
  • possible duplicate of [Classic ASP - How to convert a UTF-8 string to UCS-2?](http://stackoverflow.com/questions/916118/classic-asp-how-to-convert-a-utf-8-string-to-ucs-2) – IsmailS Mar 02 '15 at 09:18
  • @IsmailS not the same, that is requesting encoding of a specific string not the entire page. Using that approach in this circumstance is incorrect. – user692942 Jan 25 '17 at 08:28

2 Answers

13

Paul's answer isn't wrong but it is not the only part to consider:

You will need to go through each of these steps to make sure that you get consistent results.

IMPORTANT: These steps have to be performed on each and every page in your web application or you will have problems (emphasized by Paul's comment).

  1. Each page needs to be saved using UTF-8 encoding. Double check this, as some IDEs will default to Windows-1252 (also often misnamed as "ANSI").

  2. Each page will need the following line added as the very first line in the page. To make this easier, I put it along with some other values in an include file that I can include in each page as I go.

    Include File - page_encoding.asp
    <%@Language="VBScript" CodePage = 65001 %>
    <% 
      Response.CharSet = "UTF-8"
      Response.CodePage = 65001
    %>
    

    Usage at the top of an ASP page (I prefer to put the include file in a config folder at the root of the web):

    <!-- #include virtual="/config/page_encoding.asp" -->
    

    Response.Charset = "UTF-8" is the equivalent of setting ;charset in the HTTP Content-Type header. Response.CodePage = 65001 tells ASP to process all dynamic strings as UTF-8.

  3. Include files in the page will also have to be saved using UTF-8 encoding (double check these also).

Follow these steps and your page will work. Your problem at the moment is that some pages are being interpreted as Windows-1252 while others are being treated as UTF-8, so you end up with an encoding mismatch.
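
The mismatch can be reproduced outside ASP. The following is a minimal Python sketch, not part of the original answer (the helper names `mangle` and `repair` are made up for illustration). It shows what happens when UTF-8 bytes are read as Windows-1252, and how re-reading the same bytes as UTF-8 recovers the text, which is roughly what the `ConvertFromUTF8` function in the question does.

```python
def mangle(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled: str) -> str:
    """Undo mangle(): recover the original bytes, then decode them as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    original = "français"
    garbled = mangle(original)
    print(garbled)          # franÃ§ais
    print(repair(garbled))  # français
```

Repairing after the fact only works while every byte survives the wrong decoding, which is one more reason to keep every page on UTF-8 instead.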

user692942
  • Thank you so much! All I needed to do was add "CodePage = 65001" on top of Page 3 (which I didn't even talk about), where the writing to the DB part was happening. – user1744228 Feb 20 '14 at 17:02
  • 1
    @user1744228 Can I suggest, as you're new, that you have a quick read of [What should I do when someone answers my question?](http://stackoverflow.com/help/someone-answers). How you vote / accept an answer is up to you. Hope it's been helpful. Be careful to maintain the process: just adding `CodePage = 65001` will eventually lead to more problems. Follow my steps and you can't go wrong. – user692942 Feb 20 '14 at 17:04
  • 1
    @Lankymart Since your answer is more complete than mine, may I suggest that you emphasize each AND every page in your web app: forget one page and you have a problem. – Paul Feb 20 '14 at 17:13
  • @Paul +1 because at the end of the day your suggestion was sound. – user692942 Feb 20 '14 at 17:15
6

Normally (and that word stretches a very long way) you do not need to convert by hand; in fact, it is discouraged. At the top of your ASP page you write:

<%@LANGUAGE="VBSCRIPT" CODEPAGE="65001"%>

That tells ASP to send and receive UTF-8 (from the server's point of view). Furthermore, it instructs the interpreter to use 2-byte strings. So when writing to or reading from a database, everything happens automagically: whether your database uses 1-byte char or 2-byte nchar columns, the conversions are taken care of. And that's about it. You can test whether all goes well with this set:

áäÇçéčëíďńóöçÖöÚü

This set contains some 'European' characters but also some that need Unicode (č, ď, ń). Those will always fail if you use code page 1252, so it's a nice test set.
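
As a quick check of which characters in that set fall outside Windows-1252, here is a small Python sketch (an illustration only, not something from the answer; `outside_cp1252` is a hypothetical helper name). It tries to encode each character as Windows-1252 and collects the ones that cannot be represented.

```python
def outside_cp1252(text: str) -> list[str]:
    """Return the distinct characters of text that Windows-1252 cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


if __name__ == "__main__":
    test_set = "áäÇçéčëíďńóöçÖöÚü"
    print(outside_cp1252(test_set))  # ['č', 'ď', 'ń']
```

If any of those three characters comes back as a question mark or garbage after a round trip through the form and the database, that suggests a page in the chain is still running under code page 1252.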

Paul
  • 1
    No, the browser does not need a `<meta>` tag to recognize encoding. In fact you should not put it in at all. The meta tag is a crude way to override the `Content-Type:` response header (it's called `http-equiv` for a reason). Putting in the meta tag just opens one more spot for potentially conflicting information (`Content-Type:` header vs. meta tag vs. browser auto charset detection). Just leave off the meta tag and control everything through `Content-Type:`. – Tomalak Feb 19 '14 at 08:38
  • @Tomalak, you are so right, I've edited the answer to reflect your comment. – Paul Feb 19 '14 at 08:47
  • I tried Paul's method, but it didn't work. I've tried all kinds of stuff, like meta tags, response.codepage, response.charset,..etc. But none of them work. The only thing that works is using a function to convert strings. So all I need is that function I can use before sending strings to the DB – user1744228 Feb 19 '14 at 14:47
  • @user1744228 You really need to update your original question and show us how you tried Paul's suggestion, encoding can be a nightmare if you're not familiar with how it works and it is very easy to get wrong. But forcibly passing your data through a `ConvertFromUTF8()` function is horrendously inefficient when the server can do this for you. – user692942 Feb 20 '14 at 15:19
  • 1
    The example set only includes ISO-8859-1 characters, not anything that would require 'Unicode'. I use the following, which has ISO-8859-1, extras from Windows-1252, and two that require Unicode: ISO äàáâãåæçÿ WIN € ‘–’ “—” Ÿ UTF Łł END – stevek_mcc Jan 20 '15 at 13:31
  • @stevek_mcc you are right, somewhere along the line I missed that wrong copy paste, so I've adjusted the chars to those intended. – Paul Apr 18 '15 at 09:55