63
<%@LANGUAGE="VBSCRIPT" CODEPAGE="65001"%>
<!--#include file="conn.asp"-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Is the above code right?

casperOne
Mask

4 Answers

71

Yes.

UTF-8 is code page 65001 in Windows, which is just a way of specifying UTF-8 within the legacy code page system. As far as I have read, ASP can handle UTF-8 when it is specified that way.

Joey
  • In what way is Codepage "legacy"? – AnthonyWJones Oct 27 '09 at 09:04
  • Historically, texts had a *code page* which simply specified which character set to use. Those had some number which differed from vendor to vendor; Windows seems to use a 16-bit unsigned integer for that purpose. Nowadays most encodings and character sets have *names* instead of *numbers*. I consider the fact that UTF-8 has a code page number (one that is neither specified nor used outside Microsoft) a way to ensure that it still works with the old 16-bit integer code page number system, even though UTF-8 is nothing like a code page in the first place. – Joey Oct 27 '09 at 09:17
  • @Johannes: The codepage number is still an important feature of how Windows handles character encoding. For example in .NET the Encoding class can only be instantiated using the codepage number. I don't think Codepage is yet "legacy". – AnthonyWJones Oct 27 '09 at 13:28
  • It's only there for correct interoperability with previous and existing systems. Nowadays I guess such mechanisms would use names instead of arbitrary numbers simply because the encoding landscape has changed a bit since ye olde days of 1980. – Joey Oct 27 '09 at 13:46
  • Code pages are still used in Windows DOS screens. For example, to change the code page used by a DOS screen to UTF-8: chcp 65001 – Sabuncu May 05 '12 at 21:47
  • Sabuncu, (a) DOS is a misnomer for the Windows console, don't use it. (b) Switch the console window to a TrueType font and you'll get Unicode support without all the craziness. Whatever you set with chcp then doesn't affect the output of text. Besides, this question wasn't at all about the Windows console but rather about ASP. – Joey May 05 '12 at 22:39
  • @AnthonyWJones http://msdn.microsoft.com/ru-ru/library/windows/desktop/dd317756.aspx - see the 2nd comment, made by a Microsoft employee. While I agree that code pages will live as long as Windows does, they have still been called legacy, like 8.3 names, 260-character paths and so on. – Arioch 'The Nov 26 '12 at 17:00
  • @AnthonyWJones: `System.Text.Encoding.GetEncoding(string)` accepts a name like `ISO-8859-1` or `UTF-32BE`. – Joey Feb 22 '13 at 10:45
  • [CP 65001 support is buggy in `cmd.exe` and MS VC Runtime](https://stackoverflow.com/questions/388490/how-to-use-unicode-characters-in-windows-command-line) but as just an encoding to read files with bare winapi, it seems to be okay. – ivan_pozdeev Jan 30 '18 at 20:29
12

Your code is correct, although I prefer to set the CharSet in code rather than use the meta tag:

<% Response.CharSet = "UTF-8" %>

The codepage 65001 does refer to the UTF-8 character set. You need to make sure that your ASP page (and any includes) are saved as UTF-8 if they contain any characters outside of the standard ASCII character set.

By specifying the CODEPAGE attribute in the <%@ block you are indicating that anything written using Response.Write should be encoded to the codepage specified, in this case 65001 (UTF-8). It's worth bearing in mind that this does not affect any static content, which is sent verbatim, byte for byte, to the response. That is why the file needs to actually be saved using the codepage that is specified.

The CharSet property of the response sets the CharSet value of the Content-Type header. This has no impact on how the content may be encoded; it merely tells the client what encoding is being received. Again, it is important that this value match the actual encoding sent.
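
As a rough illustration of why the declared charset must match the bytes actually sent, here is a minimal sketch in Python rather than ASP. The sample string and the `client_view` helper are hypothetical names chosen for this example; the helper only mimics a client that trusts the charset declared in the Content-Type header.

```python
# Hypothetical static page content containing one non-ASCII character.
text = "Französisch"

# The bytes on disk depend on how the file was saved.
utf8_bytes = text.encode("utf-8")      # file saved as UTF-8
cp1252_bytes = text.encode("cp1252")   # same text saved as Windows-1252


def client_view(body: bytes, declared_charset: str) -> str:
    """Decode the body the way a client trusting the declared charset would."""
    return body.decode(declared_charset, errors="replace")


# Declared charset matches the bytes: the text survives.
matching = client_view(utf8_bytes, "utf-8")

# Windows-1252 bytes labeled as UTF-8: the lone 0xF6 byte is not valid UTF-8.
mislabeled = client_view(cp1252_bytes, "utf-8")

# UTF-8 bytes labeled as Windows-1252: each byte of the "ö" becomes its own character.
misread = client_view(utf8_bytes, "cp1252")

print(matching)
print(mislabeled)
print(misread)
```

The declaration never changes the bytes; it only changes how the client interprets them, so in this sketch only the first case displays the original text.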

AnthonyWJones
  • The primary meaning and effect of `<%@LANGUAGE="VBSCRIPT" CODEPAGE="65001"%>` is for the source file encoding to be UTF-8 (or whatever codepage is specified). It only cascades through to the `Response.CharSet` property. You may save your file as UTF-8 and put the matching CODEPAGE declaration in and then still use another encoding for `Response.CharSet`, such as source in 65001 and output in 1251 or 1252. You probably know that; I just didn't think it was completely clear from your text, which begins by implying that they might be simple alternatives. – Lumi Apr 14 '12 at 08:30
  • @Lumi: I find no such implication, I quote "The CharSet property of the response sets the CharSet value of the Content-Type header. This has _no_ impact on how the content may be encoded". Seems fairly clear to me. BTW the only __actual__ effect of the CODEPAGE directive is to set the `Response.CodePage`; it's the responsibility of the developer to ensure the file is saved using the matching codepage. – AnthonyWJones Apr 14 '12 at 14:50
  • You're right. I confused `Response.CharSet` and `Response.CodePage`. Setting the CODEPAGE directive cascades to the latter, not to the former; it has no bearing at all on the `Content-Type` header. I believe the CODEPAGE directive is best understood as "source file encoding". [Here's an example of where it matters.](http://code.activestate.com/lists/activeperl/21512/) The critical expression is `domXml.createElement("Französisch")`. The file was encoded in UTF-8 (it had to be Unicode for all of Greek, Russian, etc. to work) and so `codepage=65001` was critical. – Lumi Apr 14 '12 at 16:04
10

Yes, 65001 is the Windows code page identifier for UTF-8, as documented on the Microsoft website. Wikipedia suggests that IBM code page 1208 and SAP code page 4110 are also indicators for UTF-8.

Tim
1
response.codepage = 65001

seems to give bad results when the physical file is saved as UTF-8.

Otherwise, it works as it is supposed to.
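
One possible explanation for such a bad result is double encoding. This is an assumption, not something this answer confirms. If the UTF-8 bytes of the source file are first interpreted using a single-byte code page such as Windows-1252, and the resulting text is then encoded to UTF-8 for output, every non-ASCII character ends up encoded twice. A minimal Python sketch of that effect (not ASP; the sample string is hypothetical):

```python
original = "Französisch"

# Step 1: the file is saved as UTF-8.
on_disk = original.encode("utf-8")

# Step 2 (assumed): the source bytes are interpreted as Windows-1252.
misread_source = on_disk.decode("cp1252")

# Step 3: the misread text is encoded to UTF-8 for the response.
sent = misread_source.encode("utf-8")

# Step 4: the client correctly decodes the response as UTF-8,
# but the damage was already done in step 2.
seen = sent.decode("utf-8")
print(seen)

# Reversing the mistaken step recovers the original text.
recovered = seen.encode("cp1252").decode("utf-8")
print(recovered)
```

If this is what is happening, the usual remedy is to make the declared source encoding agree with how the file is actually saved, rather than to change the output code page.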

musefan