I have an Oracle server with a DAD defined with PlsqlNLSLanguage DANISH_DENMARK.WE8ISO8859P1.

I also have a JavaScript file that is loaded in the browser. The JavaScript file contains the Danish letters æøå. When the JS file is saved as UTF8, the Danish letters are misencoded. When I save the JS file as UTF8-BOM or ANSI, the letters are shown correctly.

I am not sure what is wrong.

1 Answer

Try setting your DAD to

PlsqlNLSLanguage DANISH_DENMARK.UTF8

or even better

PlsqlNLSLanguage DANISH_DENMARK.AL32UTF8

When you save your file as ANSI, that typically means "Windows Codepage 1252" on Western Windows; see the column "ANSI codepage" at National Language Support (NLS) API Reference. CP1252 is very similar to ISO-8859-1; see ISO 8859-1 vs. Windows-1252 (it is the German Wikipedia, but its table shows the differences much better than the English Wikipedia). Hence, for a 100% correct setting with ANSI files, you would have to set PlsqlNLSLanguage DANISH_DENMARK.WE8MSWIN1252.
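To make the "very similar" claim concrete, here is a minimal Python sketch that compares the two code pages byte by byte. It assumes Python's built-in `cp1252` and `latin-1` codecs are fair stand-ins for Oracle's WE8MSWIN1252 and WE8ISO8859P1; `decode_or_none` is a hypothetical helper written for this illustration.

```python
def decode_or_none(byte_value, encoding):
    """Decode a single byte; return None if the code page leaves it undefined."""
    try:
        return bytes([byte_value]).decode(encoding)
    except UnicodeDecodeError:
        return None

# Collect every byte value whose meaning differs between the two code pages.
differing = [
    b for b in range(256)
    if decode_or_none(b, "cp1252") != decode_or_none(b, "latin-1")
]

print(f"differences lie between 0x{min(differing):02X} and 0x{max(differing):02X}")

# The Danish letters sit outside that range, so both code pages
# encode them with the same single bytes.
danish = "æøå"
print(danish.encode("cp1252") == danish.encode("latin-1"))
```

The differing bytes all fall in the 0x80 to 0x9F block, which is why æøå survive a mix-up between ANSI and ISO-8859-1 but characters such as the euro sign would not.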

Now, why do you get correct characters when you save your file as UTF8-BOM, although there is a mismatch with .WE8ISO8859P1?

When the browser opens the file, it first reads the BOM 0xEF,0xBB,0xBF and assumes the file is encoded as UTF-8. However, this may fail in some circumstances, e.g. when you insert text from an input field into the database.
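The BOM check can be sketched in a few lines of Python. This is a simplified model, not how any particular browser is implemented: real browsers also weigh the HTTP `Content-Type` header and other hints. `sniff_bom` is a hypothetical helper, and the ISO-8859-1 fallback is an assumption that mirrors the DAD setting from the question.

```python
import codecs

def sniff_bom(data: bytes):
    """Return 'utf-8' if the data starts with the UTF-8 BOM, else None."""
    return "utf-8" if data.startswith(codecs.BOM_UTF8) else None

script = 'var label = "æøå";'

with_bom = script.encode("utf-8-sig")   # "UTF8-BOM": EF BB BF + UTF-8 bytes
without_bom = script.encode("utf-8")    # plain UTF-8, no signature

print(with_bom[:3].hex(" "))                        # ef bb bf
print(sniff_bom(with_bom), sniff_bom(without_bom))  # utf-8 None

# Without a BOM the consumer falls back on the charset it was told to
# expect; here that is assumed to be ISO-8859-1.
fallback = sniff_bom(without_bom) or "iso-8859-1"
garbled = without_bom.decode(fallback)
print(garbled == script)                            # False
```

With the BOM present, the declared charset never gets a chance to be wrong, which is why the UTF8-BOM file displays correctly despite the mismatched DAD.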

With PlsqlNLSLanguage DANISH_DENMARK.AL32UTF8 you tell the Oracle Database: "The web-server uses UTF-8." No more, no less (in terms of character set encoding). So, when your database uses character set WE8ISO8859P1, the Oracle driver knows it has to convert ISO-8859-1 characters coming from the database to UTF-8 for the browser, and vice versa.
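The conversion described above can be imitated in Python. This is only an illustration of the principle, assuming a WE8ISO8859P1 database and a UTF-8 client; the variable names are hypothetical, and the real work happens inside the Oracle client libraries.

```python
# Hypothetical value as stored in a WE8ISO8859P1 database: one byte per letter.
stored_in_db = "æøå".encode("iso-8859-1")

# Database -> client: decode with the database character set, then encode
# with the client character set announced by the DAD (UTF-8).
sent_to_browser = stored_in_db.decode("iso-8859-1").encode("utf-8")

# Client -> database: the reverse conversion.
written_back = sent_to_browser.decode("utf-8").encode("iso-8859-1")

print(len(stored_in_db), len(sent_to_browser))  # 3 6
print(written_back == stored_in_db)             # True
```

The round trip is lossless here because every character involved exists in both character sets; the conversion only goes wrong when the declared client character set does not match the bytes actually sent.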

Wernfried Domscheit
  • I did try to set the DAD and it worked. I just don't understand why. The Oracle docs state that `WE8ISO8859P1` supports Danish https://docs.oracle.com/database/121/NLSPG/ch2charset.htm#NLSPG164 – An Van Luu Aug 29 '16 at 09:36
  • See my update where I try to explain. `PlsqlNLSLanguage .AL32UTF8` does **not** mean the **database** character set. It specifies the character set of the client, here the web-server. See also [NLS_LANG and others](http://stackoverflow.com/questions/33783902/odbcconnection-returning-chinese-characters-as/33790600#33790600) which works in the same way. – Wernfried Domscheit Aug 29 '16 at 09:43
  • If I change the DAD to `PlsqlNLSLanguage DANISH_DENMARK.AL32UTF8` and save the **js file** as `ANSI`, then the Danish letters are misencoded. Shouldn't they be compatible? Is it recommended to set the server to `PlsqlNLSLanguage DANISH_DENMARK.AL32UTF8` and save all files as UTF8? – An Van Luu Aug 29 '16 at 11:50
  • UTF-8 (i.e. AL32UTF8) is compatible with ANSI (i.e. CP1252) only for the first 128 characters (i.e. U+0000 to U+007F). Characters above that do not match. Yes, nowadays you should work with UTF-8 wherever you can. – Wernfried Domscheit Aug 29 '16 at 14:12
  • I'm sorry, but I still don't understand why the letters **æ ø å** are misencoded. [this](http://www.w3schools.com/charsets/ref_html_ansi.asp) shows that the letters are supported. In any case, changing everything to UTF-8 helped – An Van Luu Aug 30 '16 at 07:11
  • Yes, but for example **æ** is `xE6` in CP1252 and ISO-8859-1, whereas in UTF-8 it is `xC3 xA6`, i.e. the same letter is represented by different byte values. – Wernfried Domscheit Aug 30 '16 at 08:04
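
The point of the last comment, that a letter being "supported" by two encodings does not make their bytes interchangeable, can be checked with a short Python sketch. It assumes Python's `cp1252` codec corresponds to what Windows editors call ANSI; the variable names are illustrative only.

```python
letters = "æøå"

# The same three letters as raw bytes in each encoding.
ansi_bytes = letters.encode("cp1252")   # 1 byte per letter
utf8_bytes = letters.encode("utf-8")    # 2 bytes per letter

print(ansi_bytes.hex(" "))  # e6 f8 e5
print(utf8_bytes.hex(" "))  # c3 a6 c3 b8 c3 a5

# UTF-8 bytes interpreted as a single-byte code page: every letter turns
# into two characters ("Ã¦Ã¸Ã¥").
utf8_read_as_ansi = utf8_bytes.decode("cp1252")

# ANSI bytes interpreted as UTF-8: the byte sequence is invalid, so a
# lenient decoder substitutes U+FFFD replacement characters.
ansi_read_as_utf8 = ansi_bytes.decode("utf-8", errors="replace")

print(len(letters), len(utf8_read_as_ansi), "\ufffd" in ansi_read_as_utf8)  # 3 6 True
```

Both failure modes match what was reported in the thread: a plain UTF-8 file breaks under a single-byte DAD setting, and an ANSI file breaks once the DAD is switched to AL32UTF8.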