10

Is there a way to change the encoding used by the String(byte[]) constructor?

In my own code I use String(byte[], String) to specify the encoding, but I am using an external library that I cannot change.

String src = "with accents: é à";
byte[] bytes = src.getBytes("UTF-8");
System.out.println("UTF-8 decoded: "+new String(bytes,"UTF-8"));
System.out.println("Default decoded: "+new String(bytes));

The output for this is:

UTF-8 decoded: with accents: é à
Default decoded: with accents: é à

I have tried changing the system property file.encoding but it does not work.

Michel

3 Answers

7

You need to change the locale before launching the JVM; see:

Java, bug ID 4163515

Some sources suggest you can do this by setting the file.encoding system property when launching the JVM, for example:

java -Dfile.encoding=UTF-8 ...

...but I haven't tried this myself. The safest way is to set the locale through an environment variable in the operating system (for example LANG or LC_ALL on Unix-like systems) before the JVM starts.
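Whether a given launch option or locale setting actually took effect can be checked from inside the JVM. The following is a minimal sketch, not a definitive tool: the class name DefaultCharsetCheck and the helper utf8RoundTrips are made up for illustration. It prints the charset that new String(byte[]) will use and tests whether the sample string from the question survives being decoded with it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic class; the names are illustrative only.
class DefaultCharsetCheck {

    // Encodes text as UTF-8, decodes the bytes with the given charset,
    // and reports whether the original text comes back unchanged.
    static boolean utf8RoundTrips(String text, Charset charset) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return text.equals(new String(utf8, charset));
    }

    public static void main(String[] args) {
        Charset def = Charset.defaultCharset();
        // Unicode escapes keep the sample independent of the source file encoding.
        String sample = "with accents: \u00e9 \u00e0";
        System.out.println("file.encoding property: " + System.getProperty("file.encoding"));
        System.out.println("Default charset:        " + def);
        System.out.println("Sample decodes cleanly: " + utf8RoundTrips(sample, def));
    }
}
```

Running it once with and once without -Dfile.encoding=UTF-8 should show whether the flag is honoured on the JVM in question.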

Mat Mannion
  • Has anyone tried the -Dfile.encoding approach? It would be great to be able to do this in a platform-agnostic way. – Matt Passell Jan 08 '13 at 20:28
  • @MattPassell We use the following args when launching the JVM to ensure that we're specifying UTF-8 properly everywhere: -Dfile.encoding=ISO646-US -Dsun.jnu.encoding=ISO646-US and it appears to work fine. – Mat Mannion Jan 10 '13 at 14:42
  • Thanks for the response. Am I missing something? I just Googled for ISO646-US and found out it's an official name for ASCII. How does that help make sure you're using UTF-8? – Matt Passell Jan 30 '13 at 14:22
  • @MattPassell it doesn't ensure, but it makes it blatantly obvious that we're not specifying the encoding explicitly during development since the character set is so limited – Mat Mannion Feb 18 '13 at 12:49
  • Thanks! This solution worked for me when I added the JVM parameter to the Tomcat launch options. – Neets Jun 18 '14 at 09:38
1

I think you want this: System.setProperty("file.encoding", "UTF-8");

It solved some problems, but I still have others. The characters "í" and "Í" are not converted correctly if the OS uses ISO-8859-1. Only passing the JVM option at startup fixed that. Now only the Java console in the NetBeans IDE garbles special characters.
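The difference between the runtime call and the startup option can be observed directly. The sketch below is an illustration only (RuntimeFileEncodingDemo and its helper are hypothetical names): it sets file.encoding at runtime and checks whether Charset.defaultCharset(), which new String(byte[]) relies on, changes as a result. On the OpenJDK-based JVMs I am aware of, the default is resolved once and cached, so the expectation is that it does not change.

```java
import java.nio.charset.Charset;

// Hypothetical demo class; the names are illustrative only.
class RuntimeFileEncodingDemo {

    // Sets file.encoding at runtime and reports whether the JVM's
    // default charset differs afterwards. Restores the property before returning.
    static boolean defaultCharsetChangesAfterSetProperty(String newEncoding) {
        Charset before = Charset.defaultCharset(); // forces the default to be resolved
        String saved = System.getProperty("file.encoding");
        try {
            System.setProperty("file.encoding", newEncoding);
            return !before.equals(Charset.defaultCharset());
        } finally {
            if (saved != null) {
                System.setProperty("file.encoding", saved);
            } else {
                System.clearProperty("file.encoding");
            }
        }
    }

    public static void main(String[] args) {
        // Pick a target that differs from the current default so a change would be visible.
        String target = "UTF-8".equals(Charset.defaultCharset().name()) ? "ISO-8859-1" : "UTF-8";
        System.out.println("Default charset changed: "
                + defaultCharsetChangesAfterSetProperty(target));
    }
}
```

Code that reads the file.encoding property itself will see the new value, which may explain why the runtime call fixes some problems but not the decoding done by new String(byte[]).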

iileandro
1

Quoted from the documentation of Charset.defaultCharset():

The default charset is determined during virtual-machine startup and typically depends upon the locale and charset of the underlying operating system.

In most OSes you can set the charset using an environment variable.
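As a rough illustration of which variable matters, the sketch below reports the locale setting the process was started with next to the resulting default charset. It assumes a Unix-like system, where the usual POSIX precedence for character handling is LC_ALL, then LC_CTYPE, then LANG; on Windows the default comes from the system locale instead. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Map;

// Hypothetical report class; assumes POSIX-style locale variables.
class LocaleEnvReport {

    // POSIX precedence for the character-type category.
    static final String[] PRECEDENCE = {"LC_ALL", "LC_CTYPE", "LANG"};

    // Returns the first non-empty locale variable as "NAME=value",
    // or "(none set)" if none of them is defined.
    static String effectiveLocaleSetting(Map<String, String> env) {
        for (String name : PRECEDENCE) {
            String value = env.get(name);
            if (value != null && !value.isEmpty()) {
                return name + "=" + value;
            }
        }
        return "(none set)";
    }

    public static void main(String[] args) {
        System.out.println("Locale setting:  " + effectiveLocaleSetting(System.getenv()));
        System.out.println("Default charset: " + Charset.defaultCharset());
    }
}
```

Starting the JVM with, say, LC_ALL set to a UTF-8 locale that is installed on the machine and comparing the two printed lines shows how the environment maps to the default charset. Note that from Java 18 onward the default charset is UTF-8 regardless of the locale unless configured otherwise.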

jrudolph
  • Not really the answer I hoped for (I would have liked to be able to do it dynamically). Giving a sample of how to change the encoding for major OSes would be great. Thanks – Michel Sep 17 '08 at 09:25