
After updating Linux and Java (1.6.0.13 -> 1.6.0.45), Java processes use a different file encoding (system property file.encoding)

New OS version. Unfortunately, I no longer know the previous version, but I can tell that the update went wrong: my colleague first updated using the 32-bit OS version, and then we reinstalled the 64-bit version.

>uname -a
Linux <hostname> 2.6.31.5-0.1-desktop #1 SMP PREEMPT 2009-10-26 15:49:03 +0100 x86_64 x86_64 x86_64 GNU/Linux

Locale Settings

>locale
LANG=en_US.ISO8859-1
LC_CTYPE=en_US.ISO8859-1
LC_NUMERIC="en_US.ISO8859-1"
LC_TIME="en_US.ISO8859-1"
LC_COLLATE="en_US.ISO8859-1"
LC_MONETARY="en_US.ISO8859-1"
LC_MESSAGES="en_US.ISO8859-1"
LC_PAPER="en_US.ISO8859-1"
LC_NAME="en_US.ISO8859-1"
LC_ADDRESS="en_US.ISO8859-1"
LC_TELEPHONE="en_US.ISO8859-1"
LC_MEASUREMENT="en_US.ISO8859-1"
LC_IDENTIFICATION="en_US.ISO8859-1"
LC_ALL=

Test program

public class Test
{
  public static void main(String[] args)
  {
    System.out.println(System.getProperty("file.encoding"));
  }
}

If I start this test program, it prints ANSI_X3.4-1968. On other machines with the same locale settings it prints ISO8859-1. Even if I start it with an explicit environment variable, the result is unchanged. The only working solution is the -Dfile.encoding option, but I don't want to adjust every script that uses Java (Tomcat, Maven, Ant, Hudson, ...). I want to restore the old behaviour, where Java programs took the file encoding from the system locale definition.

>java Test
ANSI_X3.4-1968

>LANG=de_DE.ISO8859-1 java Test
ANSI_X3.4-1968

>java -Dfile.encoding=ISO8859-1 Test
ISO8859-1
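To narrow this down, a slightly larger diagnostic than the test program may help. The following is only a sketch: the class name EncodingReport is made up for illustration, and sun.jnu.encoding is an implementation-specific property of Sun/Oracle/OpenJDK JVMs that may be missing elsewhere (it then prints as null). It shows the encoding properties next to the locale variables the JVM process actually inherited.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original test program.
public class EncodingReport
{
  // Collects the encoding-related settings visible to this JVM, one per line.
  static String report()
  {
    StringBuilder sb = new StringBuilder();
    // sun.jnu.encoding is implementation-specific and may be null on other JVMs.
    for (String key : new String[] { "file.encoding", "sun.jnu.encoding" })
    {
      sb.append(key).append('=').append(System.getProperty(key)).append('\n');
    }
    sb.append("defaultCharset=").append(Charset.defaultCharset().name()).append('\n');
    // Locale variables as inherited by the JVM process; unset ones print as null.
    for (String var : new String[] { "LC_ALL", "LC_CTYPE", "LANG" })
    {
      sb.append(var).append('=').append(System.getenv(var)).append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args)
  {
    System.out.print(report());
  }
}
```

If LANG shows the expected value here while file.encoding is still ANSI_X3.4-1968, the variable reaches the JVM and the problem is more likely in how the locale is resolved by the system.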

At least C programs get the correct encoding and do not use ANSI_X3.4-1968:

>idn --debug  --quiet "a.de"
Charset `ISO-8859-1'.
....

Does anybody know whether there is a JVM-specific setting that might have been lost during the OS or Java update?

Any help appreciated.

tejoe
  • As a last resort there are Java config files / environment variable (`JAVA_OPTS`) which are automatically read and applied on each JVM start. If you can't hunt down and restore your original encoding, you can set it this way for all Java apps "permanently". – icza Aug 28 '14 at 12:37
  • Could you be more specific about the Java config files? I don't know of any. JAVA_OPTS does not seem to work: `export JAVA_OPTS=-Dfile.encoding=ISO-8859-15` followed by `java Test` still prints ANSI_X3.4-1968. It also does not work for the javac compiler, which uses the ANSI encoding as well. – tejoe Aug 28 '14 at 14:06
  • I don't know the config file locations on Linux, as I use Windows. For `JAVA_OPTS` see for example http://stackoverflow.com/questions/2011311/running-java-with-java-opts-env-variable – icza Aug 28 '14 at 14:10

2 Answers


Thanks to icza. I googled a little for JAVA_OPTS and found that I should use JAVA_TOOL_OPTIONS instead; see "How do I use the JAVA_OPTS environment variable?"

Alternatively, use _JAVA_OPTIONS; see "Running java with JAVA_OPTS env variable".

Both work fine, for the runtime and for the compiler:

>export JAVA_TOOL_OPTIONS=-Dfile.encoding=ISO8859-1
>java Test
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=ISO8859-1
ISO8859-1

>javac Test.java
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=ISO8859-1

>export _JAVA_OPTIONS=-Dfile.encoding=ISO8859-1
>java Test
Picked up _JAVA_OPTIONS: -Dfile.encoding=ISO8859-1
ISO8859-1

>javac Test.java
Picked up _JAVA_OPTIONS: -Dfile.encoding=ISO8859-1

tejoe

Just hit something similar (on Debian). It was caused by the default LANG/LC settings being for a locale not configured in /etc/locale.gen.

To fix, I uncommented the appropriate line from /etc/locale.gen and ran sudo locale-gen.
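Since Java stays silent, a small check of whether the locale named in LANG is among the generated ones can be sketched as follows. This is an illustration under assumptions, not an established tool: the class LocaleCheck and its methods are hypothetical, the list of installed locales is hard-coded where real use would feed in the output of locale -a, and the normalization (codeset lower-cased, punctuation dropped, so that "UTF-8" matches "utf8") is my assumption about how those names are usually printed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only.
public class LocaleCheck
{
  // Normalizes the codeset part, e.g. "en_US.ISO8859-1" -> "en_US.iso88591".
  // Assumption: `locale -a` prints codesets lower-cased without punctuation.
  static String normalize(String name)
  {
    int dot = name.indexOf('.');
    if (dot < 0)
    {
      return name; // no codeset part, e.g. "C" or "POSIX"
    }
    int at = name.indexOf('@', dot);
    String modifier = at < 0 ? "" : name.substring(at);
    String codeset = name.substring(dot + 1, at < 0 ? name.length() : at);
    String norm = codeset.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    return name.substring(0, dot) + "." + norm + modifier;
  }

  // True if the wanted locale matches one of the installed names after normalization.
  static boolean isInstalled(String wanted, List<String> installed)
  {
    String target = normalize(wanted);
    for (String candidate : installed)
    {
      if (normalize(candidate).equals(target))
      {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args)
  {
    // Made-up sample list; in practice this would be the lines printed by `locale -a`.
    List<String> installed = Arrays.asList("C", "POSIX", "en_GB.utf8");
    String lang = System.getenv("LANG");
    if (lang == null)
    {
      System.out.println("LANG is not set");
    }
    else
    {
      System.out.println(lang + " installed: " + isInstalled(lang, installed));
    }
  }
}
```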

I'm surprised that Java doesn't give any warning about this. Perl, for example, makes a loud noise to tell you something's broken:

$ LANG=pl_PL.UTF-8 perl -e ''                
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = "en_GB:en",
    LC_ALL = (unset),
    LANG = "pl_PL.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

Also, to explain some of the other behaviour: ANSI_X3.4-1968 is just an official (and somewhat opaque) name for "ASCII". "ISO-8859-1" is the "usual" 8-bit superset of ASCII, known by various names including "Western" or "Latin 1", and was the nearest thing to a "standard" character set as far as operating systems like DOS or older versions of Windows were concerned.
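The name mapping can be checked from Java itself. A minimal sketch, assuming a standard JDK in which these names are registered as charset aliases (the class name CharsetNames is made up for illustration):

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration only.
public class CharsetNames
{
  // Returns the canonical name Java uses for the given charset name or alias.
  // Throws UnsupportedCharsetException if the JVM does not know the name.
  static String canonical(String name)
  {
    return Charset.forName(name).name();
  }

  public static void main(String[] args)
  {
    for (String name : new String[] { "ANSI_X3.4-1968", "ISO8859-1", "ISO-8859-1" })
    {
      System.out.println(name + " -> " + canonical(name));
    }
  }
}
```

On such a JDK the first name resolves to US-ASCII and the other two to ISO-8859-1, which is why a failed locale lookup (falling back to the "C" locale) shows up in Java as plain ASCII.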

bjb