296

I need to use UTF-8 in my resource properties using Java's ResourceBundle. When I enter the text directly into the properties file, it displays as mojibake.

My app runs on Google App Engine.

Can anyone give me an example? I can't get this to work.

Mechanical snail
nacho
  • Java 1.6 fixed this, as you can pass in a Reader. See @Chinaxing's answer further below. – Will Feb 03 '14 at 21:45
  • @Will: question is primarily about reading them via `java.util.ResourceBundle`, not `java.util.Properties`. – BalusC Sep 11 '14 at 06:17
  • Check this answered question, hope it helps you: http://stackoverflow.com/questions/863838/problem-with-java-properties-utf8-encoding-in-eclipse – Majdy the programmer Bboy Mar 31 '15 at 04:18
  • JDK9 should support UTF-8 natively, see JEP 226 (http://openjdk.java.net/jeps/226) – Paolo Fulgoni May 08 '15 at 14:41

17 Answers

418

Java 9 and newer

From Java 9 onwards property files are encoded as UTF-8 by default, and using characters outside of ISO-8859-1 should work out of the box.

Java 8 and older

ResourceBundle#getBundle() uses PropertyResourceBundle under the covers when a .properties file is specified. This in turn uses Properties#load(InputStream) by default to load those properties files. As per the javadoc, they are read as ISO-8859-1 by default.

public void load(InputStream inStream) throws IOException

Reads a property list (key and element pairs) from the input byte stream. The input stream is in a simple line-oriented format as specified in load(Reader) and is assumed to use the ISO 8859-1 character encoding; that is each byte is one Latin1 character. Characters not in Latin1, and certain special characters, are represented in keys and elements using Unicode escapes as defined in section 3.3 of The Java™ Language Specification.

So you'd need to save them as ISO-8859-1. If you have characters beyond the ISO-8859-1 range, can't type \uXXXX escapes off the top of your head, and are thus forced to save the file as UTF-8, then you need the native2ascii tool to convert the UTF-8 encoded properties file into an ISO-8859-1 encoded properties file in which all characters outside that range are converted to \uXXXX format. The example below converts a UTF-8 encoded properties file text_utf8.properties to a valid ISO-8859-1 encoded properties file text.properties.

native2ascii -encoding UTF-8 text_utf8.properties text.properties
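If native2ascii is not at hand (it was dropped from the JDK in Java 9, where it is no longer needed for this), the same conversion is easy to sketch in plain Java. The class below is a hypothetical helper, not a JDK API: it reads a UTF-8 file and replaces every character outside printable ASCII with a \uXXXX escape, which Properties#load(InputStream) decodes back. It assumes Java 7+ for java.nio.file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Native2Ascii {

    // Replaces every char outside printable ASCII with a six-character Unicode escape.
    // Characters beyond the BMP are already surrogate pairs in a Java String,
    // so each half gets its own escape, which is what Properties#load expects.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean keep = (c >= 0x20 && c <= 0x7E) || c == '\n' || c == '\r' || c == '\t';
            if (keep) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Usage: java Native2Ascii text_utf8.properties text.properties
    public static void main(String[] args) throws IOException {
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        String utf8 = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
        // The escaped text is pure ASCII, so writing it as ISO-8859-1 is lossless.
        Files.write(out, escape(utf8).getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

Like native2ascii, it leaves existing backslashes untouched, so escapes already present in the source file survive unchanged.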

When using a sane IDE such as Eclipse, this is already done automatically when you create a .properties file in a Java-based project and use Eclipse's own editor. Eclipse transparently converts characters beyond the ISO-8859-1 range to \uXXXX format. See also the screenshots below (note the "Properties" and "Source" tabs at the bottom):

[Screenshots: the "Properties" tab and the "Source" tab of Eclipse's properties editor]

Alternatively, you could also create a custom ResourceBundle.Control implementation wherein you explicitly read the properties files as UTF-8 using InputStreamReader, so that you can just save them as UTF-8 without the need to hassle with native2ascii. Here's a kickoff example:

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;
import java.util.ResourceBundle.Control;

public class UTF8Control extends Control {
    @Override
    public ResourceBundle newBundle
        (String baseName, Locale locale, String format, ClassLoader loader, boolean reload)
            throws IllegalAccessException, InstantiationException, IOException
    {
        // The below is a copy of the default implementation.
        String bundleName = toBundleName(baseName, locale);
        String resourceName = toResourceName(bundleName, "properties");
        ResourceBundle bundle = null;
        InputStream stream = null;
        if (reload) {
            URL url = loader.getResource(resourceName);
            if (url != null) {
                URLConnection connection = url.openConnection();
                if (connection != null) {
                    connection.setUseCaches(false);
                    stream = connection.getInputStream();
                }
            }
        } else {
            stream = loader.getResourceAsStream(resourceName);
        }
        if (stream != null) {
            try {
                // Only this line is changed to make it read properties files as UTF-8.
                bundle = new PropertyResourceBundle(new InputStreamReader(stream, "UTF-8"));
            } finally {
                stream.close();
            }
        }
        return bundle;
    }
}

This can be used as follows:

ResourceBundle bundle = ResourceBundle.getBundle("com.example.i18n.text", new UTF8Control());


BalusC
  • Thanks. BTW it seems to be a good idea to override getFormats to return FORMAT_PROPERTIES. – Flávio Etrusco Dec 10 '13 at 12:37
  • Could you elaborate on this suggestion to override getFormats()? – Mark Roper Jan 16 '14 at 13:13
  • I prepared a newer implementation for Java 8. See here: https://gist.github.com/Hasacz89/d93955ec91afc73a06e3 – Michał Rowicki Oct 08 '14 at 09:17
  • Do not hesitate to use `StandardCharsets.UTF_8` if you're using Java 7+ – Niks Jun 22 '15 at 07:00
  • That Control override is a very nice and simple solution, but wouldn't it be more logical to make the encoding a constructor arg, rather than hardcoding it to UTF-8? – Nyerguds Apr 14 '16 at 10:48
  • @Nyerguds: if you see reasons to ever programmatically change it (I can't for the life of me imagine one, though), feel free to do so. All code snippets I post are just kickoff examples after all. – BalusC Apr 14 '16 at 10:50
  • This might help future readers. I had to override `getFallbackLocale()` to return `null`, otherwise the locale passed to `ResourceBundle.getBundle()` is ignored and the default locale is always used. – Eng.Fouad Jun 28 '16 at 14:48
  • At the risk of being pedantic: there are hundreds if not thousands of languages that computer systems should accept. Considering the discussion was about resource bundles (which inherently accept an ISO 639 language code as part of the API to fetch the right bundle in context), the first part of the answer, recommending non-UTF-8 character sets (that do not support internationalization), seems misguided, whereas the latter part with the example is probably the only relevant part of a correct answer. – Darrell Teague Aug 03 '16 at 20:09
  • Anybody thinking about employing this may want to look at a more recent version of the base class. There's a privilege check in there to get at the resource stream that isn't in this version. (Otherwise it's still relevant because the actual difference is that one line that creates the resource bundle.) – Pointy Oct 26 '17 at 14:45
  • This worked for me. I included this UTF8Control class in my package, used ResourceBundle mybundle = ResourceBundle.getBundle("myfile", new UTF8Control()); and it worked. Thanks. – Shashank Jan 15 '18 at 09:32
  • I think the `newBundle` method should start with `if(!format.equals("java.properties")) return super.newBundle(…);`, to leave other bundle formats (like locating and loading a subclass of `ResourceBundle`) intact. – Holger Mar 11 '19 at 10:41
  • I used to rely on a UTF-8 ResourceBundle.Control but Java 9 broke that. If you are using modules (and you probably should be), read the docs for ResourceBundleProvider before wasting time extending Control. – Dave May 25 '21 at 12:34
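Folding two of the suggestions from these comments into the kickoff example (StandardCharsets.UTF_8 from Java 7, and delegating every format other than java.properties to the default implementation), a trimmed-down variant might look like the sketch below. The class name Utf8PropertiesControl is illustrative. Keep in mind that on Java 9+ plain UTF-8 files load without any custom Control, and that ResourceBundle.getBundle(..., Control) throws UnsupportedOperationException when called from a named module.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class Utf8PropertiesControl extends ResourceBundle.Control {

    @Override
    public ResourceBundle newBundle(String baseName, Locale locale, String format,
                                    ClassLoader loader, boolean reload)
            throws IllegalAccessException, InstantiationException, IOException {
        // Leave class-based bundles ("java.class") to the default implementation.
        if (!"java.properties".equals(format)) {
            return super.newBundle(baseName, locale, format, loader, reload);
        }
        String resourceName = toResourceName(toBundleName(baseName, locale), "properties");
        InputStream stream = null;
        if (reload) {
            // Bypass URL caching so a changed file is actually re-read.
            URL url = loader.getResource(resourceName);
            if (url != null) {
                URLConnection connection = url.openConnection();
                connection.setUseCaches(false);
                stream = connection.getInputStream();
            }
        } else {
            stream = loader.getResourceAsStream(resourceName);
        }
        if (stream == null) {
            return null; // no such file: ResourceBundle moves on to the next candidate
        }
        // Closing the Reader also closes the underlying stream.
        try (Reader reader = new InputStreamReader(stream, StandardCharsets.UTF_8)) {
            return new PropertyResourceBundle(reader);
        }
    }
}
```

Usage is the same as before, e.g. ResourceBundle.getBundle("com.example.i18n.text", new Utf8PropertiesControl()).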
143

Given an instance of ResourceBundle, you get a String with:

String val = bundle.getString(key); 

I solved my Japanese display problem by:

return new String(val.getBytes("ISO-8859-1"), "UTF-8");
Regexident
Rod
  • To all naive upvoters/commenters here: this is not a solution, but a workaround. The true underlying problem still stands and needs solving. – BalusC Dec 05 '14 at 14:14
  • This fixed my situation. The solution would be for Java to start handling UTF-8 natively in resource bundles and in properties files. Until that happens I'll use a workaround. – JohnRDOrazio Aug 21 '15 at 23:14
  • @BalusC: what is the disadvantage of this approach (other than creating an extra String)? – Paaske Nov 17 '15 at 09:51
  • @Paaske: it's a workaround, not a solution. You'd need to reapply the workaround all over the place, on all string variables throughout the code base. This is pure nonsense. Just fix it in a single place, at the right place, so that the string variables immediately contain the right value. There should be absolutely no need to modify the client. – BalusC Nov 17 '15 at 11:06
  • Yes, if you have to modify the entire application, of course this is bad. But if you're already using the ResourceBundle as a singleton, you only have to fix it once. I was under the impression that the singleton approach was the most common way of using the ResourceBundle. – Paaske Nov 18 '15 at 07:28
  • Love this approach. I've been struggling with Java's lack of UTF-8 in resource bundles for years, and am amazed at this simple workaround. – Steve McLeod May 24 '18 at 14:33
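A small self-contained sketch of why Rod's one-liner works: Properties#load(InputStream) turns every byte of a UTF-8 file into one ISO-8859-1 char, and getBytes("ISO-8859-1") recovers exactly those bytes so they can be decoded as UTF-8. The names MojibakeRepair and repair are made up for the illustration; the non-ASCII text is written as Unicode escapes so the source compiles regardless of javac's default encoding.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {

    // Undo a wrong ISO-8859-1 decode: recover the original bytes, then decode them as UTF-8.
    static String repair(String decodedAsLatin1) {
        byte[] originalBytes = decodedAsLatin1.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String japanese = "\u65e5\u672c\u8a9e"; // three kanji meaning "Japanese"
        // What Properties#load(InputStream) effectively does to UTF-8 bytes: one byte -> one char.
        String garbled = new String(japanese.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals(japanese));         // false
        System.out.println(repair(garbled).equals(japanese)); // true
    }
}
```

One caveat: this only holds for values stored as raw UTF-8 bytes. A value that already contains correctly decoded non-ASCII characters (for example ones written as \uXXXX escapes in the file) gets mangled by the same call instead of repaired, which is one more reason BalusC calls it a workaround.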
57

Look at this: http://docs.oracle.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader)

Properties#load accepts a Reader as its argument, which you can create from an InputStream.

When creating the Reader, you can specify its encoding:

InputStreamReader isr = new InputStreamReader(stream, "UTF-8");

Then pass this Reader to the load method:

prop.load(isr);

BTW: to get the stream from a .properties file:

 InputStream stream = getClass().getClassLoader().getResourceAsStream("a.properties");

BTW: to get a resource bundle from an InputStreamReader instead (use a fresh one; a Reader already consumed by prop.load is empty):

ResourceBundle rb = new PropertyResourceBundle(isr);

Hope this helps!

dedek
Chinaxing
  • The actual question here is about `ResourceBundle`, though. – Nyerguds Apr 14 '16 at 10:52
  • True, this should be the accepted answer if you are using `Properties` and would like to retrieve a `UTF-8` String; then this works like a charm. However, for a `ResourceBundle` such as language resources, the accepted answer is elegant. Nevertheless, upvoted the answer. – Ilgıt Yıldırım May 17 '16 at 08:59
  • `ResourceBundle rb = new PropertyResourceBundle(new InputStreamReader(stream, "UTF-8"))` – dedek Apr 04 '19 at 15:03
  • But how do I get the stream for a specified language? `a.properties` is a filename, while the bundle name is `a`. – Mikhail Ionkin Jul 27 '20 at 14:12
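Putting Chinaxing's pieces together, a self-contained sketch (Java 7+; class and method names are illustrative) could look like this. A ByteArrayInputStream stands in for the getResourceAsStream(...) result so it runs without a file, and each loader gets its own Reader because a Reader can only be consumed once.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class Utf8Loading {

    // Properties#load(Reader) works on characters, so the Reader decides the charset.
    static Properties loadProperties(InputStream in) throws IOException {
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            Properties props = new Properties();
            props.load(reader);
            return props;
        }
    }

    // PropertyResourceBundle(Reader) reads everything in its constructor, so closing afterwards is safe.
    static ResourceBundle loadBundle(InputStream in) throws IOException {
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            return new PropertyResourceBundle(reader);
        }
    }

    public static void main(String[] args) throws IOException {
        String expected = "\u041f\u0440\u0438\u0432\u0435\u0442"; // "Privet" in Cyrillic
        byte[] file = ("greeting=" + expected + "\n").getBytes(StandardCharsets.UTF_8);
        System.out.println(expected.equals(loadProperties(new ByteArrayInputStream(file)).getProperty("greeting"))); // true
        System.out.println(expected.equals(loadBundle(new ByteArrayInputStream(file)).getString("greeting")));       // true
    }
}
```

This answers the "how do I load it" part; picking the right file per locale (Mikhail Ionkin's question) is what ResourceBundle.getBundle with a custom Control does for you.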
34

This problem has finally been fixed in Java 9: https://docs.oracle.com/javase/9/intl/internationalization-enhancements-jdk-9

Default encoding for properties files is now UTF-8.

Most existing properties files should not be affected: UTF-8 and ISO-8859-1 have the same encoding for ASCII characters, and human-readable non-ASCII ISO-8859-1 encoding is not valid UTF-8. If an invalid UTF-8 byte sequence is detected, the Java runtime automatically rereads the file in ISO-8859-1.

stenix
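The fallback stenix describes can be observed directly with the PropertyResourceBundle(InputStream) constructor, which on Java 9+ tries UTF-8 first and re-reads as ISO-8859-1 on a malformed sequence. A minimal sketch, assuming Java 9+ and that the java.util.PropertyResourceBundle.encoding system property is not set; the class name FallbackDemo is illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.PropertyResourceBundle;

public class FallbackDemo {

    // Builds a bundle from raw file bytes through the InputStream constructor.
    static String lookup(byte[] fileBytes, String key) throws IOException {
        return new PropertyResourceBundle(new ByteArrayInputStream(fileBytes)).getString(key);
    }

    public static void main(String[] args) throws IOException {
        String line = "drink=caf\u00e9\n";
        byte[] savedAsUtf8 = line.getBytes(StandardCharsets.UTF_8);        // e-acute stored as C3 A9
        byte[] savedAsLatin1 = line.getBytes(StandardCharsets.ISO_8859_1); // e-acute stored as E9, invalid UTF-8 here
        // An old ISO-8859-1 file and a new UTF-8 file yield the same value.
        System.out.println(lookup(savedAsUtf8, "drink").equals("caf\u00e9"));   // true on Java 9+
        System.out.println(lookup(savedAsLatin1, "drink").equals("caf\u00e9")); // true on Java 9+
    }
}
```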
26

ResourceBundle.Control with UTF-8 and the new String(...) trick don't work if the properties file uses the cp1251 charset, for example.

So I recommend a common approach: write non-ASCII characters as Unicode escapes. For this:

IDEA has a special "Transparent native-to-ASCII conversion" option (Settings > File Encoding).

Eclipse has a plugin, "Properties Editor". It can also work as a separate application.

Kariem
Kinjeiro
  • In IntelliJ IDEA 14, this is located in Settings -> Editor -> File Encodings. I also had to delete any existing properties files and re-create them for this option to take effect. – Cypher May 22 '15 at 18:52
  • IDEs are not particularly relevant to the answer; they are just tools that don't address the underlying problem of not storing content in the UTF-8 character set, which would solve the problem straight away without conversion or hackery like writing properties as Unicode escapes inside a file defined with a different character set. – Darrell Teague Aug 03 '16 at 20:15
20
package com.varaneckas.utils;  

import java.io.UnsupportedEncodingException;  
import java.util.Enumeration;  
import java.util.PropertyResourceBundle;  
import java.util.ResourceBundle;  

/** 
 * UTF-8 friendly ResourceBundle support 
 *  
 * Utility that allows having multi-byte characters inside java .property files. 
 * It removes the need for Sun's native2ascii application, you can simply have 
 * UTF-8 encoded editable .property files. 
 *  
 * Use:  
 * ResourceBundle bundle = Utf8ResourceBundle.getBundle("bundle_name"); 
 *  
 * @author Tomas Varaneckas <tomas.varaneckas@gmail.com> 
 */  
public abstract class Utf8ResourceBundle {  

    /** 
     * Gets the unicode friendly resource bundle 
     *  
     * @param baseName 
     * @see ResourceBundle#getBundle(String) 
     * @return Unicode friendly resource bundle 
     */  
    public static final ResourceBundle getBundle(final String baseName) {  
        return createUtf8PropertyResourceBundle(  
                ResourceBundle.getBundle(baseName));  
    }  

    /** 
     * Creates unicode friendly {@link PropertyResourceBundle} if possible. 
     *  
     * @param bundle  
     * @return Unicode friendly property resource bundle 
     */  
    private static ResourceBundle createUtf8PropertyResourceBundle(  
            final ResourceBundle bundle) {  
        if (!(bundle instanceof PropertyResourceBundle)) {  
            return bundle;  
        }  
        return new Utf8PropertyResourceBundle((PropertyResourceBundle) bundle);  
    }  

    /** 
     * Resource Bundle that does the hard work 
     */  
    private static class Utf8PropertyResourceBundle extends ResourceBundle {  

        /** 
         * Bundle with unicode data 
         */  
        private final PropertyResourceBundle bundle;  

        /** 
         * Initializing constructor 
         *  
         * @param bundle 
         */  
        private Utf8PropertyResourceBundle(final PropertyResourceBundle bundle) {  
            this.bundle = bundle;  
        }  

        @Override
        public Enumeration<String> getKeys() {
            return bundle.getKeys();
        }

        @Override  
        protected Object handleGetObject(final String key) {  
            final String value = bundle.getString(key);  
            if (value == null)  
                return null;  
            try {  
                return new String(value.getBytes("ISO-8859-1"), "UTF-8");  
            } catch (final UnsupportedEncodingException e) {  
                throw new RuntimeException("Encoding not supported", e);  
            }  
        }  
    }  
}  
marcolopes
  • I like this solution and posted it as a Gist: https://gist.github.com/enginer/3168dd4a374994718f0e – Enginer Sep 12 '14 at 07:44
  • This works very well. Just added a Chinese translation properties file in UTF-8 and it loads up without any issues. – tresf Feb 06 '18 at 07:26
19

We create a resources.utf8 file that contains the resources in UTF-8 and have a rule to run the following:

native2ascii -encoding utf8 resources.utf8 resources.properties
andykellr
10

Attention: In Java <= 8 java property files should be encoded in ISO 8859-1!

ISO 8859-1 character encoding. Characters that cannot be directly represented in this encoding can be written using Unicode escapes; only a single 'u' character is allowed in an escape sequence.

@see Properties Java Doc

If you still really want to do this: have a look at: Java properties UTF-8 encoding in Eclipse -- there are some code samples


Since Java 9: property files are encoded in UTF-8, so there should be no problem/doubt

In Java SE 9, properties files are loaded in UTF-8 encoding. In previous releases, ISO-8859-1 encoding was used for loading property resource bundles.

(https://docs.oracle.com/javase/9/intl/internationalization-enhancements-jdk-9.htm#JSINT-GUID-9DCDB41C-A989-4220-8140-DBFB844A0FCA)

Ralph
  • Java != Eclipse... the latter is an IDE. Further, data != Java. Java supports stream processing using a vast array of character sets, which for internationalization (the question is about ResourceBundles after all) resolves to using UTF-8 as the most straightforward answer. Writing property files in a character set not supported by the target language unnecessarily complicates the problem. – Darrell Teague Aug 03 '16 at 20:17
  • @Darell Teague: The "hint" that a properties file loaded for a ResourceBundle has to be ISO 8859-1 is a Java statement: http://docs.oracle.com/javase/8/docs/api/java/util/Properties.html#load-java.io.InputStream- ... The second part of my answer is just a "hint" on how to deal with that problem. – Ralph Aug 04 '16 at 11:44
  • Since Java 9, properties are in UTF-8 – pdem Jan 28 '22 at 15:04
  • @pdem thanks for the hint on this old answer; I added a section to clarify that my hint was only for Java <= 8 – Ralph Jan 29 '22 at 12:49
5

http://sourceforge.net/projects/eclipse-rbe/

As already stated, property files should be encoded in ISO 8859-1.

You can use the above plugin for eclipse IDE to make the Unicode conversion for you.

fmucar
3

Here's a Java 7 solution that uses Guava's excellent support library and the try-with-resources construct. It reads and writes properties files using UTF-8 for the simplest overall experience.

To read a properties file as UTF-8:

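// Imports used by both snippets (Guava: com.google.common.*)
import com.google.common.base.Charsets;
import com.google.common.io.Files;
import java.io.File;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.Properties;

File file = new File("/path/to/example.properties");

// Create an empty set of properties
Properties properties = new Properties();

if (file.exists()) {

  // Use a UTF-8 reader from Guava
  try (Reader reader = Files.newReader(file, Charsets.UTF_8)) {
    properties.load(reader);
  } catch (IOException e) {
    // Do something
  }
}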

To write a properties file as UTF-8:

File file =  new File("/path/to/example.properties");

// Use a UTF-8 writer from Guava
try (Writer writer = Files.newWriter(file, Charsets.UTF_8)) {
  properties.store(writer, "Your title here");
  writer.flush();
} catch (IOException e) {
  // Do something
}
Gary
  • This answer is useful. The core problem here with various answers seems to be a misunderstanding about data and character sets. Java can read any data (correctly) by simply specifying the character set it was stored in as shown above. UTF-8 is commonly used to support most if not every language on the planet and therefore is very much applicable to ResourceBundle based properties. – Darrell Teague Aug 03 '16 at 20:20
  • @DarrellTeague: Well, "UTF-8 is commonly used to support..." - there should rather be "**Unicode** is commonly used to support..." :) as UTF-8 is just a character encoding of the Unicode (https://en.wikipedia.org/wiki/UTF-8). – Honza Zidek Mar 06 '17 at 17:08
  • Actually UTF-8 was meant to be specifically called out as "the character set" (versus just referencing "any Unicode character set"), as UTF-8 in this context (data) has predominant usage on the Internet, by some measures as high as 67%. Ref: http://stackoverflow.com/questions/8509339/what-is-the-most-common-encoding-of-each-language – Darrell Teague Mar 08 '17 at 01:42
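If you'd rather not depend on Guava for Gary's approach, java.nio.file.Files (Java 7+) provides UTF-8 readers and writers as well. A minimal JDK-only sketch of the same two operations; the class and method names are illustrative, and exceptions are propagated rather than caught. Note that Properties#store(Writer) writes non-ASCII characters as-is, whereas store(OutputStream) turns them into Unicode escapes.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class Utf8PropertiesFiles {

    // Returns an empty Properties if the file does not exist, like the Guava version.
    static Properties read(Path file) throws IOException {
        Properties properties = new Properties();
        if (Files.exists(file)) {
            try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                properties.load(reader);
            }
        }
        return properties;
    }

    // Closing the writer (try-with-resources) flushes it, so no explicit flush() is needed.
    static void write(Properties properties, Path file, String comment) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            properties.store(writer, comment);
        }
    }
}
```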
3

As someone suggested, I went through the implementation of a ResourceBundle Control, but that did not help, as the bundle was always called under the en_US locale. I tried to set my default locale to a different language, and my Control implementation was still being called with en_US. I added log messages and stepped through with the debugger to see whether a call with a different locale was made after I changed the locale at run time through XHTML and JSF calls; that did not happen. Then I tried to set the system default to UTF-8 for reading files in my server (Tomcat), but that caused problems: my class libraries were not compiled as UTF-8, Tomcat started to read them as UTF-8, and the server stopped running properly. In the end I implemented a method in my Java controller to be called from XHTML files. In that method I did the following:

        public String message(String key, boolean toUTF8) {
            String result = "";
            try {
                FacesContext context = FacesContext.getCurrentInstance();
                String message = context.getApplication().getResourceBundle(context, "messages").getString(key);

                result = message == null ? "" : toUTF8 ? new String(message.getBytes("ISO-8859-1"), "UTF-8") : message;
            } catch (Exception e) {
                // Missing key or unsupported charset: fall back to an empty string.
            }
            return result;
        }

I was particularly nervous that this could slow down my application. However, after implementing it, my application actually seems faster. I think that is because I now access the properties directly instead of letting JSF parse its way into accessing them. I pass the boolean argument in this call because I know some of the properties are not translated and do not need to be converted to UTF-8.

Now I have saved my properties file in UTF-8 format, and it works fine, since each user in my application has their own locale preference.

Masoud
2

For what it's worth, my issue was that the files themselves were in the wrong encoding. Using iconv worked for me:

iconv -f ISO-8859-15 -t UTF-8  messages_nl.properties > messages_nl.properties.new
Zack Bartel
  • +1 for mentioning `iconv`. I've never heard of it before but I typed it into the console and lo and behold, it's a thing that exists (in CentOS 6, anyways.) – ArtOfWarfare Jan 26 '15 at 16:55
  • Now that I've actually tried using it, though, it didn't work: it threw up on the first character that couldn't be converted to ISO-8859-1. – ArtOfWarfare Jan 26 '15 at 18:36
2

I tried to use the approach provided by Rod, but, taking into consideration BalusC's concern about not repeating the same workaround all over the application, I came up with this class:

import java.io.UnsupportedEncodingException;
import java.util.Locale;
import java.util.ResourceBundle;

public class MyResourceBundle {

    // feature variables
    private ResourceBundle bundle;
    private String fileEncoding;

    public MyResourceBundle(Locale locale, String fileEncoding){
        this.bundle = ResourceBundle.getBundle("com.app.Bundle", locale);
        this.fileEncoding = fileEncoding;
    }

    public MyResourceBundle(Locale locale){
        this(locale, "UTF-8");
    }

    public String getString(String key){
        String value = bundle.getString(key); 
        try {
            return new String(value.getBytes("ISO-8859-1"), fileEncoding);
        } catch (UnsupportedEncodingException e) {
            return value;
        }
    }
}

Using it is very similar to regular ResourceBundle usage (note that the constructor takes a Locale, not a String):

private MyResourceBundle labels = new MyResourceBundle(Locale.forLanguageTag("es"), "UTF-8");
String label = labels.getString(key);

Or you can use the alternate constructor, which uses UTF-8 by default:

private MyResourceBundle labels = new MyResourceBundle(Locale.forLanguageTag("es"));
carlossierra
2
Properties prop = new Properties();
String fileName = "./src/test/resources/predefined.properties";
// Wrap the stream in a UTF-8 Reader and load through it (closed automatically).
try (InputStreamReader reader = new InputStreamReader(new FileInputStream(fileName), "UTF-8")) {
    prop.load(reader);
}
2

Open the Settings / Preferences dialog (Ctrl + Alt + S), then click Editor and File Encodings.

[Screenshot: the File Encodings settings page]

Then, at the bottom, you will find the default encoding for properties files. Choose your encoding type.

Alternatively, you can use Unicode escapes instead of literal text in your resource bundle (for example, "ів" becomes \u0456\u0432).

Neuron
1

From Java 9, the default to load properties file has been changed to UTF-8. https://docs.oracle.com/javase/9/intl/internationalization-enhancements-jdk-9.htm

Fran García
1

Speaking for current (2021-2) Java versions, the old ISO-8859-1 method java.util.Properties#load(InputStream) is still there.

Allow me to quote from the official doc.

PropertyResourceBundle

PropertyResourceBundle can be constructed either from an InputStream or a Reader, which represents a property file. Constructing a PropertyResourceBundle instance from an InputStream requires that the input stream be encoded in UTF-8. By default, if a MalformedInputException or an UnmappableCharacterException occurs on reading the input stream, then the PropertyResourceBundle instance resets to the state before the exception, re-reads the input stream in ISO-8859-1, and continues reading. If the system property java.util.PropertyResourceBundle.encoding is set to either "ISO-8859-1" or "UTF-8", the input stream is solely read in that encoding, and throws the exception if it encounters an invalid sequence. If "ISO-8859-1" is specified, characters that cannot be represented in ISO-8859-1 encoding must be represented by Unicode Escapes as defined in section 3.3 of The Java™ Language Specification whereas the other constructor which takes a Reader does not have that limitation. Other encoding values are ignored for this system property. The system property is read and evaluated when initializing this class. Changing or removing the property has no effect after the initialization.

https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/util/PropertyResourceBundle.html

Properties#load

Reads a property list (key and element pairs) from the input byte stream. The input stream is in a simple line-oriented format as specified in load(Reader) and is assumed to use the ISO 8859-1 character encoding; that is each byte is one Latin1 character. Characters not in Latin1, and certain special characters, are represented in keys and elements using Unicode escapes as defined in section 3.3 of The Java™ Language Specification.

https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/util/Properties.html#load(java.io.InputStream)

jschnasse
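To make the difference between the two quoted docs concrete: on Java 9+ the same UTF-8 bytes come out as mojibake through Properties#load(InputStream) but as the intended text through PropertyResourceBundle(InputStream). A small sketch, assuming Java 9+ and that the java.util.PropertyResourceBundle.encoding system property is left unset; class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.PropertyResourceBundle;

public class LoadContrast {

    // Properties#load(InputStream): bytes are always decoded as ISO-8859-1.
    static String viaProperties(byte[] file, String key) throws IOException {
        Properties props = new Properties();
        props.load(new ByteArrayInputStream(file));
        return props.getProperty(key);
    }

    // PropertyResourceBundle(InputStream): UTF-8 first, ISO-8859-1 fallback (Java 9+).
    static String viaBundle(byte[] file, String key) throws IOException {
        return new PropertyResourceBundle(new ByteArrayInputStream(file)).getString(key);
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8File = "key=\u00e9\n".getBytes(StandardCharsets.UTF_8); // e-acute as bytes C3 A9
        System.out.println(viaProperties(utf8File, "key").equals("\u00c3\u00a9")); // true: two mojibake chars
        System.out.println(viaBundle(utf8File, "key").equals("\u00e9"));           // true on Java 9+
    }
}
```

So if you load plain Properties yourself, you still need a Reader (as in Chinaxing's answer); only the ResourceBundle path got the UTF-8 default.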