
I am working on a Java web application that is now in the final stages of development, and one of the remaining tasks is localization. We are using a properties file for every supported locale. The issue I have spotted is that some Unicode characters do not appear correctly in the web browser. The web pages have UTF-8 encoding specified in the charset meta tag, and the browser has correctly detected it (in Firefox, View->Character Encoding shows the correct one selected). I believe the issue comes from the fact that while the application displays text as UTF-8, the properties files are saved in an ISO-xxxx encoding, which appears to be an Eclipse default setting.

I have found a similar question here, "Java properties UTF-8 encoding in Eclipse", which advises me to install the Resource Bundle Plug-in. I installed the plug-in and used it to edit the corresponding properties, but I still have the issue.

Is there a quick solution (I mean one that will not require too many changes to the application, since it is almost finished) that will overcome this problem?

I should also mention that I am developing and observing the problem on Ubuntu Linux using Firefox 7. Thanks in advance.

Edit: I did not mention an important detail. My user interface is written in GWT, and the properties are exposed through an interface with annotations on its getters; GWT uses these annotations to generate an implementation of that interface internally and link each getter to the corresponding property. So I do not think I have much control over how the properties are actually read, or at least I do not know how to exercise it in GWT.

Ivaylo Slavov
  • Perhaps try having some of the properties echoed out to the console or a log file when they're retrieved. See what's actually being obtained from the properties file. – G_H Nov 21 '11 at 11:52
  • I have added some clarifications for my case. I am using GWT, which resolves the properties automatically, and it seems I have no control over how the properties files are actually read. – Ivaylo Slavov Nov 21 '11 at 12:34
  • possible duplicate of [Problem with Java properties utf8 encoding in Eclipse](http://stackoverflow.com/questions/863838/problem-with-java-properties-utf8-encoding-in-eclipse) – Raedwald Mar 28 '13 at 11:16
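
Following G_H's suggestion, one quick way to see what is actually being obtained is to print the code points of a retrieved value rather than the value itself, since the console may mangle the characters a second time. The helper below (`PropertyDump.codePoints` is a hypothetical name, not a library method) is a minimal JVM-side sketch: if a UTF-8 file was decoded as ISO-8859-1, each accented letter shows up as two code points such as U+00C3 U+00A9 instead of a single U+00E9.

```java
public class PropertyDump {
    /** Returns each code point of the value as "U+XXXX", separated by spaces. */
    static String codePoints(String value) {
        StringBuilder sb = new StringBuilder();
        value.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String expected = "caf\u00e9";         // "cafe" with e-acute, decoded correctly
        String mojibake = "caf\u00c3\u00a9";   // the UTF-8 bytes of e-acute read as ISO-8859-1
        System.out.println(codePoints(expected)); // U+0063 U+0061 U+0066 U+00E9
        System.out.println(codePoints(mojibake)); // U+0063 U+0061 U+0066 U+00C3 U+00A9
    }
}
```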

4 Answers


Java properties files are ISO 8859-1 (Latin 1) encoded. Other characters must be represented using escaped Unicode.

So you should not enter Unicode characters outside of Latin 1 directly into your localization files. Such characters should be typed in as Unicode escapes.
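
To make the difference concrete, here is a minimal sketch (the class name `EscapeDemo` and the key `greeting` are hypothetical) that feeds the same text to `Properties.load(InputStream)` twice: once written with Unicode escapes, once saved as raw UTF-8. The escaped form decodes correctly; the raw UTF-8 form turns each non-ASCII character into two Latin-1 characters, which is the kind of garbling described in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class EscapeDemo {
    /** Loads properties through the InputStream overload, which always decodes ISO-8859-1. */
    static String loadValue(byte[] fileBytes, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Escaped form: the file is pure ASCII and Properties decodes the escapes itself.
        byte[] escaped = "greeting=Gr\\u00fc\\u00dfe".getBytes(StandardCharsets.ISO_8859_1);
        // Raw form: the file was saved as UTF-8, so the two non-ASCII chars take two bytes each.
        byte[] rawUtf8 = "greeting=Gr\u00fc\u00dfe".getBytes(StandardCharsets.UTF_8);

        // Correct: "Gr" + u-umlaut + sharp s + "e".
        System.out.println(loadValue(escaped, "greeting").length()); // 5
        // Garbled: each non-ASCII char was split into two Latin-1 chars.
        System.out.println(loadValue(rawUtf8, "greeting").length()); // 7
    }
}
```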

If you have a bunch of properties files that are UTF-8 (or otherwise) encoded, you could translate them to Latin 1 with Unicode escapes using the native2ascii tool in your JDK. Ant also has a native2ascii task.

Other posters point to ways to work around this. Personally, though, I'd prefer to stick to the standard encoding for properties resource files; that way it will work with everything. There is nothing to stop you authoring your files in UTF-8 and transforming them to Latin 1 with Unicode escapes as part of your build (e.g. with the Ant task).
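
The build-time transformation described above is small enough to sketch directly. This is a minimal illustration, assuming the source file is UTF-8; the class name `Latin1Escaper` and the file names in the usage comment are hypothetical, and this is not the native2ascii implementation itself. It escapes every character above 0x7F, so the output is plain ASCII and therefore also valid ISO-8859-1.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Latin1Escaper {
    /** Replaces every char above 0x7F with a backslash-u escape of four hex digits. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            if (c <= 0x7F) {
                out.append(c);
            } else {
                // Chars outside the BMP are already surrogate pairs, so each half is escaped separately.
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical file names): java Latin1Escaper messages_de.utf8.properties messages_de.properties
        Path source = Paths.get(args[0]);
        Path target = Paths.get(args[1]);
        String utf8 = new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
        Files.write(target, escape(utf8).getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

Because `Properties` decodes these escapes on load, the escaped output reads back as the original text.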

ewan.chalmers
  • Yes, I also read that it is not a good idea to have properties files in anything other than the default encoding. Do you happen to know whether the Resource Bundle plug-in uses the native2ascii tool internally for non-Latin-1 characters, or whether I should escape those symbols manually? – Ivaylo Slavov Nov 21 '11 at 12:07
  • @Ivaylo: Note that this is outdated information. Since Java 6, there is a Properties.load() method that takes a Reader parameter, allowing the use of whatever encoding you like. Of course, if your properties files are loaded by some framework rather than in your own code, this feature could be useless to you. – Michael Borgwardt Nov 21 '11 at 12:16
  • @Michael Example of another such framework: [java.util.ResourceBundle](http://download.oracle.com/javase/6/docs/api/java/util/ResourceBundle.html), and anything built on that. – ewan.chalmers Nov 21 '11 at 12:22
  • @sudocode: it's possible to choose the encoding for resource bundles; see Brett's answer. I have used that method myself. – Michael Borgwardt Nov 21 '11 at 12:32
  • Guys, I've updated my question with an important detail I had left out: the properties are read by GWT, not by dedicated code written by our team. – Ivaylo Slavov Nov 21 '11 at 12:41

There are two completely separate issues here:

  • Are the properties files saved in the correct encoding? If you edit them inside Eclipse, you have to set the text file encoding in the project properties. Note that this setting is saved in the .settings subdirectory of the project.
  • Is the correct encoding used to read the properties files? If you read them in your code, be sure to use an InputStreamReader to set the encoding. If they're read by some framework, you have to look through its API and configuration to see whether you can specify the encoding.

I suggest avoiding Unicode escapes if at all possible.
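
For the second bullet, where your own code reads the file, a minimal sketch using an `InputStreamReader` with the `Properties.load(Reader)` overload (available since Java 6, as mentioned in the comments above). The class name `Utf8Properties` is hypothetical, and this does not change how GWT reads the files behind its annotated interfaces.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class Utf8Properties {
    /** Loads a properties stream as UTF-8 instead of the ISO-8859-1 used by load(InputStream). */
    static Properties loadUtf8(InputStream in) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // A file saved as UTF-8, with non-ASCII text typed directly (no escapes).
        byte[] fileBytes = "greeting=Gr\u00fc\u00dfe".getBytes(StandardCharsets.UTF_8);
        Properties props = loadUtf8(new ByteArrayInputStream(fileBytes));
        System.out.println(props.getProperty("greeting").length()); // 5
    }
}
```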

Michael Borgwardt
  • I am not reading the properties manually in the code. I create an interface that obtains them via annotations on the interface's getters; at least, that's how I've been told GWT works with properties. I haven't written that code myself and I am not very familiar with reading properties in GWT, so I don't know what encoding is actually used. I am certain of one thing, though: the properties files are created using the Eclipse IDE defaults, and nobody on the team has changed their encoding manually. So all of them use ISO 8859-1. – Ivaylo Slavov Nov 21 '11 at 12:29
  • @Ivaylo: That's the first thing you have to change, then.
  • @Ivaylo: Note that according to Google's own documentation, GWT actually *expects* the properties files to be in UTF-8! So that's in fact *all* you have to change. http://code.google.com/intl/de-DE/webtoolkit/doc/latest/tutorial/i18n.html
  • Thanks for the effort! Indeed, the solution was as simple as changing the file encoding. I was reluctant at first because I had read somewhere that changing the encoding of properties files may cause issues, but since Google recommends UTF-8, I guess I can dismiss that. – Ivaylo Slavov Nov 21 '11 at 13:05

This GWT localization page (a tutorial for the annotation-based localization method you describe in your updated question) says:

Encoding for international character sets

When you internationalize your application's interface, keep in mind that the languages you support may contain characters not in the ASCII character set. Therefore, both in the HTML host page (StockWatcher.html), and the Java properties files that contain the translations, you must set the encoding to UTF-8.

And in an example under the heading "Create StockWatcherConstant_de.properties", it says:

Change the encoding of the file to UTF-8.

Select the file and then from the Eclipse menu bar, select File > Properties or right-click.

Eclipse opens the Properties window.

At Text file encoding, select Other UTF-8. Apply and Save the change.

Note: Depending on your Eclipse configuration, when you apply the changes, you might get this warning: UTF-8 conflicts with the encoding defined in the content type (ISO-8859-1). Do you wish to set it anyways? You can ignore the warning and apply the change.

ewan.chalmers
  • Thanks to you too, although I've already read the last comment of Michael Borgwardt's answer and accepted it before noticing yours. – Ivaylo Slavov Nov 21 '11 at 13:08

I have come across the same issue. I overcame it by writing my own Control extending java.util.ResourceBundle.Control. The important method to override, so that the relevant charset is used when reading resource bundles from property files, is the five-parameter newBundle(...).

While not difficult, my implementation is long-winded if done properly.
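
A minimal sketch of that approach, assuming the bundles are plain .properties files saved as UTF-8 on the classpath. The class name `Utf8Control` is hypothetical and this is not Brett's actual code; it only replaces the `java.properties` branch of `newBundle(...)` and delegates class-based bundles to the default implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class Utf8Control extends ResourceBundle.Control {
    @Override
    public ResourceBundle newBundle(String baseName, Locale locale, String format,
                                    ClassLoader loader, boolean reload)
            throws IllegalAccessException, InstantiationException, IOException {
        if (!"java.properties".equals(format)) {
            // Let the default implementation handle class-based bundles.
            return super.newBundle(baseName, locale, format, loader, reload);
        }
        String bundleName = toBundleName(baseName, locale);
        String resourceName = toResourceName(bundleName, "properties");
        InputStream stream;
        if (reload) {
            // Bypass URL caching so edited files are picked up on reload.
            URL url = loader.getResource(resourceName);
            if (url == null) {
                return null;
            }
            URLConnection connection = url.openConnection();
            connection.setUseCaches(false);
            stream = connection.getInputStream();
        } else {
            stream = loader.getResourceAsStream(resourceName);
        }
        if (stream == null) {
            return null; // no such bundle for this locale
        }
        // Decode as UTF-8 via the Reader constructor instead of the ISO-8859-1 InputStream path.
        try (Reader reader = new InputStreamReader(stream, StandardCharsets.UTF_8)) {
            return new PropertyResourceBundle(reader);
        }
    }
}
```

Usage would be along the lines of `ResourceBundle.getBundle("messages", Locale.GERMAN, new Utf8Control())`, with `messages` as a hypothetical base name. This only helps where your own code calls `ResourceBundle.getBundle`. Note also that on Java 9 and later, `PropertyResourceBundle` already reads properties bundles as UTF-8 by default, so a custom Control like this matters mainly on older runtimes.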

Another option is to use cal10n (http://cal10n.qos.ch/), but this might require significant retrofitting.

Brett Walker