How to internationalize Java source code?

Question

EDIT: I completely re-wrote the question since it seems like I was not clear enough in my first two versions. Thanks for the suggestions so far.

I would like to internationalize the source code for a tutorial project (please notice, not the runtime application). Here is an example (in Java):

/** A comment */
public void doSomething() {
  System.out.println("Something was done successfully");
}

in English, and then have the French version be something like:

/** Un commentaire */
public void faitQuelqueChose() {
  System.out.println("Quelque chose a été fait avec succès.");
}

and so on. And then have something like a properties file somewhere to edit these translations with usual tools, such as:

com.foo.class.comment1=A comment
com.foo.class.method1=doSomething
com.foo.class.string1=Something was done successfully

and for other languages:

com.foo.class.comment1=Un commentaire
com.foo.class.method1=faitQuelqueChose
com.foo.class.string1=Quelque chose a été fait avec succès.

I am trying to find the easiest, most efficient and unobtrusive way to do this with the least amount of manual grunt work (other than obviously translating the actual text). Preferably working under Eclipse. For example, the original code would be written in English, then externalized (to properties, preferably leaving the original source untouched), translated (humanly) and then re-generated (as a separate source file / project).

Some trails I have found (other than what AlexS suggested):

- ANTLR, a language parser / generator. There seems to be a supporting Eclipse plugin.
- Using Eclipse's AST (Abstract Syntax Tree) and, I guess, building some kind of plugin.

I am just surprised there isn't a tool out there that does this already.

Refer to this: http://stackoverflow.com/questions/1318347/how-to-use-java-property-files — Umesh Aawte, Jun 19 '12 at 15:13
The [Java Internationalization Tutorial](http://docs.oracle.com/javase/tutorial/i18n/index.html) is a good start. — Edwin Dalorzo, Jun 19 '12 at 15:15
Developers MUST know English. The whole standard Java API is in English, as are 99.9% of the external libraries they will use. If they don't know English, they'd better learn it ASAP. I wouldn't bother translating Java code in a tutorial aimed at developers. — JB Nizet, Jun 19 '12 at 15:16
Do not teach them bad habits. Every programmer must be able to read English code. — Codium, Jun 19 '12 at 15:40
I agree that developers have to know English, but for tutorials it is better to use the students' native language, so that you are dealing only with problems of functional understanding and not with English problems; for the latter there are much better trained English teachers... And after all, there is a reason that programming books (especially low-level ones) are released in many languages, and I don't mean Java, C or C++ ;) — AlexS, Jun 19 '12 at 16:10
@AlexS: Being a Polish guy who has read many Polish-language lectures, tutorials and textbooks, I must confess that I pretty much _hate_ translated variable, class and method names. They look very odd next to English keywords. Besides, there is no way to stick to valid grammar rules, which makes it even more painful. I would really appreciate people leaving code untouched (you can always describe it in detail, which you should do anyway). — Paweł Dyda, Jun 19 '12 at 17:18
Please don't turn this into a futile philosophical debate. If I want to have source code translated in different languages, this is my right and this is what I'm asking how to do efficiently. Thank you. — deryb, Jun 19 '12 at 17:39

AlexS · Accepted Answer · 2012-06-19T15:41:32.817

I'd use unique strings as method names (or as anything else you want replaced by localized versions).

public void m37hod_1() {
  System.out.println(m355a6e_1);
}

Then I'd define a properties file for each language, like this:

m37hod_1=doSomething
m355a6e_1="Something was done successfully"

And then I'd write a small program that parses the source files and replaces the strings, so everything happens outside Eclipse.
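A minimal sketch of what such a small program could look like, assuming the tokens are whole identifiers and the translations are loaded into a `java.util.Properties` object. The class name `TokenLocalizer` and the method `localize` are made up for illustration; reading and writing the actual source files is left out.

```java
import java.io.StringReader;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenLocalizer {

    /** Replaces every whole-word occurrence of each property key with its value. */
    public static String localize(String source, Properties translations) {
        String result = source;
        for (String token : translations.stringPropertyNames()) {
            // \b keeps m37hod_1 from also matching inside m37hod_10.
            Pattern wholeToken = Pattern.compile("\\b" + Pattern.quote(token) + "\\b");
            String value = translations.getProperty(token);
            // quoteReplacement protects '$' and '\' in the translated text.
            result = wholeToken.matcher(result).replaceAll(Matcher.quoteReplacement(value));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // In a real build this would be loaded from e.g. an English or French properties file.
        Properties english = new Properties();
        english.load(new StringReader(
                "m37hod_1=doSomething\n"
              + "m355a6e_1=\"Something was done successfully\"\n"));

        String template =
                "public void m37hod_1() {\n"
              + "  System.out.println(m355a6e_1);\n"
              + "}\n";

        System.out.print(localize(template, english));
    }
}
```

One caveat with this approach: a translated value that happens to contain another token would be rewritten again by a later pass, so the tokens should be strings that never occur in real text.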

Or, instead of a standalone translation program, I'd use the Ant task Replace together with the properties files. Something like this:

<!-- dir/includes select the files; the file attribute only accepts a single file -->
<replace 
    dir="${src}"
    includes="**/*.java"
    value="defaultvalue"
    propertyFile="${language}.properties">
  <replacefilter 
    token="m37hod_1" 
    property="m37hod_1"/>
  <replacefilter 
    token="m355a6e_1" 
    property="m355a6e_1"/>
</replace>

Using one of these methods you won't have to explain anything about localization in your tutorials (unless you want to), and can concentrate on your real topic.

answered Jun 19 '12 at 15:28

+1 for answering the actual question. Note: Meaningful IDs may be more understandable. – Andy Thomas Jun 19 '12 at 18:16
Thanks for your suggestions. Yea this is the only way I know of for now (I thought of perhaps using some kind of grep / shell search & replace tool) but then it makes the original code quite harder to read. I would really like to find something which is more user friendly and plugged into Eclipse if possible. – deryb Jun 19 '12 at 18:23
Probably you can actually leave your names what they are and it would work the same. I mainly use 1337 for the geek-factor ;) – AlexS Jun 20 '12 at 05:04
@zeartist I don't know how ant integrates into eclipse, but if you use ant I think you don't have to do more than specifying the language in your build-properties and hit compile. Of course you would have to make a source.jar as well. Maybe tomorrow night I'll have some time to dig into it more deeply... – AlexS Jun 20 '12 at 05:08
@AlexS - In Eclipse you can right-click on an ant buildfile and choose *Run As>Ant Build...*. – Andy Thomas Jun 20 '12 at 13:28
Thanks, I've decided to go this way, since it is simple, and as outlined by "Ira Baxter" below, using AntLR, ASTs or similar tools would be complex. I will mark each of my identifiers with specific text (ex. CC_className_CC, MM_methodName_MM, VV_variable_VV, etc) and then have a tool search-replace each instance. If things don't compile or go wrong I can always edit and fix the code or the translations, hence it seems like the simplest and most straightforward approach. I can still use ResourceBundle too with this in the translator program. And/or perhaps even write a custom ant task... – deryb Jun 26 '12 at 16:32

score 2 · Answer 2 · edited May 23 '17 at 10:09

What you want is a massive code change engine.

ANTLR won't do the trick; ASTs are necessary but not sufficient. See my essay on Life After Parsing. Eclipse's "AST" may be better, if the Eclipse package provides some support for name and type resolution; otherwise you'll never be able to figure out how to replace each "doSomething" (might be overloaded or local), unless you are willing to replace them all identically (and you likely can't do that, because some symbols refer to Java library elements).

Our DMS Software Reengineering Toolkit could be used to accomplish your task. DMS can parse Java to ASTs (including comment capture), traverse the ASTs in arbitrary ways, analyze/change ASTs, and then export the modified ASTs as valid source code (including the comments).

Basically you want to enumerate all comments, strings, and declarations of identifiers, export them to an external "database" to be mapped (manually? by Google Translate?) to an equivalent. In each case you want to note not only the item of interest, but its precise location (source file, line, even column) because items that are spelled identically in the original text may need different spellings in the modified text.

Enumeration of strings is pretty easy if you have the AST; simply crawl the tree and look for tree nodes containing string literals. (ANTLR and Eclipse can surely do this, too).
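As a rough illustration of "crawl the tree and look for string literals", here is a sketch that uses the JDK's own Compiler Tree API (`com.sun.source`) rather than ANTLR, Eclipse or DMS. That choice is mine, not the answer's, and it needs a full JDK at runtime. The class name `StringLiteralLister` and the `line:column=value` output format are assumptions for the example.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.LineMap;
import com.sun.source.tree.LiteralTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.SourcePositions;
import com.sun.source.util.TreeScanner;
import com.sun.source.util.Trees;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class StringLiteralLister {

    /** Parses the source text and returns one "line:column=value" entry per string literal. */
    public static List<String> listStringLiterals(String source) throws Exception {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler; run on a JDK, not a JRE");
        }
        // In-memory source file; the name Sample.java is arbitrary.
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Sample.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, null, null, List.of(file));
        SourcePositions positions = Trees.instance(task).getSourcePositions();

        List<String> found = new ArrayList<>();
        for (CompilationUnitTree unit : task.parse()) {
            LineMap lines = unit.getLineMap();
            new TreeScanner<Void, Void>() {
                @Override
                public Void visitLiteral(LiteralTree node, Void unused) {
                    if (node.getKind() == Tree.Kind.STRING_LITERAL) {
                        long start = positions.getStartPosition(unit, node);
                        found.add(lines.getLineNumber(start) + ":"
                                + lines.getColumnNumber(start) + "=" + node.getValue());
                    }
                    return super.visitLiteral(node, unused);
                }
            }.scan(unit, null);
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        String source =
                "class Sample {\n"
              + "  void doSomething() {\n"
              + "    System.out.println(\"Something was done successfully\");\n"
              + "  }\n"
              + "}\n";
        listStringLiterals(source).forEach(System.out::println);
    }
}
```

Recording the line and column alongside each literal matters because identically spelled strings may need different translations depending on where they appear.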

Enumeration of comments is also straightforward if the parser you have captures comments. DMS does. I'm not quite sure if ANTLR's Java grammar does, or the Eclipse AST engine; I suspect they are both capable.

Enumeration of declarations (classes, methods, fields, locals) is relatively straightforward; there are rather more cases to worry about (e.g., anonymous classes containing extensions to base classes). You can code a procedure to walk the AST and match the tree structures, but here's the place that DMS starts to make a difference: you can write surface-syntax patterns that look like the source code you want to match. For instance:

   pattern local_for_loop_index(i: IDENTIFIER, t: type, e: expression, e2: expression, e3:expression): for_loop_header
         = "for (\t \i = \e,\e2,\e3)"

will match declarations of local for loop variables, and return subtrees for the IDENTIFIER, the type, and the various expressions; you'd want to capture just the identifier (and its location, easily done by taking it from the source position information that DMS stamps on every tree node). You'd probably need 10-20 such patterns to cover the cases of all the different kinds of identifiers.

Capture step completed, something needs to translate all the captured entities to your target language. I'll leave that to you; what's left is to put the translated entities back.

The key to this is the precise source location. A line number isn't good enough in practice; you may have several translated entities on the same line, in the worst case some with different scopes (imagine nested for loops, for example). The replacement process for comments, strings and declarations is straightforward: rescan the tree for nodes that match any of the identified locations, and replace the entity found there with its translation. (You can do this with DMS and ANTLR. I think Eclipse ADT requires you generate a "patch" but I guess that would work.)
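To make the point about precise locations concrete, here is a small text-level sketch (not how DMS, ANTLR or Eclipse do it internally): each replacement is keyed by exact character offsets, and the edits are applied from the end of the text backwards so that earlier offsets stay valid. The names `OffsetPatcher` and `Edit` are hypothetical, and the offsets are assumed to come from a parser.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetPatcher {

    /** One replacement: characters [start, end) of the original text become text. */
    public static final class Edit {
        final int start;
        final int end;
        final String text;

        public Edit(int start, int end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    /** Applies non-overlapping edits, all expressed in offsets of the original text. */
    public static String apply(String source, List<Edit> edits) {
        List<Edit> ordered = new ArrayList<>(edits);
        // Highest offset first, so replacing one span never shifts a span still to be patched.
        ordered.sort((a, b) -> Integer.compare(b.start, a.start));
        StringBuilder out = new StringBuilder(source);
        for (Edit e : ordered) {
            out.replace(e.start, e.end, e.text);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two declarations and one use on the same line; a line number alone
        // could not tell the three spans apart.
        String line = "int a = 1; int b = a;";
        List<Edit> edits = List.of(
                new Edit(4, 5, "compteur"),    // declaration of a
                new Edit(15, 16, "copie"),     // declaration of b
                new Edit(19, 20, "compteur")); // use of a
        System.out.println(apply(line, edits));
    }
}
```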

The fun part comes in replacing the identifier uses. For this, you need to know two things:

1. For any use of an identifier, what declaration does it use? If you know this, you can replace it with the new name for the declaration; DMS provides full name and type resolution as well as a usage list, making this pretty easy.
2. Do renamed identifiers shadow one another in scopes differently than the originals? This is harder to do in general. However, for the Java language, we have a "shadowing" check, so you can at least decide after renaming that you have an issue. (There's even a renaming procedure that can be used to resolve such shadowing conflicts.)

After patching the trees, you simply rewrite the patched tree back out as a source file using DMS's built-in prettyprinter. I think Eclipse AST can write out its tree plus patches. I'm not sure ANTLR provides any facilities for regenerating source code from ASTs, although somebody may have coded one for the Java grammar. This is harder to do than it sounds, because of all the picky detail. YMMV.

Given your goal, I'm a little surprised that you don't want a source file "foo.java" containing "class foo { ... }" to get renamed to match the translated class name. This would require not only writing the transformed tree to the translated file name (pretty easy) but perhaps even reconstructing the directory tree (DMS provides facilities for doing directory construction and file copies, too).

If you want to do this for many languages, you'd need to run the process once per language. If you wanted to do this just for strings (the classic internationalization case), you'd replace each string (that needs changing, not all of them do) by a call on a resource access with a unique resource id; a runtime table would hold the various strings.
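For the strings-only case, a minimal sketch of such a runtime table, assuming one properties text per language keyed by a unique resource id. The class `MessageTable`, the id `msg.done` and the fallback to English are all made up for the example; in a real application you would normally let `ResourceBundle.getBundle` load the files from the classpath instead of building bundles from in-memory text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class MessageTable {

    private final Map<String, ResourceBundle> byLanguage = new HashMap<>();

    /** Registers the translations for one language, given in .properties syntax. */
    public void add(String language, String propertiesText) throws IOException {
        byLanguage.put(language, new PropertyResourceBundle(new StringReader(propertiesText)));
    }

    /** Looks up a resource id, falling back to English for unknown languages. */
    public String get(String language, String id) {
        ResourceBundle bundle = byLanguage.getOrDefault(language, byLanguage.get("en"));
        return bundle.getString(id);
    }

    public static void main(String[] args) throws IOException {
        MessageTable table = new MessageTable();
        table.add("en", "msg.done=Something was done successfully\n");
        table.add("fr", "msg.done=Quelque chose a \u00e9t\u00e9 fait avec succ\u00e8s.\n");

        // The code calls the table with an id instead of embedding the literal.
        System.out.println(table.get("fr", "msg.done"));
        System.out.println(table.get("de", "msg.done")); // no German table, so English is used
    }
}
```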

Thanks, you bring up a lot of useful bits. True that the solution isn't just about parsing but transforming. I think I will go for a simple find-replace solution then, since it sounds like I might get in over my head with these parser-translators and my use case was for rather simple stuff. I will post my official answer below. — deryb, Jun 26 '12 at 16:18

score 1 · Answer 3 · answered Jun 19 '12 at 19:26

One approach would be to finish the code in one language, then translate to others.

You could use Eclipse to help you.

Copy the finished code to language-specific projects.
Then:
- Identifiers: In the Outline view (Window>Show View>Outline), select each item and Refactor>Rename (Alt+Shift+R). This takes care of renaming the identifier wherever it's used.
- Comments: Use Search>File to find all instances of "/*" or "//". Click on each and modify.
- Strings:
  1. Use Source>Externalize strings to find all of the literal strings.
  2. Search>File for "Messages.getString()".
  3. Click on each result and modify.
  4. On each file, use *Edit>Find/Replace*, replacing "//\$NON-NLS-.*\$" with an empty string.
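If doing the last step by hand in every file gets tedious, the same cleanup can be scripted. A minimal sketch, assuming the markers have the usual `//$NON-NLS-1$` form; the class name `NonNlsStripper` is made up, and the pattern is slightly stricter than the one above so that it stops at the marker's own closing `$`.

```java
import java.util.regex.Pattern;

public class NonNlsStripper {

    // Matches markers such as //$NON-NLS-1$ together with the spaces or tabs before them.
    private static final Pattern MARKER = Pattern.compile("[ \\t]*//\\$NON-NLS-\\d+\\$");

    /** Returns the source text with all NON-NLS markers removed. */
    public static String strip(String source) {
        return MARKER.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        String line = "String s = Messages.getString(\"Foo.0\"); //$NON-NLS-1$\n";
        System.out.print(strip(line));
    }
}
```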

Andy Thomas

Thanks, yea that is one way. I'd like to find a more automated & elegant way of doing this. Ex. collecting all the strings to externalize, having a tool generate a big properties file containing all the classes, methods, variables, comments and strings with some kind of qualified identifier (example com.foo.class1.method1=myMethod, etc), ready to be translated. – deryb Jun 19 '12 at 19:42
Understood. Consider this if such a tool does not exist. :) Provided this answer based on my own experience with localization, and the guess that example code for tutorials is not likely to be large-scale. – Andy Thomas Jun 19 '12 at 19:57

score 0 · Answer 4 · edited May 23 '17 at 12:18

Use a .properties file, like this:

Locale locale = new Locale(language, country);
ResourceBundle captions = ResourceBundle.getBundle("Messages", locale);

This way, Java picks the Messages properties file matching the requested locale (use Locale.getDefault() for the current locale, which is acquired from the operating system or Java locale settings).

The file should be on the classpath, called Messages.properties (the default one), or Messages_de.properties for German, etc.

See this for a complete tutorial: http://docs.oracle.com/javase/tutorial/i18n/intro/steps.html

As far as the source code goes, I'd strongly recommend staying with English. Method names like getUnternehmen() are worse for the average developer than plain English ones. If you need to familiarize foreign developers with your code, write proper developer documentation in their language.

If you'd like to have Javadoc in both English and other languages, see this SO thread.

This does not answer my question, sorry. Re-read again if needed. — deryb, Jun 19 '12 at 18:03

score 0 · Answer 5 · answered Jun 19 '12 at 15:17

For printed/logged strings, Java has some internationalization functionality, namely ResourceBundle. There is a tutorial about this on the Oracle site.

Eclipse also has a feature for this ("Externalize Strings", as I recall).

For the function names, I don't think there is anything out there, since this would require you to maintain the source code in many versions...

Regards

PATRY Guillaume

Thanks but that does not answer the question. – deryb Jun 19 '12 at 18:48

score 0 · Answer 6 · answered Dec 16 '13 at 14:42

You could write your code using freemarker templates (or another templating language such as velocity).

doSomething.tml

/** ${lang['doSomething.comment']} */
public void ${lang['doSomething.methodName']}() {
    System.out.println("${lang['doSomething.message']}");
}

lang_en.prop

doSomething.comment=A comment
doSomething.methodName=doSomething
doSomething.message=Something was done successfully

And then merge the template with each language prop file during your build (using Ant / Gradle / Maven etc.)
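To show the idea of the merge step without pulling in a template library, here is a plain-Java stand-in that only understands the `${lang['key']}` placeholder form used above. It is not FreeMarker and does not use its API; in a real build you would let FreeMarker or Velocity do this. The class name `TemplateMerger` is made up.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateMerger {

    // Matches ${lang['some.key']} and captures some.key.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{lang\\['([^']+)'\\]\\}");

    /** Replaces each placeholder with its translation; fails loudly on a missing key. */
    public static String merge(String template, Map<String, String> lang) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = lang.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No translation for " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template =
                "/** ${lang['doSomething.comment']} */\n"
              + "public void ${lang['doSomething.methodName']}() {\n"
              + "    System.out.println(\"${lang['doSomething.message']}\");\n"
              + "}\n";
        Map<String, String> english = Map.of(
                "doSomething.comment", "A comment",
                "doSomething.methodName", "doSomething",
                "doSomething.message", "Something was done successfully");
        System.out.print(merge(template, english));
    }
}
```

Failing on a missing key rather than leaving the placeholder in place means an incomplete translation file is caught at build time instead of producing source that does not compile.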
