According to the JLS, it is possible to "mangle" package names containing non-ASCII characters in case the host filesystem doesn't support Unicode. For instance, package é becomes @00e9, and papierMâché becomes papierM@00e2ch@00e9 when projected to the file system.
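As an illustration only, the mangling described above can be sketched in a few lines of Java. The class and method names (`PackageNameMangler`, `mangle`) are hypothetical, and the sketch assumes that every UTF-16 code unit outside ASCII is written as `@` followed by four lowercase hex digits, which matches the two examples given here; it is not taken from any real compiler.

```java
// Hypothetical helper: illustrates the '@' + four hex digits escaping of package names.
final class PackageNameMangler {
    private PackageNameMangler() {}

    /** Replaces every non-ASCII UTF-16 code unit with '@' plus four lowercase hex digits. */
    static String mangle(String name) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c < 0x80) {
                out.append(c);                       // plain ASCII is kept as-is
            } else {
                out.append('@').append(String.format("%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file itself pure ASCII.
        System.out.println(mangle("papierM\u00e2ch\u00e9")); // prints papierM@00e2ch@00e9
    }
}
```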

The question is: is it ever possible to achieve just the same for Java source files (whose names must conform to the corresponding names of Java classes)?

The background of the problem is I need to have an accented e with acute in my public class name ('é', '\u00e9'). Yes I know I shouldn't, and Unicode in file names is a malpractice, but still I need it.

However, either Mac OS X or the underlying HFS+ filesystem disallows this very character in file names, replacing it with 'e' immediately followed by COMBINING ACUTE ACCENT ("e\u0301"). This behaviour is totally different from NTFS or ext3/ext4, where two files named "\u00e9" and "e\u0301" can co-exist in the same directory (test repository is here).
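The two spellings can be compared with `java.text.Normalizer` from the standard library. This is only a sketch: it uses standard NFD/NFC as a stand-in for what HFS+ does, on the assumption that the filesystem's own decomposition behaves like NFD for this particular character. The class name `NormalizationDemo` is made up for the example.

```java
import java.text.Normalizer;

// Hypothetical demo class: shows that U+00E9 and 'e' + U+0301 are distinct strings
// that normalize to each other.
final class NormalizationDemo {
    private NormalizationDemo() {}

    /** Canonical decomposition: a precomposed letter becomes base letter + combining mark. */
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    /** Canonical composition: base letter + combining mark becomes the precomposed letter. */
    static String compose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String precomposed = "\u00e9";   // LATIN SMALL LETTER E WITH ACUTE
        String decomposed = "e\u0301";   // 'e' followed by COMBINING ACUTE ACCENT

        System.out.println(precomposed.equals(decomposed));            // false
        System.out.println(decompose(precomposed).equals(decomposed)); // true
        System.out.println(compose(decomposed).equals(precomposed));   // true
    }
}
```

A plain `String.equals` comparison between the name in the class declaration and a decomposed file name fails in the same way, which would be consistent with the javac error described below.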

The above HFS+ behaviour results in 2 problems:

  1. I'm unable to compile my classes with javac because class name and file name are not the same (though I am able to compile them with either Maven or ecj).
  2. I can't have my classes managed with Git, as it always reports that the file has been renamed:

$ git status .
# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#   "src/main/java/com/intersystems/persistence/Cache\314\201ExtremeConnectionParameters.java"
#   "src/main/java/com/intersystems/persistence/Cache\314\201ExtremePersister.java"
#   "src/main/java/com/intersystems/persistence/Cache\314\201JdbcConnectionParameters.java"
#   "src/main/java/com/intersystems/persistence/Cache\314\201JdbcPersister.java"
#   "src/main/java/com/intersystems/persistence/ui/Cache\314\201JdbcConnectionParametersPanel.java"
nothing added to commit but untracked files present (use "git add" to track)
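The `\314\201` in the listing above is the combining accent itself: the two octal escapes are the bytes 0xCC 0x81, which is the UTF-8 encoding of U+0301. A rough Java sketch for decoding such names follows. `GitOctalEscapes` and `decode` are hypothetical names, and the code handles only three-digit octal escapes, not the other escapes Git may emit in quoted paths.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes \ooo octal byte escapes as shown in the git status output.
final class GitOctalEscapes {
    private GitOctalEscapes() {}

    private static boolean isOctalDigit(char c) {
        return c >= '0' && c <= '7';
    }

    /** Turns each backslash + three octal digits into one byte, then decodes the bytes as UTF-8. */
    static String decode(String quoted) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < quoted.length()) {
            char c = quoted.charAt(i);
            boolean escape = c == '\\'
                    && i + 3 < quoted.length()
                    && isOctalDigit(quoted.charAt(i + 1))
                    && isOctalDigit(quoted.charAt(i + 2))
                    && isOctalDigit(quoted.charAt(i + 3));
            if (escape) {
                bytes.write(Integer.parseInt(quoted.substring(i + 1, i + 4), 8));
                i += 4;
            } else {
                // Assumption: everything outside an escape is an ordinary character.
                byte[] raw = String.valueOf(c).getBytes(StandardCharsets.UTF_8);
                bytes.write(raw, 0, raw.length);
                i++;
            }
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = decode("Cache\\314\\201JdbcPersister.java");
        // The character after "Cache" is U+0301, COMBINING ACUTE ACCENT.
        System.out.println(Integer.toHexString(name.charAt(5))); // prints 301
    }
}
```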
Bass
  • "Yes I know I shouldn't, and Unicode in file names is a malpractice, but still I need it" - why? – paxdiablo Sep 18 '13 at 08:12
  • I'm developing some benchmarking code related to [InterSystems Caché](http://intersystems.com). If the class name doesn't have the acute over e ("cache"), it is read and understood completely differently, so the name w/o the acute is simply misleading. BTW, it looks like I can get rid of the Git problem by adding `/Caché*.java` to `.gitignore`. – Bass Sep 18 '13 at 08:31
  • “If the class name doesn't have the acute over e ("cache"), it is read and understood completely differently” — please tell me you are kidding… – Holger Sep 18 '13 at 09:02
  • No I'm not. Look, I know all pros and cons of using Unicode in identifiers. I'm developing a purely educational code which will _never_ run in production, and would like my class names accented where appropriate. Of course I will refactor everything if there's no other resort. – Bass Sep 18 '13 at 10:08
  • HFS+ uses "decomposed" Unicode where the character and the accent are separated. I think it always decomposes the names even if the program uses the more usual "composed" Unicode. – greg-449 Sep 18 '13 at 10:12
  • possible duplicate of [Git and the Umlaut problem on Mac OS X](http://stackoverflow.com/questions/5581857/git-and-the-umlaut-problem-on-mac-os-x) – kostix Sep 18 '13 at 11:11
  • @kostix Thank you for your comment. Yes, part of this question duplicates [Git and the Umlaut problem on Mac OS X](http://stackoverflow.com/questions/5581857/git-and-the-umlaut-problem-on-mac-os-x) indeed. However, I already got Git working using `.gitignore` (see the 2nd comment). I would still prefer to have my filenames in pure ASCII, so this is more of a _Java_ question. – Bass Sep 18 '13 at 11:30
  • I see now--retracted my vote to close. Thanks for clarifying. – kostix Sep 18 '13 at 11:43
  • @Bass: The pros and cons of using Unicode in identifiers are one thing. But I was irritated by the statement that it *must have* the acute to work. A software *requiring* an acute in a class name is something …eh… very special even compared to a software just having an acute in a class name. – Holger Sep 18 '13 at 16:21
  • @Holger: Okay, let's put it like this: _it mustn't_, but 1. I would like to emphasize this software is related to ISC Caché, and 2. I'm already extremely curious how this can be achieved, even if I wouldn't use this code style in the future. This is the same reason why people have obfuscated code contests. – Bass Sep 18 '13 at 19:50

1 Answer

If you want your names to be ASCII-safe, you could simply name your Java file papierM@00e2ch@00e9.java and ensure that it gets compiled before any other class tries to reference it. This works because <filename>.java does not have to be <classname>.java; matching the two is merely common practice. The compiler will not go looking for ClassA in ADifferentFilename.java on its own, because nothing tells it which file declares the class. However, if ADifferentFilename.java has already been compiled to ClassA.class, the reference resolves. Note that javac does insist on a matching file name for a public top-level class, so with javac this only applies to classes that are not public.

Other than that, you are out of luck with respect to naming your files in pure ASCII.

As an aside, you mention that you have solved the Git problem by using a .gitignore file; however, you will probably find that a better way is to enable the precomposeunicode option in Git.

git config --global core.precomposeunicode true

If you use this, then you should be able to have your file papierMâché.java and access it from all of Linux, Mac and Windows.

Paul Wagland