I'm just trying to test Java's Unicode support. I found that Java supports Unicode characters in class names, but when I tried to use Unicode letters, the code does not compile. Below is the code.

[Screenshot: Java class name with Unicode characters]

It throws the error below during compilation:

[Screenshot: error during compilation]

The character set of the file and of the Eclipse workspace is set to UTF-8.

Update: Here is the source. It uses Unicode Tamil letters:

public class தமிழ் {

    private static String வணக்கம் = "வணக்கம்";

    public static void main(String[] args) {
        // TODO Auto-generated method stub
        வணக்கம்சொல்();
    }

    private static void வணக்கம்சொல்() {
        System.out.println(வணக்கம் + " வருக! வருக!!");
    }
}
Avinash
  • works fine for me – Scary Wombat Apr 06 '18 at 07:49
  • Something's not Unicode friendly somewhere and it's messing things up, converting the Unicode chars to the replacement character `?`. – Kayaman Apr 06 '18 at 07:50
  • Post a [MCVE](https://stackoverflow.com/help/mcve), otherwise all you'll get is "works on my machine". No one cares what you do in your private Eclipse workspace. – Abhijit Sarkar Apr 06 '18 at 07:53
  • Are you **certain** this error occurs during **compilation**? Looks more like a launch issue if it cannot find `main`. – OldCurmudgeon Apr 06 '18 at 07:55
  • @OldCurmudgeon The error is shown when I hit Run. – Avinash Apr 06 '18 at 08:18
  • @ScaryWombat If I create a new class with only English letters, it works fine :( – Avinash Apr 06 '18 at 08:19
  • Most probably it is only your console font that does not support Tamil characters. To check whether the file name is in UTF-8, open that directory in the file explorer. If the file appears as `தமிழ்.java`, the problem is related to your console. See this SO answer for more details: https://stackoverflow.com/questions/388490/how-to-use-unicode-characters-in-windows-command-line#47843552 – SubOptimal Apr 06 '18 at 09:10
  • You could use [ConEmu](https://conemu.github.io/en/UnicodeSupport.html) as the console (assuming the file shows correctly in the file explorer). – SubOptimal Apr 06 '18 at 09:16
  • @SubOptimal How do I integrate it with Eclipse and IntelliJ? – Avinash Apr 06 '18 at 10:05
  • Sorry, I cannot help there. I avoid using Unicode characters in file names on Windows when a console is involved at some point in the processing. – SubOptimal Apr 06 '18 at 12:43
  • Please have a look at the last comment in my answer. It seems it works on Windows 10. – SubOptimal Apr 09 '18 at 08:02

2 Answers

A quick demonstration of Unicode characters in class names and the hassle on Windows.

Create the following Java class file:

Main.java

class Main {
    public static void main(String...args) {
        \u0ba4\u0bae\u0bbf\u0bb4\u0bcd.main(new String[0]);
    }
}

class \u0ba4\u0bae\u0bbf\u0bb4\u0bcd {
    public static void main(String[] arrstring) {
        System.out.println("\u0bb5\u0ba3\u0b95\u0bcd\u0b95\u0bae\u0bcd unicode!");
    }
}

All Unicode characters are written in Unicode escape notation.

So the following source would actually produce the same class files:

class Main {
        public static void main(String...args) {
                தமிழ்.main(new String[0]);
        }
}

class தமிழ் {
        public static void main(String[] args) {
                System.out.println("வணக்கம் unicode!");
        }
}
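
If you want to produce the escaped version from the readable one, a small helper can do the conversion. This is only a sketch: the class name `UnicodeEscapes` and the method `toEscapes` are made up for illustration and are not part of the JDK.

```java
public class UnicodeEscapes {
    // Replaces every UTF-16 code unit outside ASCII with a Java unicode escape.
    // (Hypothetical helper, not a JDK API.)
    static String toEscapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The Tamil class name, written with escapes so this file stays pure ASCII.
        String name = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd";
        System.out.println(toEscapes("class " + name + " {"));
    }
}
```

Running it prints `class \u0ba4\u0bae\u0bbf\u0bb4\u0bcd {`, which matches the class declaration in the escaped source.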

Compile the source (the one with the Unicode escapes):

javac Main.java

This creates the class files Main.class and தமிழ்.class (you can check the file names, e.g. by running explorer . in the same directory).

In the CMD console, the Unicode file name cannot be displayed:

> dir /b *.class
Main.class
?????.class

> java Main
??????? unicode!

In ConEmu, the file name is displayed correctly:

> dir /b *.class
Main.class
தமிழ்.class

> java Main
??????? unicode!

Even though the file name தமிழ்.class cannot be shown or accessed correctly in a CMD session, Java is able to execute the class. This means the class is stored correctly with the Unicode characters. But the output is broken in both cases.
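
The broken output is a separate matter from the class name: `System.out` encodes text with an encoding the JVM picks from the platform settings, and the console then has to decode and render it. One thing to try is wrapping `System.out` in a `PrintStream` with an explicit encoding. This is a sketch under assumptions: it only helps if the console itself is set to a UTF-8 code page and has a font with Tamil glyphs, and `Utf8Console` and `utf8` are hypothetical names.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {
    // Wraps a stream so that text is always encoded as UTF-8,
    // regardless of the platform default encoding.
    static PrintStream utf8(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        // Same greeting as in the answer, written with escapes.
        out.println("\u0bb5\u0ba3\u0b95\u0bcd\u0b95\u0bae\u0bcd unicode!");
    }
}
```

This fixes only the bytes Java writes. Whether the console shows them correctly still depends on its code page and font.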

If you run the Main class above on a Linux machine, the output is as expected:

$ java Main
வணக்கம் unicode!

Edit: the class with Unicode characters can be executed directly on Linux:

$ java தமிழ்
வணக்கம் unicode!

Edit: PowerShell ISE

PS > ls *.class
...
Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-a---        08/04/2018     12:34        317 Main.class
-a---        08/04/2018     12:34        443 தமிழ்.class

PS > java Main
??????? unicode!

PS > java தமிழ்
java : Error: Could not find or load main class ?????
At line:1 char:1
+ java தமிழ்

Edit: According to this bug report on Eclipse, it seems to work on Windows 10 (which I cannot verify, as I don't have one).

SubOptimal
  • You dodged the actual problem of launching a main class containing special characters. The console output is fixable. Invoke `mode con cp select=65001`, followed by `java -Dfile.encoding=UTF-8 Main`. Then, all unicode characters are transferred correctly (though you might need an appropriate font to display them). But still, launching the class directly via `java -Dfile.encoding=UTF-8 தமிழ்` does not work. And you didn’t say whether that works with Linux. – Holger Apr 06 '18 at 13:01
  • @Holger The wrapping of the execution of class `தமிழ்` was done to show that it's not a general problem on Windows. It's rather a problem of the console (CMD), which lacks some unicode support. As you said even with `mode con cp select=65001` the call of `java -Dfile.encoding=UTF-8 தமிழ்` still doesn't work on Windows. But it works as expected on Linux, I added this to my answer. – SubOptimal Apr 06 '18 at 14:05
  • Mildly curious whether launching windows-java from bash-on-windows or powershell would work any better than cmd – the8472 Apr 06 '18 at 18:23
  • @the8472 I added a PowerShell ISE example. – SubOptimal Apr 09 '18 at 06:41
It is a matter of:

  • Unicode text normalisation: ĉ can be a single Unicode code point, or two: a c followed by a zero-width combining diacritical mark ^. The operating system uses one of these forms. Ideally the IDE should enforce a canonical form. (No idea whether it does.)
  • The Windows command line cmd.exe is restricted to its system encoding. However, you could have a pure ASCII main class that calls the main of your class.
  • An executable jar file with an ASCII name should also pose no problem. The MANIFEST.MF is already in UTF-8, but since the line length should not exceed 72 bytes and UTF-8 uses multiple bytes per char, be careful.
  • Then there are version control systems, which can cause problems, especially when switching between Windows and Linux.
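
To illustrate the normalisation point, here is a minimal sketch using `java.text.Normalizer`. The class name `NormalizationDemo` and the helper `sameAfterNfc` are made up for this example. It shows that the two spellings of ĉ are different strings until they are brought to the same normal form.

```java
import java.text.Normalizer;

public class NormalizationDemo {
    // Two names count as the same if their NFC (composed) forms match.
    static boolean sameAfterNfc(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String composed = "\u0109";     // one code point: c with circumflex
        String decomposed = "c\u0302";  // 'c' followed by a combining circumflex
        System.out.println(composed.equals(decomposed));        // false
        System.out.println(sameAfterNfc(composed, decomposed)); // true
    }
}
```

A class name and its file name that differ only in normal form would look identical on screen, so comparing the normalised forms is one way to check for this.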

Joop Eggen