Java valid identifier from the Java Language Specification

Question

Many places on SO lead to the JLS section on Identifiers, but I have a question on what's written there.

The "Java letters" include uppercase and lowercase ASCII Latin letters A-Z (\u0041-\u005a), and a-z (\u0061-\u007a), and, for historical reasons, the ASCII underscore (_, or \u005f) and dollar sign ($, or \u0024). The $ character should be used only in mechanically generated source code or, rarely, to access pre-existing names on legacy systems. The "Java digits" include the ASCII digits 0-9 (\u0030-\u0039).

But it goes on to say:

Letters and digits may be drawn from the entire Unicode character set, which supports most writing scripts in use in the world today, including the large sets for Chinese, Japanese, and Korean. This allows programmers to use identifiers in their programs that are written in their native languages.

I don't understand how these can both be true. The first section seems to dictate exactly which characters are allowed whereas the second section seems to say that the allowance is much more flexible.

I agree that usage of "includes" instead of "includes but is not limited to" shows that it doesn't exactly contradict. But it also first refers specifically to "Java letters"/"Java digits" and then relaxes this to just "letters"/"digits". My main point is lack of clarity and I wanted confirmation on what I assumed it meant.

Where do you see the contradiction? Supported are latin letters, some signs, numbers and now also some unicode characters. — Tom, Sep 05 '15 at 21:41
Okay. Granted that this is not wrong, but I still think it is misleading/unclear. Do you agree that http://cui.unige.ch/isi/bnf/JAVA/identifier.html correctly and completely represents what the spec says? — lf215, Sep 05 '15 at 21:53

score 1 · Answer 1 · edited May 23 '17 at 12:03

As the question "Legal identifiers in Java" shows, there are many legal identifiers.

[For languages using the roman alphabet] only alphanumeric characters and occasionally underscores are used when naming identifiers by convention. However, a vast array of characters can be used.

The first paragraph refers to the code style, or convention, among Java programmers of using a reasonably consistent and readable naming scheme. The second paragraph you've quoted explains that there is a vast array of other characters which the compiler will accept, although your fellow programmers may disapprove.

score 1 · Answer 2 · answered Nov 02 '15 at 14:13

The first section is a special case of the second. The characters mentioned in both sections have to satisfy the criteria given in JLS 3.8, which the question leaves out:

A "Java letter" is a character for which the method Character.isJavaIdentifierStart(int) returns true. A "Java letter-or-digit" is a character for which the method Character.isJavaIdentifierPart(int) returns true.

These methods take a code point and check it against the entire Unicode character set (the subject of the second quoted passage), which includes the Basic Latin block (the subject of the first quoted passage).
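To see this for yourself, here is a minimal sketch that applies the two `Character` methods to a few strings. The class name `IdentifierCheck` and the helper `hasIdentifierShape` are made up for illustration. The helper checks only the character rules, so it does not reject keywords or the literals `true`, `false` and `null`, which the JLS also excludes. Non-ASCII samples are written as Unicode escapes so the file compiles regardless of source encoding.

```java
public class IdentifierCheck {

    // Hypothetical helper: true if the first code point is a "Java letter"
    // and every following code point is a "Java letter-or-digit".
    // Assumption: keywords and reserved literals are NOT filtered out here.
    public static boolean hasIdentifierShape(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Walk by code point so supplementary characters are handled correctly.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {
            "count",                // ASCII letters
            "_tmp",                 // leading underscore
            "$gen",                 // leading dollar sign
            "\u53d8\u91cf",         // two CJK ideographs
            "\u0438\u043c\u044f",   // three Cyrillic letters
            "2fast",                // starts with a digit
            "a-b"                   // contains a hyphen
        };
        for (String s : samples) {
            System.out.println(s + " -> " + hasIdentifierShape(s));
        }
    }
}
```

The first five samples pass and the last two fail: a digit is a valid identifier part but not a valid start, and the hyphen is neither.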

In practice, you will rarely see anyone go beyond the Basic Latin character set in their Java source files.
