Why does the JVM allow us to name a function starting with a digit in bytecode?

Question

Identifiers are well defined by The Java Language Specification, Java SE 7 Edition (§3.8):

An identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter.

As far as I know, since a method name is an identifier, it should be impossible to name a method starting with a digit in Java, and javac enforces this rule.
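As a rough illustration of the §3.8 rule quoted above, the sketch below checks a name the way the JLS describes it: Character.isJavaIdentifierStart for the first code point and Character.isJavaIdentifierPart for the rest. The class name IdentifierCheck is made up for this example. It deliberately ignores keywords and the literals true, false and null, which the JLS also excludes; javax.lang.model.SourceVersion.isIdentifier is printed alongside as a cross-check.

```java
import javax.lang.model.SourceVersion;

public class IdentifierCheck {
    // JLS 3.8 by hand: first code point must be a "Java letter",
    // every later one a "Java letter or digit". Keywords are not rejected here.
    static boolean isJavaIdentifier(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"f99", "99", "_x", "a b"}) {
            // SourceVersion.isIdentifier applies the same start/part test.
            System.out.println(name + " -> " + isJavaIdentifier(name)
                    + " / " + SourceVersion.isIdentifier(name));
        }
    }
}
```

With this check, f99 passes and 99 fails, which is exactly why javac refuses a method called 99.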

So why does the Java Virtual Machine seem not to respect this rule, allowing us to name a method starting with a digit in bytecode?

This simple snippet will actually print the f99() method name and the value of its parameter.

public class Test {
    public static void main(String[] args) {
        Test t = new Test();
        System.out.println(t.f99(100));
    }

    public int f99(int i){
        System.out.println(Thread.currentThread().getStackTrace()[1].getMethodName());
        return i;
    }
}

Compilation and execution:

$ javac Test.java
$ java Test

Output:

f99
100

Once the code is compiled, it is possible to disassemble it and rename every occurrence of f99 to 99 (with the help of a tool like reJ), then run it again:

$ java Test

Output:

99
100

So, is the name of the method actually "99"?

Why shouldn't the JVM allow almost any String? Why should it spend extra time checking whether the string for names of fields, methods, classes conform to some pattern? You can have method names which are just a space. — Peter Lawrey, Nov 06 '14 at 23:24
In that case, shouldn't javac allow a method name which is just a space, if it can be executed by the JVM? @PeterLawrey — Pier-Alexandre Bouchard, Nov 06 '14 at 23:36
Why should Java allow a space as a method name? `javac` compiles Java code. The JVM runs bytecode. They are not the same thing, nor is there any reason for them to be. Java allows you to do some things the JVM does not, so it can go both ways. — Peter Lawrey, Nov 06 '14 at 23:43
Do you have an example of something Java can do and that the JVM cannot? @PeterLawrey — Pier-Alexandre Bouchard, Nov 07 '14 at 00:17
@Pier-AlexandreBouchard, inner classes, for example: they don't really exist at the JVM or reflection level. They are just package-private classes with tons of 'bridge' methods (in the outer ones). — bestsss, Nov 07 '14 at 01:12
@Pier-AlexandreBouchard The JVM doesn't allow classes to access private members of another class (nested or not), whereas Java allows you to access private members of a class that has the same outer class. — Peter Lawrey, Nov 07 '14 at 08:34
@bestsss: there might be access methods in the inner class as well. That relationship is bidirectional. — Holger, Nov 07 '14 at 10:55
@Holger, true, although personally I tend not to write private methods (or fields) in the inner classes. — bestsss, Nov 07 '14 at 15:48

score 11 · Accepted Answer · edited Nov 07 '14 at 01:09

The Java Language Specification restricts the characters in valid method names so as to help make parsing the Java language unambiguous.

The JVM was designed to support languages other than Java, so its restrictions need not be the same as Java's, unless we wanted to force every non-Java language to live with Java's rules. The restrictions chosen for the JVM are the minimal set that permits unambiguous parsing of method signatures, a format that appears in the JVM spec and not in the JLS.

From the JVM Specification:

a name must not contain any of the ASCII characters . ; [ / < > :

For example, [Lcom/foo/Bar; is a valid JVM type signature, and its special characters ([, / and ;) have been excluded from method names.
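To see why those characters act as delimiters, here is a small sketch that splits a method descriptor into its parameter and return types. The descriptor (I[Lcom/foo/Bar;)V (a hypothetical method taking an int and a Bar[] and returning void) and the class name DescriptorDemo are illustrative assumptions. The parser relies on [ marking an array, L starting a class name, / separating packages and ; ending the class name, so a name that contained any of them would make the boundaries ambiguous.

```java
import java.util.ArrayList;
import java.util.List;

public class DescriptorDemo {
    // Splits "(params)return" into one descriptor per parameter.
    static List<String> parameterTypes(String desc) {
        List<String> params = new ArrayList<>();
        int i = 1; // skip the opening '('
        while (desc.charAt(i) != ')') {
            int start = i;
            while (desc.charAt(i) == '[') {
                i++; // each '[' adds one array dimension
            }
            if (desc.charAt(i) == 'L') {
                i = desc.indexOf(';', i) + 1; // class type: 'L' ... ';'
            } else {
                i++; // primitive type: a single character such as I or J
            }
            params.add(desc.substring(start, i));
        }
        return params;
    }

    // Everything after ')' is the return type descriptor.
    static String returnType(String desc) {
        return desc.substring(desc.indexOf(')') + 1);
    }

    public static void main(String[] args) {
        String desc = "(I[Lcom/foo/Bar;)V";
        System.out.println(parameterTypes(desc)); // [I, [Lcom/foo/Bar;]
        System.out.println(returnType(desc));     // V
    }
}
```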

< and > were further reserved to separate special JVM methods from application methods, specifically <init> and <clinit>, neither of which is a method name the JLS permits.
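The quoted rule is simple enough to write down directly. The sketch below (JvmNameCheck is a hypothetical name) rejects the characters listed in the excerpt, with an exception for the two special names <init> and <clinit>. Rejecting the empty string is my own assumption and is not stated in the excerpt. What a particular JVM actually enforces when loading a class may be more lenient than this; see the comment below about only < and > triggering a ClassFormatError.

```java
public class JvmNameCheck {
    // Characters listed in the JVM Spec excerpt quoted in the answer.
    private static final String FORBIDDEN = ".;[/<>:";

    static boolean isLegalJvmMethodName(String name) {
        // The two reserved special method names are allowed despite '<' and '>'.
        if (name.equals("<init>") || name.equals("<clinit>")) {
            return true;
        }
        // Assumption: an empty name is rejected (not stated in the excerpt).
        if (name.isEmpty()) {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            if (FORBIDDEN.indexOf(name.charAt(i)) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] names = {"f99", "99", " ", "<init>", "<foo>", "a.b"};
        for (String n : names) {
            System.out.println("\"" + n + "\" -> " + isLegalJvmMethodName(n));
        }
    }
}
```

Under this rule both 99 and a single space are acceptable method names, matching what the question and Peter Lawrey's comment describe.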

edited Nov 07 '14 at 01:09 by Antimony

answered Nov 06 '14 at 23:36 by Chris K

Okay, you've piqued my curiosity. What non-Java language(s) was the JVM built to support? – Azar Nov 06 '14 at 23:48

@Azar - It wasn't necessarily "built" to support other languages (though there are several others that do run on the JVM). Rather, there was simply no need to restrict the names, and why restrict when there is no need? – Hot Licks Nov 07 '14 at 00:05

@Azar as I understand the history of Java, the JVM was always thought of as a platform that could support other languages, even though originally Java (or Oak) was the only one. Back then it was a design guideline rather than a need. You will find that kind of thinking and separation all over the JVM; for example, return types are treated differently by the JVM and the JLS. Other languages followed fairly quickly, with an explosion more recently. – Chris K Nov 07 '14 at 00:12
"Such a name must not contain any of the ASCII characters . ; [ / < > : [...] but may contain characters that must not appear in an identifier in the Java programming language (JLS §3.8)". Wow, this sentence was the one I was looking for! – Pier-Alexandre Bouchard Nov 07 '14 at 00:12

For example, Scala can use such weird identifiers, although in an inconvenient way: https://gist.github.com/v6ak/0412debbcb55c5a1852b – v6ak Dec 08 '14 at 21:30
Interestingly, after trying out the method names by hex editing the class files (and verifying by stack traces), I find that only `<` and `>` cause an error `Exception in thread "main" java.lang.ClassFormatError: Bad method name at constant pool index 41 in class file A`, running Java 1.8.0_102-b14. – Adowrath Mar 21 '17 at 13:09

score 5 · Answer 2 · edited May 23 '17 at 11:54

So, is the name of the method actually "99"?

Real programmers don't use parsers; they use sed:

javac Test.java
sed -i 's/\d003f99/\d00299/' Test.class
java Test
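For anyone without GNU sed (the \dNNN escape is, as far as I know, a GNU extension), the same byte-level patch can be sketched in Java. PatchName is a hypothetical helper. Like the sed command it works on raw bytes: it replaces the first 01 00 03 'f' '9' '9' sequence with 01 00 02 '9' '9', so it assumes ASCII names and that the pattern appears only in the intended constant-pool entry. Unlike the sed pattern, it also matches the 01 tag byte. Shortening the file by one byte is harmless here because the rest of the class file refers to constant-pool entries by index, not by byte offset.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PatchName {
    // Replaces the first CONSTANT_Utf8 entry holding oldName with one holding newName.
    static byte[] patch(byte[] cls, String oldName, String newName) {
        byte[] from = utf8Entry(oldName);
        byte[] to = utf8Entry(newName);
        int at = indexOf(cls, from);
        if (at < 0) {
            throw new IllegalArgumentException("entry not found: " + oldName);
        }
        byte[] out = new byte[cls.length - from.length + to.length];
        System.arraycopy(cls, 0, out, 0, at);                      // bytes before the entry
        System.arraycopy(to, 0, out, at, to.length);               // the new entry
        System.arraycopy(cls, at + from.length, out, at + to.length,
                cls.length - at - from.length);                    // bytes after the entry
        return out;
    }

    // Builds tag (1) | u2 length | bytes for an ASCII-only name.
    static byte[] utf8Entry(String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        byte[] e = new byte[3 + b.length];
        e[0] = 1;                          // CONSTANT_Utf8 tag
        e[1] = (byte) (b.length >> 8);     // length, high byte
        e[2] = (byte) b.length;            // length, low byte
        System.arraycopy(b, 0, e, 3, b.length);
        return e;
    }

    // Naive search for needle inside hay; returns -1 when absent.
    static int indexOf(byte[] hay, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= hay.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (hay[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Overwrites the class file in place, like sed -i.
        Path p = Paths.get(args.length > 0 ? args[0] : "Test.class");
        Files.write(p, patch(Files.readAllBytes(p), "f99", "99"));
    }
}
```

Usage would be javac Test.java, then java PatchName Test.class, then java Test.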

Output:

99
100

This works because we know that the method name is stored in the constant pool as plaintext in a Utf8 entry, and JVMS says that Utf8 entries are of form:

CONSTANT_Utf8_info {
    u1 tag;
    u2 length;
    u1 bytes[length];
}

so we had something like:

01 | 00 03 | 'f' '9' '9'

(identifier 3 bytes long) and the sed command replaced 03 | 'f' '9' '9' with 02 | '9' '9' (now 2 bytes long).
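The byte layout above can be reproduced with the standard library, because DataOutputStream.writeUTF writes a u2 length followed by modified UTF-8 bytes, which is the same encoding class files use for CONSTANT_Utf8_info. In this sketch (Utf8Entry is a hypothetical name) the tag byte is written separately, and DataInputStream.readUTF reverses the process.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class Utf8Entry {
    static final int CONSTANT_UTF8 = 1; // tag value of CONSTANT_Utf8_info

    // tag, then writeUTF emits u2 length + modified UTF-8 bytes.
    static byte[] encode(String s) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeByte(CONSTANT_UTF8);
            out.writeUTF(s);
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Checks the tag, then readUTF reads the u2 length and that many bytes.
    static String decode(byte[] entry) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(entry));
            int tag = in.readUnsignedByte();
            if (tag != CONSTANT_UTF8) {
                throw new IllegalArgumentException("not a Utf8 entry, tag " + tag);
            }
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Space-separated lowercase hex, e.g. "01 00 02 39 39".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encode("f99"))); // 01 00 03 66 39 39
        System.out.println(hex(encode("99")));  // 01 00 02 39 39
        System.out.println(decode(encode("99")));
    }
}
```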

I later checked with javap -v Test.class that sed did what I wanted it to do. Before:

#18 = Utf8               f99

After:

#18 = Utf8               99
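If javap is not at hand, the Utf8 lines of javap -v can be approximated by walking the constant pool directly. This is a minimal sketch, not a replacement for javap: Utf8Dump is a hypothetical name, it reads only the header and the constant pool, it handles the constant-pool tags I know of (1 and 3 to 20), and its output format is only loosely similar to javap's.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class Utf8Dump {
    // Returns "#index = Utf8 text" for each CONSTANT_Utf8 entry.
    static List<String> utf8Entries(byte[] classFile) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile));
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort();             // minor_version
            in.readUnsignedShort();             // major_version
            int count = in.readUnsignedShort(); // constant_pool_count
            List<String> lines = new ArrayList<>();
            for (int i = 1; i < count; i++) {   // entries are numbered from 1
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1:                                    // Utf8
                        lines.add("#" + i + " = Utf8 " + in.readUTF());
                        break;
                    case 7: case 8: case 16: case 19: case 20: // Class, String, MethodType, Module, Package
                        skip(in, 2);
                        break;
                    case 15:                                   // MethodHandle
                        skip(in, 3);
                        break;
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        skip(in, 4);                           // Integer, Float, refs, NameAndType, (Invoke)Dynamic
                        break;
                    case 5: case 6:                            // Long and Double take two slots
                        skip(in, 8);
                        i++;
                        break;
                    default:
                        throw new IllegalArgumentException("unknown tag " + tag + " at #" + i);
                }
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void skip(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]);
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(args.length > 0 ? args[0] : "Test.class"));
        for (String line : utf8Entries(bytes)) {
            System.out.println(line);
        }
    }
}
```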

Having manually edited, run, decompiled and compared the .class to the JVMS, I can only conclude that the method name must be 99 :-)
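Reflection offers a further check, because it reports whatever name the class file actually contains and looks methods up by a plain String. The sketch below (ListMethodNames is a hypothetical class mirroring the question's f99) lists declared method names and calls f99 by name. If the patched Test.class were loaded instead, I would expect Test.class.getMethod("99", int.class) to find the renamed method, though that expectation is untested here.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ListMethodNames {
    // Same shape as f99 in the question's Test class.
    public int f99(int i) {
        return i;
    }

    // Names of the methods declared directly in c, sorted because
    // getDeclaredMethods() returns them in no particular order.
    static List<String> declaredNames(Class<?> c) {
        List<String> names = new ArrayList<>();
        for (Method m : c.getDeclaredMethods()) {
            names.add(m.getName());
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(declaredNames(ListMethodNames.class));
        // The lookup key is an ordinary String; no Java identifier rule applies to it.
        Method m = ListMethodNames.class.getMethod("f99", int.class);
        System.out.println(m.invoke(new ListMethodNames(), 100));
    }
}
```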

So it's just a Java language restriction not present in bytecode.

Why does Java prevent such names?

Likely to make the syntax look like C.

Not starting with digits makes it easier to differentiate identifiers from integer literals for both humans and parsers.
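A toy lexer shows the point: it decides what kind of token it is reading from the first character alone. A digit starts an integer literal and a Java identifier start begins an identifier, so f99 and 99 never compete. TinyLexer is a hypothetical name, and it only handles decimal integers, identifiers and single-character symbols. If identifiers could also start with a digit, the token 99 would match both rules and the lexer would need another way to decide.

```java
import java.util.ArrayList;
import java.util.List;

public class TinyLexer {
    // Classifies each token by its first character only.
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int start = i;
            if (Character.isDigit(c)) {
                // Leading digit: decimal integer literal.
                while (i < src.length() && Character.isDigit(src.charAt(i))) {
                    i++;
                }
                tokens.add("INT(" + src.substring(start, i) + ")");
            } else if (Character.isJavaIdentifierStart(c)) {
                // Letter, '_' or '$': identifier, which may then contain digits.
                while (i < src.length() && Character.isJavaIdentifierPart(src.charAt(i))) {
                    i++;
                }
                tokens.add("ID(" + src.substring(start, i) + ")");
            } else {
                i++;
                tokens.add("SYM(" + c + ")");
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("f99(99)")); // [ID(f99), SYM((), INT(99), SYM())]
    }
}
```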
