How does Java identify if a location holds primitive or reference

Question

If there is an object on the heap, and that object has a few instance variables, some of primitive types and some referring to other objects, how is the object structured in memory? Say the object has 5 fields. Where does Java store the data type of each field? Are there some 'flag bytes' and some 'data bytes', where the 'flag bytes' identify the data type of the following 'data bytes'?

I am looking for details that go beyond this answer: https://stackoverflow.com/a/19623603/1364747

This answer gives a lot more detail about how the data itself is stored in memory: https://stackoverflow.com/a/1907455/1364747

But it still doesn't say where the flag that marks the data type as int/long/double/float/reference is stored.

Does this refer to the in-memory representation? I think that this is largely left to the implementation. In fact, I think that once the JVM has loaded a class file that was determined to be *valid*, strictly speaking, it does not have to "store" this information together with the actual instance data. It's simply not relevant for the execution. When the instance is introspected (e.g. with reflection), then the information may be looked up in the class data. And ... there it is: https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-FieldType — Marco13, Aug 02 '16 at 17:53
I don't know the answer to this question, but in assembly language, whether a piece of data is interpreted as a pointer or as a plain value is determined by the type of instructions that act on that data. Extending this to Java, I would guess that when Java code is compiled to bytecode, the compiler emits instructions that treat reference types as pointers to other memory locations and primitive types as plain values. This is a complete guess though. — jmrah, Aug 02 '16 at 17:57
Okay. So, roughly, a JVM implementation could store 20 bytes of contiguous data in memory, and which bytes should be interpreted as an int and which as the two halves of a double would be looked up from the class? Or it could be handled in a JVM-specific way for performance. OK. — Teddy, Aug 02 '16 at 17:59
@jrahhali That is very useful for understanding this. The type may not be in memory at run time at all, because the bytecode or assembly instructions generated by the compiler can be specific to that variable's type. — Teddy, Aug 02 '16 at 18:06
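Marco13's point that the field types live in the class data, not in the instance, can be seen from Java itself. The sketch below assumes a hypothetical class MixedFields with 5 fields and uses reflection (java.lang.reflect.Field.getType()) to print each instance field's descriptor in the notation of JVMS §4.3.2 (I for int, J for long, L...; for references, [ for arrays). It only shows where the type is recorded; it says nothing about how a given JVM lays the instance out in memory. Note that getDeclaredFields() does not guarantee any particular order.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical example class: a mix of primitive and reference fields.
class MixedFields {
    int count;
    long id;
    double ratio;
    String name;
    int[] values;
}

class FieldDescriptors {
    // Builds the JVMS 4.3.2 field descriptor for a type,
    // e.g. int -> "I", String -> "Ljava/lang/String;", int[] -> "[I".
    static String descriptor(Class<?> type) {
        if (type.isArray()) {
            return "[" + descriptor(type.getComponentType());
        }
        if (type == int.class) return "I";
        if (type == long.class) return "J";
        if (type == double.class) return "D";
        if (type == float.class) return "F";
        if (type == boolean.class) return "Z";
        if (type == byte.class) return "B";
        if (type == char.class) return "C";
        if (type == short.class) return "S";
        // Anything else is a reference to an object of that class.
        return "L" + type.getName().replace('.', '/') + ";";
    }

    // Maps each declared instance field to its descriptor.
    // The information comes from the Class object, not from any instance.
    static Map<String, String> describe(Class<?> cls) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Field f : cls.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers())) {
                result.put(f.getName(), descriptor(f.getType()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        describe(MixedFields.class).forEach((name, desc) ->
                System.out.println(name + " : " + desc));
    }
}
```

No instance of MixedFields is ever created here, which is the point: the types can be answered from the class alone.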

score 1 · Accepted Answer · answered Aug 02 '16 at 18:17

Here is a more concrete answer, though I am afraid it still doesn't answer all of your question. Here is the link to the Java 7 edition of the JVM specification, with the relevant section being "2.11. Instruction Set Summary": https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-2.html

I will copy & paste some of it:

2.11.1. Types and the Java Virtual Machine

Most of the instructions in the Java Virtual Machine instruction set encode type information about the operations they perform. For instance, the iload instruction (§iload) loads the contents of a local variable, which must be an int, onto the operand stack. The fload instruction (§fload) does the same with a float value. The two instructions may have identical implementations, but have distinct opcodes.

For the majority of typed instructions, the instruction type is represented explicitly in the opcode mnemonic by a letter: i for an int operation, l for long, s for short, b for byte, c for char, f for float, d for double, and a for reference.
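The remark that iload and fload may have identical implementations but distinct opcodes can be illustrated in plain Java. The hypothetical class below (SameBits is not from the spec) takes one 32-bit pattern and applies either integer addition or float addition to it, using Float.intBitsToFloat and Float.floatToRawIntBits to switch views. It is an analogy in source code, not actual bytecode: nothing in the bits says which type they are; only the operation chosen does.

```java
class SameBits {
    // 0x3F800000 is the IEEE 754 bit pattern of 1.0f.
    // Viewed as an int, it is just a large integer.
    static final int BITS = 0x3F800000;

    // Integer addition on the raw bits (the kind of work iadd does).
    static int addAsInt(int bits) {
        return bits + bits;
    }

    // Float addition on the same bits (the kind of work fadd does).
    static int addAsFloat(int bits) {
        float f = Float.intBitsToFloat(bits);
        return Float.floatToRawIntBits(f + f);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(addAsInt(BITS)));   // 7f000000
        System.out.println(Integer.toHexString(addAsFloat(BITS))); // 40000000, the bits of 2.0f
    }
}
```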

2.11.2. Load and Store Instructions

The load and store instructions transfer values between the local variables (§2.6.1) and the operand stack (§2.6.2) of a Java Virtual Machine frame (§2.6).

Instructions that access fields of objects and elements of arrays (§2.11.5) also transfer data to and from the operand stack.

There's a lot more there. Interesting read.
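The last quoted point, that field-access instructions also move typed data, can be seen by compiling a small class and running javap -c on it. The sketch below assumes a hypothetical class Point with one int field and one reference field; the comments show the bytecode javap prints for each method, with constant-pool indices omitted. The field reference operand of getfield/putfield carries the field's descriptor (I or Ljava/lang/Object;), and the surrounding instructions (iload vs aload, ireturn vs areturn) are chosen by type as well.

```java
// Hypothetical class: one primitive field and one reference field.
class Point {
    int x;
    Object label;
}

class FieldAccess {
    // aload_0
    // getfield  Point.x:I      (the field ref says "int")
    // ireturn
    static int readX(Point p) {
        return p.x;
    }

    // aload_0
    // getfield  Point.label:Ljava/lang/Object;   (the field ref says "reference")
    // areturn
    static Object readLabel(Point p) {
        return p.label;
    }

    // aload_0   (the object reference)
    // iload_1   (the int argument)
    // putfield  Point.x:I
    // return
    static void writeX(Point p, int v) {
        p.x = v;
    }

    public static void main(String[] args) {
        Point p = new Point();
        writeX(p, 7);
        p.label = "origin";
        System.out.println(readX(p) + " " + readLabel(p)); // 7 origin
    }
}
```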

Thanks. That makes it much clearer. This has always been a gray area for me. I'll go through the docs as well to get a better understanding. — Teddy, Aug 02 '16 at 18:21

score 1 · Answer 2 · answered Aug 02 '16 at 18:31

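Type information is needed mainly at compile time, to generate the correct bytecode. Like assembly instructions, most bytecode instructions act on only one data type, so the instruction used reflects the type of its operands. (The JVM still keeps each field's declared type in per-class metadata, used for example by the verifier, by reflection, and by the garbage collector to find reference fields; it is just not stored as flag bytes next to each value in the object.) Compilers for most C-family languages work the same way.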
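As an analogy in plain Java, the sketch below stores an int and a double back to back in 12 untyped bytes, much like the "contiguous data" idea from the comments on the question. The offsets are a hypothetical layout chosen for illustration; a real JVM adds an object header, may align or reorder fields, and computes offsets itself. The bytes carry no type flags: the reader decides the type by which accessor it calls (getInt vs getDouble), the way the JVM relies on which instruction is executed.

```java
import java.nio.ByteBuffer;

class LayoutDemo {
    // Hypothetical layout, standing in for what class metadata would describe.
    static final int COUNT_OFFSET = 0; // int, 4 bytes
    static final int RATIO_OFFSET = 4; // double, 8 bytes
    static final int SIZE = 12;

    // Writes an int and a double into 12 raw bytes with no type markers.
    static ByteBuffer pack(int count, double ratio) {
        ByteBuffer raw = ByteBuffer.allocate(SIZE); // big-endian by default
        raw.putInt(COUNT_OFFSET, count);
        raw.putDouble(RATIO_OFFSET, ratio);
        return raw;
    }

    // The "type" is decided by the accessor chosen, not by the bytes.
    static int readCount(ByteBuffer raw) {
        return raw.getInt(COUNT_OFFSET);
    }

    static double readRatio(ByteBuffer raw) {
        return raw.getDouble(RATIO_OFFSET);
    }

    public static void main(String[] args) {
        ByteBuffer raw = pack(42, 0.5);
        System.out.println(readCount(raw)); // 42
        System.out.println(readRatio(raw)); // 0.5
        // Nothing prevents reading the first half of the double as an int;
        // the result is just the high 32 bits of 0.5 (0x3FE00000).
        System.out.println(Integer.toHexString(raw.getInt(RATIO_OFFSET))); // 3fe00000
    }
}
```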

To see how the bytecode differs when using a primitive versus a heap-allocated object, let's take a simple example.

public static void main(String[] args) {
    int i = 0;
    int j = i + 1;
}

And the generated bytecode (as printed by javap -c):

public static void main(java.lang.String[]);
  Code:
     0: iconst_0
     1: istore_1
     2: iload_1
     3: iconst_1
     4: iadd
     5: istore_2
     6: return

So the constants are pushed with iconst_0 and iconst_1, the integers are stored and loaded with istore and iload, and then they are added with iadd (the i prefix stands for int).
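To make the "type lives in the opcode" idea concrete, here is a hypothetical toy interpreter (TinyIntMachine is an illustration, not how the JVM actually executes code) for just the instructions in the listing above. Its local variables and operand stack are plain int slots with no type tags; the code handles a slot as an int only because the opcode it is executing starts with i.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class TinyIntMachine {
    // Executes a tiny subset of int bytecode and returns the local variables.
    // In a real frame, slot 0 would hold args; it is simply unused here.
    static int[] run(String[] program, int maxLocals) {
        int[] locals = new int[maxLocals];
        Deque<Integer> stack = new ArrayDeque<>();
        for (String insn : program) {
            switch (insn) {
                case "iconst_0": stack.push(0); break;
                case "iconst_1": stack.push(1); break;
                case "iload_1":  stack.push(locals[1]); break;
                case "istore_1": locals[1] = stack.pop(); break;
                case "istore_2": locals[2] = stack.pop(); break;
                case "iadd": {
                    int b = stack.pop();
                    int a = stack.pop();
                    stack.push(a + b);
                    break;
                }
                case "return": return locals;
                default: throw new IllegalArgumentException("unsupported: " + insn);
            }
        }
        return locals;
    }

    public static void main(String[] args) {
        String[] program = {"iconst_0", "istore_1", "iload_1", "iconst_1",
                            "iadd", "istore_2", "return"};
        int[] locals = run(program, 3);
        System.out.println("i = " + locals[1] + ", j = " + locals[2]); // i = 0, j = 1
    }
}
```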

Now take this example, which uses a heap-allocated Integer object instead of a primitive:

public static void main(String[] args) {
    Integer i = new Integer(0); // Integer(int) is deprecated since Java 9; kept so the bytecode below matches
    int j = i + 1;
}

And the bytecode:

public static void main(java.lang.String[]);
  Code:
     0: new           #2                  // class java/lang/Integer
     3: dup
     4: iconst_0
     5: invokespecial #3                  // Method java/lang/Integer."<init>":(I)V
     8: astore_1
     9: aload_1
    10: invokevirtual #4                  // Method java/lang/Integer.intValue:()I
    13: iconst_1
    14: iadd
    15: istore_2
    16: return

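In this version, the Integer is created with new, dup and invokespecial (the constructor), and the reference to it is stored and loaded with astore_1 and aload_1 (a for reference). Before iadd can act on the value, the intValue() method of the Integer object has to be invoked to retrieve it as an int.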
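The same unboxing can be observed from source code. In the sketch below (the class name Unboxing is hypothetical, and Integer.valueOf is used instead of the deprecated constructor), plusOneImplicit relies on auto-unboxing, which the compiler turns into an intValue() call, so it behaves like plusOneExplicit. One visible consequence is that passing null throws a NullPointerException, because intValue() is invoked on a null reference.

```java
class Unboxing {
    // `i + 1` with an Integer: the compiler inserts i.intValue() before iadd.
    static int plusOneImplicit(Integer i) {
        return i + 1;
    }

    // The same thing written out explicitly.
    static int plusOneExplicit(Integer i) {
        return i.intValue() + 1;
    }

    public static void main(String[] args) {
        Integer boxed = Integer.valueOf(41);
        System.out.println(plusOneImplicit(boxed)); // 42
        System.out.println(plusOneExplicit(boxed)); // 42
        try {
            plusOneImplicit(null);
        } catch (NullPointerException e) {
            // Unboxing null calls intValue() on a null reference.
            System.out.println("NullPointerException");
        }
    }
}
```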

And for evidence that data types need not be stored alongside the data after compilation (since they are encoded in the instructions themselves, like istore for "integer store"), see the JVM specification excerpt quoted in jrahhali's answer.
