30

So I am a little confused regarding the verification of bytecode that happens inside a JVM. According to the book by Deitel and Deitel, a Java program goes through five phases (edit, compile, load, verify and execute) (chapter 1). The bytecode verifier verifies the bytecode during the 'verify' stage. Nowhere does the book mention that the bytecode verifier is a part of the classloader.

However, according to the Oracle docs, the class loader performs loading, linking and initialization, and it has to verify the bytecode during linking.

Now, are the bytecode verification that Deitel and Deitel talk about and the bytecode verification that the Oracle document talks about the same process?

Or does bytecode verification happen twice, once during the linking process and the other by the bytecode verifier?

The picture below shows the phases of a Java program as described in the book by Deitel and Deitel. (I borrowed it from nobalG's answer below. :) )

[Image: the five phases of a Java program: edit, compile, load, verify, execute]

nobalG
Smrita

5 Answers

21

This diagram, which is explained in detail in the Oracle docs, may help you understand bytecode verification:

[Image: flow from Java source code through the compiler, class loader and bytecode verifier to the Java virtual machine]

As it shows, bytecode verification happens only once, not twice.

The illustration shows the flow of data and control from Java language source code through the Java compiler, to the class loader and bytecode verifier and hence on to the Java virtual machine, which contains the interpreter and runtime system. The important issue is that the Java class loader and the bytecode verifier make no assumptions about the primary source of the bytecode stream--the code may have come from the local system, or it may have travelled halfway around the planet. The bytecode verifier acts as a sort of gatekeeper: it ensures that code passed to the Java interpreter is in a fit state to be executed and can run without fear of breaking the Java interpreter. Imported code is not allowed to execute by any means until after it has passed the verifier's tests. Once the verifier is done, a number of important properties are known:

  • There are no operand stack overflows or underflows
  • The types of the parameters of all bytecode instructions are known to always be correct
  • Object field accesses are known to be legal--private, public, or protected

While all this checking appears excruciatingly detailed, by the time the bytecode verifier has done its work, the Java interpreter can proceed, knowing that the code will run securely. Knowing these properties makes the Java interpreter much faster, because it doesn't have to check anything. There are no operand type checks and no stack overflow checks. The interpreter can thus function at full speed without compromising reliability.

EDIT:-

From Oracle Docs Section 5.3.2:

When the loadClass method of the class loader L is invoked with the name N of a class or interface C to be loaded, L must perform one of the following two operations in order to load C:

  • The class loader L can create an array of bytes representing C as the bytes of a ClassFile structure (§4.1); it then must invoke the method defineClass of class ClassLoader. Invoking defineClass causes the Java Virtual Machine to derive a class or interface denoted by N using L from the array of bytes using the algorithm found in §5.3.5.
  • The class loader L can delegate the loading of C to some other class loader L'. This is accomplished by passing the argument N directly or indirectly to an invocation of a method on L' (typically the loadClass method). The result of the invocation is C.
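As a rough illustration of the two options above, here is a minimal sketch of a loader that takes the first route for one class and the second route for everything else. The names `DefineClassDemo`, `BytesLoader` and `GeneratedHello` are made up for this example, and the class file is assembled by hand only to keep the snippet self-contained; a real loader would normally read the bytes from a file or the network.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo; the class and method names are illustrative, not from the JVM spec.
class DefineClassDemo {

    // Hand-assembles the ClassFile bytes of:
    //   public class GeneratedHello { public static int answer() { return 1; } }
    static byte[] helloClassBytes() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(0xCAFEBABE);                            // magic
            out.writeShort(0);                                   // minor_version
            out.writeShort(52);                                  // major_version (Java 8)
            out.writeShort(8);                                   // constant_pool_count = entries + 1
            out.writeByte(1); out.writeUTF("GeneratedHello");    // #1 Utf8
            out.writeByte(7); out.writeShort(1);                 // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #3 Utf8
            out.writeByte(7); out.writeShort(3);                 // #4 Class -> #3
            out.writeByte(1); out.writeUTF("answer");            // #5 Utf8 (method name)
            out.writeByte(1); out.writeUTF("()I");               // #6 Utf8 (descriptor)
            out.writeByte(1); out.writeUTF("Code");              // #7 Utf8 (attribute name)
            out.writeShort(0x0021);                              // ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);                                   // this_class
            out.writeShort(4);                                   // super_class
            out.writeShort(0);                                   // interfaces_count
            out.writeShort(0);                                   // fields_count
            out.writeShort(1);                                   // methods_count
            out.writeShort(0x0009);                              // ACC_PUBLIC | ACC_STATIC
            out.writeShort(5);                                   // name_index
            out.writeShort(6);                                   // descriptor_index
            out.writeShort(1);                                   // attributes_count
            out.writeShort(7);                                   // attribute_name_index -> "Code"
            out.writeInt(14);                                    // attribute_length = 12 + code_length
            out.writeShort(1);                                   // max_stack
            out.writeShort(0);                                   // max_locals
            out.writeInt(2);                                     // code_length
            out.writeByte(0x04);                                 // iconst_1
            out.writeByte(0xAC);                                 // ireturn
            out.writeShort(0);                                   // exception_table_length
            out.writeShort(0);                                   // attributes_count of Code
            out.writeShort(0);                                   // attributes_count of the class
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static final class BytesLoader extends ClassLoader {
        BytesLoader(ClassLoader parent) { super(parent); }

        // Called by loadClass only after delegation to the parent has failed.
        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            if (!name.equals("GeneratedHello")) {
                throw new ClassNotFoundException(name);
            }
            byte[] bytes = helloClassBytes();
            // First option: hand the bytes to the JVM, which derives the class from them.
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    static Class<?> load(String name) {
        try {
            return new BytesLoader(DefineClassDemo.class.getClassLoader()).loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    static int callAnswer() {
        try {
            return (Integer) load("GeneratedHello").getMethod("answer").invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callAnswer());                        // class defined from bytes
        System.out.println(load("java.lang.String") == String.class); // second option: delegation
    }
}
```

Note that `findClass` only hands bytes to `defineClass`; nothing in the loader itself inspects the method body.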

As Holger correctly pointed out in the comments, this deserves more explanation, so here is an example:

    static int factorial(int n)
    {
        int res;
        for (res = 1; n > 0; n--) res = res * n;
        return res;
    }

The corresponding byte code would be

    method static int factorial(int), 2 registers, 2 stack slots
    0: iconst_1 // push the integer constant 1
    1: istore_1 // store it in register 1 (the res variable)
    2: iload_0 // push register 0 (the n parameter)
    3: ifle 14 // if negative or zero, go to PC 14
    6: iload_1 // push register 1 (res)
    7: iload_0 // push register 0 (n)
    8: imul // multiply the two integers at top of stack
    9: istore_1 // pop result and store it in register 1
    10: iinc 0, -1 // decrement register 0 (n) by 1
    11: goto 2 // go to PC 2
    14: iload_1 // load register 1 (res)
    15: ireturn // return its value to caller

Note that most JVM instructions are typed (for example, iload and imul operate on ints).

Now you should note that proper operation of the JVM is not guaranteed unless the code meets at least the following conditions:

  • Type correctness: the arguments of an instruction are always of the types expected by the instruction.
  • No stack overflow or underflow: an instruction never pops an argument off an empty stack, nor pushes a result on a full stack (whose size is equal to the maximal stack size declared for the method).
  • Code containment: the program counter must always point within the code for the method, to the beginning of a valid instruction encoding (no falling off the end of the method code; no branches into the middle of an instruction encoding).
  • Register initialization: a load from a register must always follow at least one store in this register; in other terms, registers that do not correspond to method parameters are not initialized on method entrance, and it is an error to load from an uninitialized register.
  • Object initialization: when an instance of a class C is created, one of the initialization methods for class C (corresponding to the constructors for this class) must be invoked before the class instance can be used.

The purpose of byte code verification is to check these conditions once and for all, by static analysis of the byte code at load time. Byte code that passes verification can then be executed faster.
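To make the idea of static analysis concrete, here is a toy sketch that checks just one of the conditions, stack overflow and underflow, for the factorial byte code above. It is not the real verifier: it tracks only the operand stack height (no types, no register initialization), it addresses instructions by list index rather than by byte offset, and the `Insn` class and `check` method are invented for this example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Toy model, not the real JVM verifier: it tracks only the operand stack height.
class StackDepthChecker {

    // One instruction: how many slots it pops and pushes, and where control goes next.
    static final class Insn {
        final String name;
        final int pops;
        final int pushes;
        final int target;           // index of the branch target, or -1 if there is none
        final boolean fallsThrough; // false for goto and return instructions

        Insn(String name, int pops, int pushes, int target, boolean fallsThrough) {
            this.name = name;
            this.pops = pops;
            this.pushes = pushes;
            this.target = target;
            this.fallsThrough = fallsThrough;
        }
    }

    // Returns null if the code passes, otherwise a description of the first problem found.
    static String check(Insn[] code, int maxStack) {
        if (code.length == 0) return "empty code";
        int[] heightAt = new int[code.length];
        Arrays.fill(heightAt, -1);            // -1 means "not reached yet"
        Deque<Integer> work = new ArrayDeque<>();
        heightAt[0] = 0;                      // the operand stack is empty on method entry
        work.push(0);
        while (!work.isEmpty()) {
            int i = work.pop();
            Insn in = code[i];
            int h = heightAt[i] - in.pops;
            if (h < 0) return "stack underflow at " + i + " (" + in.name + ")";
            h += in.pushes;
            if (h > maxStack) return "stack overflow at " + i + " (" + in.name + ")";
            int[] successors = { in.fallsThrough ? i + 1 : -1, in.target };
            for (int s : successors) {
                if (s == -1) continue;        // no successor in this slot
                if (s < 0 || s >= code.length) return "control leaves the method after " + i;
                if (heightAt[s] == -1) {      // first visit: record the height and explore
                    heightAt[s] = h;
                    work.push(s);
                } else if (heightAt[s] != h) {
                    return "inconsistent stack height at " + s;
                }
            }
        }
        return null;
    }

    // The factorial listing, with branch targets rewritten as instruction indices.
    static Insn[] factorial() {
        return new Insn[] {
            new Insn("iconst_1", 0, 1, -1, true),   // 0
            new Insn("istore_1", 1, 0, -1, true),   // 1
            new Insn("iload_0",  0, 1, -1, true),   // 2
            new Insn("ifle",     1, 0, 10, true),   // 3: branch to index 10 or fall through
            new Insn("iload_1",  0, 1, -1, true),   // 4
            new Insn("iload_0",  0, 1, -1, true),   // 5
            new Insn("imul",     2, 1, -1, true),   // 6
            new Insn("istore_1", 1, 0, -1, true),   // 7
            new Insn("iinc",     0, 0, -1, true),   // 8
            new Insn("goto",     0, 0, 2, false),   // 9: back to index 2
            new Insn("iload_1",  0, 1, -1, true),   // 10
            new Insn("ireturn",  1, 0, -1, false),  // 11
        };
    }

    public static void main(String[] args) {
        System.out.println(check(factorial(), 2)); // declared maximum of 2 stack slots
        System.out.println(check(factorial(), 1)); // too small a maximum
    }
}
```

Every instruction is visited a bounded number of times, before anything is executed, which is the sense in which the checks are done once at load time rather than on each execution.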

Also note that the purpose of byte code verification is to shift the checks listed above from run time to load time.

The above explanation has been taken from Java bytecode verification: algorithms and formalizations
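The effect can also be observed on a real JVM. The sketch below is not part of the paper: it hand-assembles two tiny class files, one whose method body is `iconst_1; ireturn` and one whose body is a lone `ireturn` on an empty operand stack, and loads both through the same custom class loader. The names `VerifyErrorDemo`, `OneClassLoader`, `GoodGenerated` and `BadGenerated` are hypothetical, and the sketch assumes that verification has not been switched off with a JVM option.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo; all names are illustrative.
class VerifyErrorDemo {

    // Builds a class file containing one method, `public static int run()`, with the given body.
    static byte[] classBytes(String className, byte[] code) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(0xCAFEBABE);                            // magic
            out.writeShort(0);                                   // minor_version
            out.writeShort(52);                                  // major_version (Java 8)
            out.writeShort(8);                                   // constant_pool_count = entries + 1
            out.writeByte(1); out.writeUTF(className);           // #1 Utf8
            out.writeByte(7); out.writeShort(1);                 // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #3 Utf8
            out.writeByte(7); out.writeShort(3);                 // #4 Class -> #3
            out.writeByte(1); out.writeUTF("run");               // #5 Utf8 (method name)
            out.writeByte(1); out.writeUTF("()I");               // #6 Utf8 (descriptor)
            out.writeByte(1); out.writeUTF("Code");              // #7 Utf8 (attribute name)
            out.writeShort(0x0021);                              // ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);                                   // this_class
            out.writeShort(4);                                   // super_class
            out.writeShort(0);                                   // interfaces_count
            out.writeShort(0);                                   // fields_count
            out.writeShort(1);                                   // methods_count
            out.writeShort(0x0009);                              // ACC_PUBLIC | ACC_STATIC
            out.writeShort(5);                                   // name_index
            out.writeShort(6);                                   // descriptor_index
            out.writeShort(1);                                   // attributes_count
            out.writeShort(7);                                   // attribute_name_index -> "Code"
            out.writeInt(12 + code.length);                      // attribute_length
            out.writeShort(1);                                   // max_stack
            out.writeShort(0);                                   // max_locals
            out.writeInt(code.length);                           // code_length
            out.write(code);                                     // the method body
            out.writeShort(0);                                   // exception_table_length
            out.writeShort(0);                                   // attributes_count of Code
            out.writeShort(0);                                   // attributes_count of the class
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A loader that knows exactly one class and passes its bytes to defineClass unchanged.
    static final class OneClassLoader extends ClassLoader {
        private final String className;
        private final byte[] bytes;

        OneClassLoader(String className, byte[] bytes) {
            super(VerifyErrorDemo.class.getClassLoader());
            this.className = className;
            this.bytes = bytes;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            if (!name.equals(className)) throw new ClassNotFoundException(name);
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    // Loads, links and initializes the class, then calls run(); reports what happened.
    static String tryRun(String className, byte[] code) {
        ClassLoader loader = new OneClassLoader(className, classBytes(className, code));
        try {
            Class<?> c = Class.forName(className, true, loader);
            return "returned " + c.getMethod("run").invoke(null);
        } catch (VerifyError e) {
            return "rejected by verifier";
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] good = { 0x04, (byte) 0xAC };  // iconst_1; ireturn
        byte[] bad = { (byte) 0xAC };         // ireturn with nothing on the operand stack
        System.out.println(tryRun("GoodGenerated", good));
        System.out.println(tryRun("BadGenerated", bad));
    }
}
```

On a JVM with default settings, the first call should return 1 and the second should be rejected with a `VerifyError` before `run` ever executes. The class loader code is identical in both cases, which is consistent with the verifier belonging to the JVM rather than to the loader.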

Rahul Tripathi
  • One quick question. Which *Class Loader(s)*?. Are only custom class loaders subjected to this verification? – TheLostMind Aug 28 '14 at 06:26
  • @TheLostMind:- I think it is not specific to any particular class loader, the bytecode verification applies to all class files. – Rahul Tripathi Aug 28 '14 at 06:34
  • 6
    @TheLostMind: this is a simplifying illustration. Actually, verification does *not* happen within the `ClassLoader` and therefore is completely independent from the particular `ClassLoader` implementation. There are even other ways to add a class to a JVM, e.g. Instrumentation, but the byte code will be verified in these cases as well. Also, the arrow from “Class Loader” to “Just in Time Compiler” makes no sense as a `ClassLoader` does not interact with the JIT Compiler in any way. Rather, you can consider verifier and the JIT being an integral part of the JVM for more than fifteen years now. – Holger Aug 28 '14 at 09:20
  • @Holger -*Also, the arrow from “Class Loader” to “Just in Time Compiler” makes no sense* - I wondered the same. I don't know how Oracle documentation can be so *misleading*. – TheLostMind Aug 28 '14 at 09:21
  • @TheLostMind: Look at the bottom of [the source page](http://www.oracle.com/technetwork/java/security-136118.html): “*Copyright © 1997 Sun Microsystems*”. It was made at Java 1.1 times. Going from that page to the table of contents even reveals “May 1996”. Oracle hosts a lot of old stuff and you have to be careful when following a direct link. I’m not even sure whether you can reach the page from their front page (from something other than an “archive” link). – Holger Aug 28 '14 at 09:24
  • @Holger - ah.. My bad. I should have looked at it back in 1997 :P – TheLostMind Aug 28 '14 at 09:25
  • @TheLostMind you are right the picture is misleading! – Smrita Aug 28 '14 at 14:57
  • @Holger so I mean is bytecode verifier not a part of Classloader? – Smrita Aug 28 '14 at 15:02
  • @Smrita - check [this link](http://www.informit.com/articles/article.aspx?p=1187967&seqNum=2). It might be helpful. – TheLostMind Aug 28 '14 at 15:20
  • 1
    @Smrita: The `ClassLoader` is responsible for locating and loading (or generating) the bytes that make up a class file. Its responsibility ends when it passes these bytes to one of the [`defineClass`](http://docs.oracle.com/javase/7/docs/api/java/lang/ClassLoader.html#defineClass(java.lang.String,%20byte[],%20int,%20int)) methods. That’s the point where the responsibility of the JVM and its verifier *starts*. The process is specified in the [JVM spec §5.3](http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.3). Note that 5.3.2 contains a remark about the Java 1.1 changes (1997). – Holger Aug 28 '14 at 16:15
  • Hey @Holger I am so confused. The Oracle docs at http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.3 mention verification under the topic of linking. Isn't linking one of the tasks that the class loader performs? Could you please point out proper resources where I can read about the things you are saying? :) – Smrita Aug 29 '14 at 01:02
  • 2
    That chapter *is* the proper resource. As said in my previous comment, §5.3.2 contains a remark about relevant Java 1.1 changes. Let me cite: “*From JDK release 1.1 onward, Oracle’s Java Virtual Machine implementation links the class or interface directly, without relying on the class loader.*” – Holger Aug 29 '14 at 08:15
9

No.

From the JVM Spec 4.10:

Even though a compiler for the Java programming language must only produce class files that satisfy all the static and structural constraints in the previous sections, the Java Virtual Machine has no guarantee that any file it is asked to load was generated by that compiler or is properly formed.

The spec then proceeds to specify the verification process.

And JVM Spec 5.4.1:

Verification (§4.10) ensures that the binary representation of a class or interface is structurally correct (§4.9). Verification may cause additional classes and interfaces to be loaded (§5.3) but need not cause them to be verified or prepared.

The section that specifies linking references §4.10, not as a separate process but as part of loading the classes.

The JVM spec and the JLS are great documents when you have a question like this.

jdphenix
9

No, there is no two-time verification

No. As far as verification is concerned, look closely at how a program written in Java goes through the various phases in the following image. You will see that the code is verified just once, not twice.

[Image: the phases of a Java program: edit, compile, load, verify, execute]

  • EDIT – The programmer writes the program in a text editor (for example Notepad) and saves it as a ‘.java’ file, which the compiler then takes as input.
  • COMPILE – The compiler takes the ‘.java’ file, compiles it and looks for any errors in the program. If it finds any, it reports them to the programmer. If there are none, the program is converted into bytecode and saved as a ‘.class’ file.

  • LOAD – The main purpose of the component called ‘Class Loader’ is to load the bytecode into the JVM. It doesn’t execute the code yet, but just loads it into the JVM’s memory.

  • VERIFY – After the code is loaded, the JVM’s subpart called ‘Byte Code verifier’ checks that the bytecode is valid. It also checks whether the bytecode contains anything that might lead to a malicious outcome. This component of the JVM ensures security.

  • EXECUTE – The next component is the Execution Engine. The execution engine interprets the bytecode instruction by instruction, and the Just In Time (JIT) compiler compiles frequently executed code to native code. JIT-compiled code runs much faster but consumes extra memory.

nobalG
  • 2
    This is the diagram from Deitel and Deitel. Nowhere does it say that the bytecode verifier is a part of the class loader! Even this diagram is not clear about it. This diagram is the main reason for my confusion! – Smrita Aug 28 '14 at 06:14
  • See this too http://stackoverflow.com/questions/755005/how-does-bytecode-get-verified-in-the-jvm – nobalG Aug 28 '14 at 06:20
5

The spec lists 4 phases in bytecode verification. These steps are functionally distinct and should not be mistaken for repetitions of the same thing. Just as a multi-pass compiler uses each pass to set up for the next one, the phases are orchestrated for a single overall purpose, and each phase accomplishes certain tasks.

Unless the bytecode is changed, there is no reason to verify it twice.

The verification is described here.

http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.10

codenheim
2

Verification of code happens twice: once during compilation (compilation fails if the code has flaws or threats) and again after the class is loaded into memory during execution (the actual bytecode verification happens here). Yes, this happens along with the process of loading classes (by class loaders), but the class loaders themselves might not act as verifiers. It's the JVM (or rather the verifier present in the JVM) that does the verification.

TheLostMind
  • So you mean there's something in the compiler that has the ability to verify bytecodes? Could you please point out resources so that I can read it too:) – Smrita Aug 29 '14 at 01:05
  • @Smrita - check [this](http://en.wikipedia.org/wiki/Java_virtual_machine) and [this](http://www.informit.com/articles/article.aspx?p=1187967&seqNum=2). BTW I edited my answer to make it clearer. *Bytecode* verification doesn't happen twice. The compiler ensures that *bad* code always fails. So, this is indeed verification, but not on bytecode. The JVM has a verifier that does *bytecode verification*. – TheLostMind Aug 29 '14 at 05:25
  • It's somewhat clear now. So it seems that verification of bytecode happens only once :) – Smrita Aug 29 '14 at 06:19
  • @Smrita - Yes. Seems like that. Unfortunately the available documentation on this topic is either *outdated* or *too sparse*. – TheLostMind Aug 29 '14 at 07:01