14

I am learning how source code is converted to machine code by the .NET and JRE frameworks. To start, I researched and compared the two processes and created this diagram. I would like help critiquing its correctness and, more importantly, pointing out anything significant I missed, so that I can better understand the compilation pathway.

[Image: the asker's diagram comparing the .NET and JRE compilation pathways]

Hovercraft Full Of Eels
jII
  • 3
    What do you mean by "assembler" there? As it stands, that's wrong: the CLR/JVM does not generate assembly but machine code directly. At least the JVM (I don't think the CLR) can generate assembly as a byproduct, but that's hardly necessary. – Voo Jun 28 '12 at 21:49
  • @Voo, by assembler I mean a program that will convert human-readable assembly to machine code that the cpu architecture can understand. I do see that this may be entirely redundant in the process. – jII Jun 28 '12 at 22:02
  • 1
    @EJP, Voo is saying that the JVM creates machine code, not the Java compiler which generates byte code. – jII Jun 28 '12 at 22:03
  • 1
    Most modern compilers do not generate humanly readable assembly which are then assembled to machine code, but create the machine code directly. – Thorbjørn Ravn Andersen Jun 28 '12 at 22:04
  • @jesterli Yep, then that can be removed. Assembly can be generated in an additional step if necessary (JVM only), but generally we generate native code directly. – Voo Jun 28 '12 at 22:04
  • Would it be correct usage of terminology to say that the CLR/JVM is the 'platform' on which the CIL/bytecode is run? And how would we use the term 'framework'? – jII Jun 28 '12 at 22:09

1 Answer

16

Both .NET and Java compile down to bytecode, an intermediate language made up of instructions for a virtual machine (.NET calls its form CIL). It is not machine code, because it cannot run directly on a physical machine. Instead (today at least; Java has a darker history in this regard), a just-in-time (JIT) compiler runs at runtime and translates the VM instructions into native code, which then executes directly. This gives a major performance benefit over purely interpreting the bytecode.
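To make "instructions for a virtual machine" concrete, here is a minimal sketch of a stack-based VM that interprets a tiny instruction set. The class `TinyVm` and its opcodes (`PUSH`, `ADD`, `MUL`, `RET`) are invented for illustration; they are not real JVM or CIL opcodes, and a real runtime is far more involved.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy stack-based VM. The opcodes are hypothetical, not real JVM or CIL opcodes. */
final class TinyVm {
    static final int PUSH = 0; // push the next code word as a constant
    static final int ADD = 1;  // pop two values, push their sum
    static final int MUL = 2;  // pop two values, push their product
    static final int RET = 3;  // stop and return the top of the stack

    /** Interprets the program one instruction at a time. */
    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // index of the next instruction
        while (true) {
            int op = code[pc++];
            switch (op) {
                case PUSH:
                    stack.push(code[pc++]);
                    break;
                case ADD:
                    stack.push(stack.pop() + stack.pop());
                    break;
                case MUL:
                    stack.push(stack.pop() * stack.pop());
                    break;
                case RET:
                    return stack.pop();
                default:
                    throw new IllegalStateException("unknown opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        // Encodes (2 + 3) * 4
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, RET};
        System.out.println(run(program)); // prints 20
    }
}
```

The physical CPU never sees `PUSH` or `ADD` here; it only runs the interpreter loop. A JIT compiler removes that loop by translating the same instruction sequence into native code once and then running the result directly.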

The two differ a little here. Oracle's Java implementation (HotSpot) uses a mix of interpretation, profiling, and JIT compilation: it compiles only the heavily used parts and interprets the rest. This reduces the up-front cost of the JIT compiler (which would otherwise have to run first, lengthening process startup) while still delivering good performance where it is needed. .NET, on the other hand, always JIT-compiles all code that is used (unused code is not compiled, though).

Edit (2019): .NET now also has tiered compilation, where code that runs frequently is optimized further.
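The "interpret first, compile what gets hot" idea can be sketched as a toy simulation. Everything here is an assumption made for illustration: `MixedModeMethod`, the `HOT_THRESHOLD` value, and the use of two lambdas to stand in for the interpreted and compiled versions of a method. It is not how HotSpot or the CLR is actually implemented.

```java
import java.util.function.IntUnaryOperator;

/** Toy model of mixed-mode execution. All names and the threshold are hypothetical. */
final class MixedModeMethod {
    static final int HOT_THRESHOLD = 3; // made-up value; real JITs use their own heuristics

    private final IntUnaryOperator interpreted; // stands in for slow bytecode interpretation
    private final IntUnaryOperator compiled;    // stands in for JIT-generated native code
    private boolean jitted = false;
    int interpretedCalls = 0;
    int compiledCalls = 0;

    MixedModeMethod(IntUnaryOperator interpreted, IntUnaryOperator compiled) {
        this.interpreted = interpreted;
        this.compiled = compiled;
    }

    int invoke(int arg) {
        if (jitted) {
            compiledCalls++;
            return compiled.applyAsInt(arg); // fast path once the method is hot
        }
        interpretedCalls++;
        if (interpretedCalls >= HOT_THRESHOLD) {
            jitted = true; // "compile" now so that later calls take the fast path
        }
        return interpreted.applyAsInt(arg);
    }

    public static void main(String[] args) {
        MixedModeMethod square = new MixedModeMethod(x -> x * x, x -> x * x);
        for (int i = 1; i <= 5; i++) {
            square.invoke(i);
        }
        // prints "interpreted calls: 3, compiled calls: 2"
        System.out.println("interpreted calls: " + square.interpretedCalls
                + ", compiled calls: " + square.compiledCalls);
    }
}
```

A method that is called only once or twice never pays the compilation cost in this model, which is the startup trade-off described above. Compiling every method on its first call instead would correspond to setting the threshold to zero.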

As for the question you raised in your comments: yes, the CLR and the JVM are the platforms such programs run on. A virtual machine is a machine too, just less hardware-y. Both are tightly integrated with a corresponding framework: the Base Class Library for .NET and the Java class library for Java. Those are the frameworks.

Joey
  • Could you please explain what you mean "In .NET an assembly is the compilation unit"? – jII Jun 28 '12 at 22:14
  • 3
    What you see in Visual Studio in a single project gets compiled to a single `.exe` or `.dll` – the resulting assembly. Compilation unit refers to the smallest compilable unit which gets relevant if you want to do partial recompilation for example. In Java you'd just have to recompile the classes that changed, in .NET you'd have to recompile a whole project. Mind you, the difference is negligible for most cases – compilers for both platforms are blazingly fast, especially compared to C++. – Joey Jun 28 '12 at 22:16
  • +1 for the Java .class recompile. I never realized it, but after looking at the obj folder, apparently .NET doesn't separate the object output for each class. One can split the classes into libraries to reduce recompilation, but Java explicitly separates each class. – Martheen Jun 29 '12 at 02:47
  • 1
    @Joey you seem to be mixing up source compilation and JIT compilation here. I don't know how the JVM's jitter works, but it is simply not true that the CLR has to JIT-compile a whole assembly at once. JIT, by definition, means just in time, and the CLR is no exception. It just-in-time compiles to machine code the code that is about to be executed, and this is almost always a single method (in a highly simplified view, the CLR maintains a method table whose entries point to the jitter; once the jitter has compiled a method to machine code, it updates the method table to point to the just-compiled machine-code method) – Amit Mittal Jun 29 '12 at 05:42
  • 1
    @Martheen If Java classes are packed into a JAR and you update a class, wouldn't you need to regenerate the JAR? Similarly, if you change any class in .NET, the assembly has to be regenerated. It is just that .NET does not provide an option to deploy/use classes without packaging. Also, Visual Studio provides an option to build or rebuild. Build (intelligently) compiles only what has changed in the source code (and it is far more granular than a compilation unit of a class) and then updates and repacks the assembly, while rebuild recompiles the whole assembly. – Amit Mittal Jun 29 '12 at 05:47
  • Amit, sorry, yes, I was mistaken there. It is fixed now. You don't need to regenerate the JAR, though, you just need to replace classes in it. – Joey Jun 29 '12 at 06:08
  • @Joey as I said, I am not too familiar with Java and hence my assertion about JAR was more in the form of a question :) Also I now fear that I may be a bit off the mark about build/re-build feature but alas I can not edit the comment. – Amit Mittal Jun 29 '12 at 06:14
  • I failed to find official documentation about how exactly it works. And there is an `obj` folder for C# projects, so it might really try to do so intelligently. I just thought I had read a post by Eric Lippert where he said that it's always the whole assembly being built, but with no proof in one direction or the other I simply removed the statement. – Joey Jun 29 '12 at 06:17
  • @AmitMittal Thanks. So does the .NET Build compile at method-level granularity? – Martheen Jun 29 '12 at 08:50
  • @Joey, C++ (.cpp files) can also be executed using the .NET environment. Does this mean it can be both an interpreted language using the compilation method above, and a directly compiled language? – jII Jun 29 '12 at 10:15
  • @Martheen I have not yet found a definitive reference on Visual Studio's build/rebuild options that can answer that, but the jitter does work at method-level granularity most of the time – Amit Mittal Jun 29 '12 at 10:25
  • @jesterII as Joey mentioned .Net interprets nothing. It JITs IL into machine code and then machine code directly runs on the processor. It is not entirely correct to say that C++ (or in general unmanaged code) can be executed using .Net. .Net has no role in executing the unmanaged code but (thankfully) it enables loading of unmanaged code and provides services that greatly ease unmanaged and managed interaction. Look at it this way; after JIT, managed code is essentially machine code and can call into other machine code loaded in the same process. .Net on its part greatly simplifies this. – Amit Mittal Jun 29 '12 at 10:35
  • @AmitMittal, thanks for responding. What I was talking about was this .NET version of C++ (http://goo.gl/c6mVR) called C++/CLI. – jII Jun 29 '12 at 11:09