3

I had a friend say:

For me the most interesting thing about Haskell is not the language and the types. It is the Spineless Tagless Graph Machine behind it.

Because Haskell people talk about types all the time, this quote really caught my attention. Now we can look at the Haskell compilation process like this:

  1. Parsing
  2. Type checking
  3. Desugaring + a few bobs and bits
  4. Translation to core
  5. Lion share of optimization
  6. Translation to STG language
  7. STG language to C–
  8. C– to assembly or llvm

Which we can simplify down to:

  1. .. front end stuff ..
  2. Translate IL to STG language
  3. Compile STG language to C/ASM/LLVM/Javascript

Ie - there is an intermediate 'graph language' that Haskell is compiled to, and various optimisations happen there, prior to it being compiled to LLVM/C etc.

This contrasts to a potential JVM Language compilation process that looks like this:

  1. Convert JVM Language Code to Java bytecode inside a class.
  2. Run the Bytecode on a Java Virtual Machine.

Assuming it were possible to add an intermediate STG Compilation step to the Java Compilation process, I'm wondering what impact would this change have? What would change about the compiled code?

(I'm aware that you need a pure functional language to get the most use out of the spineless tagless graph machine, so if it is helpful to answer the question, assume we're compiling Frege [Haskell for the JVM].)

My question is: What would change if the JVM Language Compilation process had an STG phase like Haskell?

Marimuthu Madasamy
  • 13,126
  • 4
  • 30
  • 52
hawkeye
  • 34,745
  • 30
  • 150
  • 304
  • 2
    Your version of the 'Java compilation process' bears no relationship with reality. It isn't on the same level as your version of the Haskell compilation process, whether accurate or not. Compilation via any process involves screening, scanning, parsing, semantic analysis, allocation, flow analysis, optimization, code generation, ... and maybe a lot of other things as well. And execution isn't part of compilation. – user207421 Jan 01 '16 at 09:36
  • 1
    Note that `javac` (and afaik all other java compilers) do very few optimizations when going from Java code to bytecode. All optimizations are deferred to the JIT compiler which takes bytecode to machine code. So your question would perhaps be better phrased if you referred to the JIT compilation process. – aioobe Jan 01 '16 at 09:39
  • That's helpful feedback. I've updated the question to remove the issues you've talked about. – hawkeye Jan 01 '16 at 09:53
  • 3
    As far as I understand, running the STG machine efficiently requires some low-level tricks, which the JVM does not allow. For instance, every closure needs two pointers: one for an information table, and another one for the actual entry code. GHC optimizes this by placing the read-only info table right before the entry point, so that only one pointer is needed per closure. – chi Jan 01 '16 at 10:36
  • There is no such thing as **THE** JVM language compilation process. – Ingo Jan 03 '16 at 22:00

1 Answers1

1

You need to clarify if you mean Java the language or some language running on the JVM.

My knowledge of Java the language is limited to having read the specification, and I know nothing about the Haskell IR you're talking about. However, Java is, by spec, a dynamic language, and it would be illegal to perform any AOT xform which uses any information outside of each end classfile.

Of course a project that doesn't use these features could break these rules.

pat
  • 12,587
  • 1
  • 23
  • 52
MB Reynolds
  • 619
  • 5
  • 9