
Whole-program compilers like MLton produce optimized binaries in part because they can use the entire source of the program to perform partial evaluation: aggressively inlining constants and reducing expressions until they get stuck, all during compilation!
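
As a concrete sketch of what I mean (the module and names here are made up purely for illustration), with the whole program visible a compiler is free to inline `width` and `scale` and do the arithmetic at compile time:

```haskell
module Main where

-- Hypothetical example: every definition the compiler needs is visible here.
width :: Int
width = 80

scale :: Int -> Int
scale x = 3 * x

-- Under aggressive partial evaluation this could, in principle, be reduced
-- all the way to the equivalent of `main = print 240`.
main :: IO ()
main = print (scale width)
```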

This has been explored a bit publicly in the Haskell space by Gabriel Gonzalez's Morte.

Now, my understanding is that Haskell does not do very much of this, if any at all. The reason I've heard cited is that it is antithetical to separate compilation. That makes sense as an argument against partial evaluation across source-file boundaries, but it seems like partial evaluation within a single file would still be an option.

As far as I know, in-file partial evaluation is still not performed, though.

My question is: is this true? If so, what are the tradeoffs for performing in-file partial evaluation? If not, what is an example file where one can improve compiled performance by putting more functionality into the same file?

(Edit: To clarify the above, I know there are a lot of questions about what the best set of reductions to perform is, and many of them are undecidable! I'd like to know the tradeoffs made in an "industrial strength" compiler with separate compilation that live at a level above choosing the right equational theory, if there are any interesting things to talk about there. Things like compilation speed or file bloat are closer to the scope I'm interested in. Another question in the same space might be: "Why can't MLton get separate compilation just by compiling each module separately, leaving the API exposed, and then linking them all together?")

J. Abrahamson
  • The thing about partial evaluation is that it suffers from the halting problem. When do you decide to stop partially evaluating an expression? For example, take the expression `enumFrom 0 :: [Integer]`. If you try to partially evaluate it fully, your compiler will never terminate, which is a very bad thing. One solution would be not to evaluate expressions past weak head normal form. However, this would mean that certain expressions would not be optimizable. Rewriting is a quick and dirty way to get deforestation of intermediate data structures, and it terminates. The problem is correctness. – Aadit M Shah Nov 21 '14 at 18:06
  • The Glasgow Haskell Compiler does perform inlining, although not aggressively: http://stackoverflow.com/q/26996110/783743. It seems that a lot of people are taking an interest in aggressive inlining: http://stackoverflow.com/questions/26827663/rewriting-as-a-practical-optimization-technique-in-ghc-is-it-really-needed. My final-year Bachelor's project is on rewriting and partial evaluation in functional programming languages. – Aadit M Shah Nov 21 '14 at 18:11
  • I should add to the question to be clearer. I'm aware that lots of the equalities here are undecidable (and it may be worse in Haskell than in ML), but MLton is a (partial) existence proof that there's some set of reductions which is both beneficial and terminating. Without examining too deeply what the right reductions to perform are, I'd like to get at the reasoning for implementing them in an "industrial strength" compiler with separate compilation. – J. Abrahamson Nov 21 '14 at 18:13
  • Although I'm perhaps being naive to think that these questions are separable. GHC's inlining and rewrite rules are, of course, a large piece of partial evaluation. The word on the street appears to be that they are a far cry from what MLton attempts, however. Perhaps another way to improve the question would be: "Are there any partial evaluation steps that MLton performs which cannot be replicated in GHC, for either 'practical' reasons or separate-compilation reasons?" – J. Abrahamson Nov 21 '14 at 18:18
  • It doesn't have to be undecidable to be trouble. Taken to the extreme, any program of the form "calculate this mathematical expression and print it" will eventually become `main = print 42`. Of course, this would also be undesirable for, say, a large lattice calculation. But most programs aren't of this type, so I would guess it would be more useful than not, especially if a flag could turn it off. – jamshidh Nov 21 '14 at 18:19
  • @jamshidh I don't see the problem with that as far as correctness goes... but the "slow compilation" problem in your lattice computation is sort of exactly what I'm looking for more insight into. Are there ways to speculatively partially evaluate and then retract it if there doesn't seem to be any value in blowing up that particular part of the AST? – J. Abrahamson Nov 21 '14 at 18:22
  • @jamshidh That assumes the program doesn't take any external input. Partial evaluation, by definition, only evaluates the statically known expressions in a program at compile time. – Aadit M Shah Nov 21 '14 at 18:23
  • @AaditMShah I agree, but back when I was in physics, I saw my share of self-contained data programs (all written in Fortran, of course :) ). – jamshidh Nov 21 '14 at 18:26
  • Another problem that I see with aggressive inlining is that it can't be used everywhere. For example, take the `foldr/build` technique (there's a sketch of the rule after this comment thread). From the structure of the code itself it is impossible to determine that `foldr c n (build g) = g c n`. You need to look at the type signature to figure out what it means. This is because at the value level there's not enough information to optimize the code; you need to look at the types to make the deduction. Of course, you would also need to check that your deduction matches the value-level "proofs". – Aadit M Shah Nov 21 '14 at 18:27
  • One straightforward and *costly* way to deal with this is to run multiple instances of the compiler with progressively increasing timeouts, producing a sequence of executables from least to most pre-calculated. The more aggressively pre-calculating compiles would be given lower and lower priority if computing resources become scarce, while the user could run the executables, and thus explore their efficiency, as they become available. – Will Ness Nov 23 '14 at 16:35
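
For reference, the `foldr/build` rule mentioned in the comments is not something GHC infers from the code; it is written down explicitly as a rewrite rule, essentially as shown in the GHC users guide. Here is a rough sketch, using a local `myBuild` stand-in so as not to clash with the real definitions in `GHC.Base`:

```haskell
{-# LANGUAGE RankNTypes #-}
-- Sketch only: `myBuild` is a local stand-in for GHC.Base's `build`.
module FoldrBuildSketch where

myBuild :: (forall b. (a -> b -> b) -> b -> b) -> [a]
myBuild g = g (:) []
{-# INLINE [1] myBuild #-}  -- delay inlining so the rule gets a chance to fire

{-# RULES
"myFoldr/myBuild" forall k z (g :: forall b. (a -> b -> b) -> b -> b).
                  foldr k z (myBuild g) = g k z
  #-}
```

In GHC itself the real rule sits alongside carefully phased INLINE pragmas on `build` and `foldr`, so that the rule can fire before either side is inlined away.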

1 Answer


This is definitely an optimization that a small set of people are interested in and are pursuing. The Google search term to find information on it is "supercompilation". I believe there are at least two approaches floating about at the moment.
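
To give a flavour of the sort of transformation being aimed at, here is a hand-written, purely illustrative sketch (not the output of any particular supercompiler): fusing two list traversals into a single pass without any hand-written rewrite rules.

```haskell
-- Two traversals, written the way a programmer naturally would.
doubleThenSquare :: [Int] -> [Int]
doubleThenSquare xs = map (^ 2) (map (* 2) xs)

-- Roughly the single-pass residual program a supercompiler aims to produce
-- (written by hand here for illustration).
doubleThenSquareFused :: [Int] -> [Int]
doubleThenSquareFused []       = []
doubleThenSquareFused (x : xs) = (x * 2) ^ 2 : doubleThenSquareFused xs
```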

It seems one of the big tradeoffs is compile-time resources (both time and memory), and at the moment the performance wins from paying those costs appear to be somewhat unpredictable. There is still quite a bit of work left. A few links:

Daniel Wagner