Here are the phases of the scala compiler, along with slightly edited
versions of their comments from the source code. Note that this
compiler is unusual in being heavily weighted towards type checking
and to transformations that are more like desugarings. Other compilers
include a lot of code for: optimization, register allocation, and
translation to IR.
Some top-level points:
There is a lot of tree rewriting. Each phase tends to read in a tree
from the previous phase and transform it to a new tree. Symbols, to
contrast, remain meaningful throughout the life of the compiler. So
trees hold pointers to symbols, and not vice versa. Instead of
rewriting symbols, new information gets attached to them as the phases
progress.
Here is the list of phases from Global:
analyzer.namerFactory: SubComponent,
analyzer.typerFactory: SubComponent,
superAccessors, // add super accessors
pickler, // serializes symbol tables
refchecks, // perform reference and override checking,
translate nested objects
liftcode, // generate reified trees
uncurry, // uncurry, translate function values to anonymous
classes
tailCalls, // replace tail calls by jumps
explicitOuter, // replace C.this by explicit outer pointers,
eliminate pattern matching
erasure, // erase generic types to Java 1.4 types, add
interfaces for traits
lambdaLift, // move nested functions to top level
constructors, // move field definitions into constructors
flatten, // get rid of inner classes
mixer, // do mixin composition
cleanup, // some platform-specific cleanups
genicode, // generate portable intermediate code
inliner, // optimization: do inlining
inlineExceptionHandlers, // optimization: inline exception handlers
closureElimination, // optimization: get rid of uncalled closures
deadCode, // optimization: get rid of dead cpde
if (forMSIL) genMSIL else genJVM, // generate .class files
some work around with scala compiler
Thus scala compiler has to do a lot more work than the Java compiler, however in particular there are some things which makes the Scala compiler drastically slower, which include
- Implicit resolution. Implicit resolution (i.e. scalac trying to find an implicit value when you make an implicit declartion) bubbles up over every parent scope in the declaration, this search time can be massive (particularly if you reference the same the same implicit variable many times, and its declared in some library all the way down your dependancy chain). The compile time gets even worse when you take into account implicit trait resolution and type classes, which is used heavily by libraries such as scalaz and shapeless.
Also using a huge number of anonymous classes (i.e. lambdas, blocks, anonymous functions).Macros obviously add to compile time.
A very nice writeup by Martin Odersky
Further the Java and Scala compilers convert source code into JVM bytecode and do very little optimization.On most modern JVMs, once the program bytecode is run, it is converted into machine code for the computer architecture on which it is being run. This is called the just-in-time compilation. The level of code optimization is, however, low with just-in-time compilation, since it has to be fast. To avoid recompiling, the so called HotSpot compiler only optimizes parts of the code which are executed frequently.
A program might have different performance each time it is run. Executing the same piece of code (e.g. a method) multiple times in the same JVM instance might give very different performance results depending on whether the particular code was optimized in between the runs. Additionally, measuring the execution time of some piece of code may include the time during which the JIT compiler itself was performing the optimization, thus giving inconsistent results.
One common cause of a performance deterioration is also boxing and unboxing that happens implicitly when passing a primitive type as an argument to a generic method and also frequent GC.
There are several approaches to avoid the above effects during measurement,like It should be run using the server version of the HotSpot JVM, which does more aggressive optimizations.Visualvm is a great choice for profiling a JVM application. It’s a visual tool integrating several command line JDK tools and lightweight profiling capabilities.However scala abstracions are very complex and unfortunately VisualVM does not yet support this.parsing mechanisms which was taking a long time to process like cause using a lot of exists
and forall
which are methods of Scala collections which take predicates,predicates to FOL and thus may pass entire sequence maximizing performance.
Also making the modules cohisive and less dependent is a viable solution.Mind that intermediate code gen is somtimes machine dependent and various architechures give varied results.
An Alternative:Typesafe has released Zinc which separates the fast incremental compiler from sbt and lets the maven/other build tools use it. Thus using Zinc with the scala maven plugin has made compiling a lot faster.
A simple problem: Given a list of integers, remove the greatest one. Ordering is not necessary.
Below is version of the solution (An average I guess).
def removeMaxCool(xs: List[Int]) = {
val maxIndex = xs.indexOf(xs.max);
xs.take(maxIndex) ::: xs.drop(maxIndex+1)
}
It's Scala idiomatic, concise, and uses a few nice list functions. It's also very inefficient. It traverses the list at least 3 or 4 times.
Now consider this , Java-like solution. It's also what a reasonable Java developer (or Scala novice) would write.
def removeMaxFast(xs: List[Int]) = {
var res = ArrayBuffer[Int]()
var max = xs.head
var first = true;
for (x <- xs) {
if (first) {
first = false;
} else {
if (x > max) {
res.append(max)
max = x
} else {
res.append(x)
}
}
}
res.toList
}
Totally non-Scala idiomatic, non-functional, non-concise, but it's very efficient. It traverses the list only once!
So trade-offs should also be prioritized and sometimes you may have to work things like a java developer if none else.