Why shouldn't one make every Scala instance variable a lazily initialized one?

Question

Apart from the added verbosity, are there any other strong reasons why one shouldn't adopt the rule that every instance variable should be lazily initialized?

As for the hidden costs: http://stackoverflow.com/q/3041253/2390083 — Beryllium, Jun 26 '13 at 19:11
And the hidden costs are going to be getting worse: http://stackoverflow.com/a/17329465/175251 — Leif Wickland, Jun 26 '13 at 20:07

Rüdiger Klaehn · Accepted Answer · 2013-06-26T20:49:01.287

First of all: if something goes wrong in the initialization of a lazy val (like accessing an external resource that does not exist), you will only notice it the first time you access the val, whereas with a normal val you will notice as soon as the object is constructed. You can also have cyclic dependencies between lazy vals that leave the class not working at all (a StackOverflowError, as in the example below, or one of the dreaded NullPointerExceptions), but you will only find out the first time you access one of the connected lazy vals.

So lazy vals make the program less deterministic, which is always a bad thing.
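A minimal sketch of the first point, assuming a hypothetical loadSetting helper that stands in for an external resource that does not exist: the strict version fails while the object is being constructed, whereas the lazy version constructs fine and only fails on first access.

```scala
// Sketch: where a failing initializer surfaces, strict val vs lazy val.
// loadSetting and the "db.url" key are hypothetical, for illustration only.
object DeferredFailure {
  // Stands in for reading an external resource that does not exist.
  def loadSetting(key: String): String =
    throw new IllegalStateException(s"setting not found: $key")

  class Eager {
    val url: String = loadSetting("db.url") // runs inside the constructor
  }

  class Lazy {
    lazy val url: String = loadSetting("db.url") // runs on first access only
  }

  // True if evaluating body throws IllegalStateException.
  def fails(body: => Any): Boolean =
    try { body; false }
    catch { case _: IllegalStateException => true }

  def main(args: Array[String]): Unit = {
    println(fails(new Eager))      // true: fails during construction
    println(fails(new Lazy))       // false: construction succeeds
    println(fails(new Lazy().url)) // true: the error only appears on access
  }
}
```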

Second: there is a runtime overhead involved with a lazy val. A lazy val is currently implemented with a volatile bitmask field (an int) in the class that declares it, with one bit per lazy val; if you have more than 32 lazy vals, there will be two bitmasks, and so on.

To make sure that the lazy val initializer runs exactly once, the bitmask is written inside a synchronized block when the field is initialized, and it is read with a volatile read every time the field is accessed. A volatile read is pretty cheap on the x86 architecture, but a volatile write can be really expensive.
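To make this mechanism concrete, here is a rough hand-written Scala equivalent of such an accessor, assuming the bitmap-plus-synchronized scheme described in this answer. The names ManualLazy, bitmap0 and xField are hypothetical, and real compiler output differs between Scala versions; this is double-checked locking, not the compiler's exact code.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hand-written approximation of a compiled lazy val (double-checked locking).
class ManualLazy(init: () => Int) {
  @volatile private var bitmap0: Int = 0 // bit 0 is set once x is initialized
  private var xField: Int = 0            // backing field for x

  def x: Int = {
    if ((bitmap0 & 1) == 0) {        // volatile read on every access
      this.synchronized {            // monitorenter ... monitorexit
        if ((bitmap0 & 1) == 0) {    // re-check: another thread may have won
          xField = init()
          bitmap0 = bitmap0 | 1      // volatile write, happens once
        }
      }
    }
    xField                           // visible thanks to the volatile bitmap
  }
}

object ManualLazyDemo {
  def main(args: Array[String]): Unit = {
    val calls = new AtomicInteger(0)
    val v = new ManualLazy(() => { calls.incrementAndGet(); 42 })
    val threads = (1 to 8).map { _ =>
      new Thread(new Runnable { def run(): Unit = { v.x; () } })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    println(s"x = ${v.x}, initializer ran ${calls.get} time(s)")
    // x = 42, initializer ran 1 time(s)
  }
}
```

The volatile read on the fast path is paid on every access; the synchronized block is normally entered only by the threads that arrive before initialization has finished.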

As far as I know, there is an effort underway to optimize this in a future version of Scala, but there will always be an overhead to check whether the field is initialized, compared to a straight val access. For example, the extra code for lazy val access might prevent a method from being inlined.

Of course for a very small class the memory overhead of the bitmask might also be relevant.

But even if you don't have any performance problems, it is good practice to figure out the order in which vals depend on each other, declare them in that order, and use normal vals.
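A small sketch of what getting the order right guards against, assuming a hypothetical answerBase() helper (a method call, so the compiler cannot treat it as a constant). With strict vals, a forward reference does not fail loudly; it silently reads the JVM default value (0 for Int, null for references, which is where NullPointerExceptions come from), so declaration order is what makes the class correct.

```scala
// Sketch: with strict vals, declaration order decides what a reference sees.
object ValOrder {
  // Hypothetical helper; a method call, so it is not a compile-time constant.
  def answerBase(): Int = 41

  class WrongOrder {
    val a: Int = b + 1        // b's initializer has not run yet, so b reads 0
    val b: Int = answerBase()
  }

  class RightOrder {
    val b: Int = answerBase() // declare the dependency first
    val a: Int = b + 1
  }

  def main(args: Array[String]): Unit = {
    println(new WrongOrder().a) // 1, not 42
    println(new RightOrder().a) // 42
  }
}
```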

Edit: here is a code example that illustrates the nondeterminism you might get if you use lazy vals:

class Test {
  lazy val x:Int = y
  lazy val y:Int = x
}

You can create an instance of this class without any problems, but as soon as you access either x or y you will get a StackOverflowError. This is of course an artificial example; in real code, dependency cycles are usually much longer and far less obvious.

Here is a Scala REPL session using :javap that illustrates the runtime overhead of a lazy val. First, a normal val:

scala> class Test { val x = 0 }
defined class Test

scala> :javap -c Test
Compiled from "<console>"
public class Test extends java.lang.Object implements scala.ScalaObject{
public int x();
  Code:
   0:   aload_0
   1:   getfield    #11; //Field x:I
   4:   ireturn

public Test();
  Code:
   0:   aload_0
   1:   invokespecial   #17; //Method java/lang/Object."<init>":()V
   4:   aload_0
   5:   iconst_0
   6:   putfield    #11; //Field x:I
   9:   return

}

And now the same class with a lazy val (class Test { lazy val x = 0 }):

scala> :javap -c Test
Compiled from "<console>"
public class Test extends java.lang.Object implements scala.ScalaObject{
public volatile int bitmap$0;

public int x();
  Code:
   0:   aload_0
   1:   getfield    #12; //Field bitmap$0:I
   4:   iconst_1
   5:   iand
   6:   iconst_0
   7:   if_icmpne   45
   10:  aload_0
   11:  dup
   12:  astore_1
   13:  monitorenter
   14:  aload_0
   15:  getfield    #12; //Field bitmap$0:I
   18:  iconst_1
   19:  iand
   20:  iconst_0
   21:  if_icmpne   39
   24:  aload_0
   25:  iconst_0
   26:  putfield    #14; //Field x:I
   29:  aload_0
   30:  aload_0
   31:  getfield    #12; //Field bitmap$0:I
   34:  iconst_1
   35:  ior
   36:  putfield    #12; //Field bitmap$0:I
   39:  getstatic   #20; //Field scala/runtime/BoxedUnit.UNIT:Lscala/runtime/BoxedUnit;
   42:  pop
   43:  aload_1
   44:  monitorexit
   45:  aload_0
   46:  getfield    #14; //Field x:I
   49:  ireturn
   50:  aload_1
   51:  monitorexit
   52:  athrow
  Exception table:
   from   to  target type
    14    45    50   any

public Test();
  Code:
   0:   aload_0
   1:   invokespecial   #26; //Method java/lang/Object."<init>":()V
   4:   return

}

As you can see, the normal val accessor is very short and will almost certainly be inlined, whereas the lazy val accessor is quite complex and (most importantly for concurrency) involves a synchronized block (the monitorenter/monitorexit instructions). You can also see the extra bitmap$0 field that the compiler generates.

mikołak · Answer 2 · score 8 · 2013-06-26T19:42:50.487

Firstly, we should be talking about lazy vals (Scala's "constants"), not lazy variables (which I don't think exist).

Two reasons would be maintainability and efficiency, especially in the context of class fields:

Efficiency: the benefit of non-lazy initialization is that you control where it happens. Picture a fork-join type framework where you generate a number of objects in worker threads and then hand them off to central processing. With eager evaluation, the initialization is done on the worker threads. With lazy evaluation, it is done on whichever thread first accesses the value, typically the master thread, potentially creating a bottleneck.
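A minimal sketch of the fork-join point, with hypothetical names (Result, buildOnWorker, the "worker-1" thread): the strict field is computed on the worker that builds the object, while the lazy field is computed by whichever thread reads it first, here the consuming thread.

```scala
// Sketch: which thread pays for a field's initializer.
object WhereItRuns {
  class Result {
    // Each initializer records the name of the thread that evaluated it.
    val eagerThread: String = Thread.currentThread().getName
    lazy val lazyThread: String = Thread.currentThread().getName
  }

  // Builds a Result on a separate "worker-1" thread and hands it back.
  def buildOnWorker(): Result = {
    var built: Result = null
    val worker = new Thread(new Runnable {
      def run(): Unit = { built = new Result }
    }, "worker-1")
    worker.start()
    worker.join() // join() makes the worker's writes visible to this thread
    built
  }

  def main(args: Array[String]): Unit = {
    val r = buildOnWorker()
    println(r.eagerThread)                                  // worker-1
    println(r.lazyThread == Thread.currentThread().getName) // true
  }
}
```

If the lazy initializer were expensive, that cost would land on the consumer thread for every object it touches.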

Maintainability: if all your values are lazily initialized and your program blows up, you get a stack trace rooted in a completely different context from the initialization of your instance, potentially in another thread.
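A small illustration of the maintainability point, using hypothetical buildWidget/useWidget methods: the lazy initializer fails inside useWidget, so the stack trace points at the code that read the value, and the code that created the instance does not appear in it at all.

```scala
// Sketch: a lazy initializer's failure is reported from the access site.
// Widget, buildWidget and useWidget are hypothetical names.
object WhereItFails {
  class Widget {
    lazy val port: Int = "not-a-number".toInt // fails only when first read
  }

  def buildWidget(): Widget = new Widget     // construction succeeds

  def useWidget(w: Widget): Int = w.port + 1 // the failure surfaces here

  // Method names in the failure's stack trace.
  def failingFrames(): Array[String] =
    try { useWidget(buildWidget()); Array.empty[String] }
    catch { case e: NumberFormatException => e.getStackTrace.map(_.getMethodName) }

  def main(args: Array[String]): Unit = {
    val frames = failingFrames()
    println(frames.contains("useWidget"))   // true
    println(frames.contains("buildWidget")) // false
  }
}
```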

There are also, almost certainly, language-implementation-related costs (I see @Beryllium has posted one example), but I don't feel competent enough to discuss them.

edited Jun 26 '13 at 19:42

answered Jun 26 '13 at 19:13

mikołak


Wouldn't it make more sense to do generation as well as processing in the worker threads, _then_ return them to the master thread? Or, if object interaction is necessary, have the worker threads return their objects to another worker thread that will do the processing rather than the master thread so that the master thread doesn't have to worry about managing most of the worker threads, only the middle management (so to speak) thread. – JAB Jun 26 '13 at 19:16
That's precisely what I mean. To elaborate, a large number of algorithms have a distributed processing element **and** a centralized processing element - otherwise *Amdahl's law* wouldn't exist :). One trivial example is the merge step of Mergesort. Of course, it may sound obvious, but when you're potentially in an "all I have is a hammer" mindset, this is an actual pitfall you can fall into. – mikołak Jun 26 '13 at 19:23

score 1 · Answer 3 · answered Jun 27 '13 at 02:42

If I read your code and saw that you used lazy, I would burn time asking WHY you used lazy initialization, which is probably the most expensive cost of lazy, on top of the performance penalties.

Now, where you SHOULD be thinking about lazy initialization (and, similarly, Streams, which I'll include here) is:

Cyclical dependency: where one variable depends on another being initialized and/or vice versa.

Infinite sets: Streams allow you to find the first 1000 prime numbers without needing to know in advance how many integers you will have to go through.

I'm sure there are a couple of others; these are the big ones that I can see.
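A sketch of both cases, assuming Scala 2.13 or later for LazyList (the successor of Stream). makeRing ties a knot between two values that refer to each other, which only compiles because the local vals are lazy; primes is an infinite sequence from which you take as many elements as you need. Node, makeRing and primes are illustrative names.

```scala
// Sketch of the two use cases; requires Scala 2.13+ for LazyList.
object LazyUses {
  // A node whose successor is supplied by name and cached on first access.
  class Node(val name: String, next0: => Node) {
    lazy val next: Node = next0
  }

  // Cyclic dependency: ping refers to pong and pong refers to ping.
  // With plain local vals this is rejected as a forward reference.
  def makeRing(): Node = {
    lazy val ping: Node = new Node("ping", pong)
    lazy val pong: Node = new Node("pong", ping)
    ping
  }

  // Infinite sequence: odd candidates tested against the primes found so far.
  val primes: LazyList[Int] =
    2 #:: LazyList.from(3, 2).filter { n =>
      primes.takeWhile(p => p * p <= n).forall(p => n % p != 0)
    }

  def main(args: Array[String]): Unit = {
    val ring = makeRing()
    println(ring.next.name)         // pong
    println(ring.next.next eq ring) // true
    println(primes.take(10).toList) // List(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
    println(primes(999))            // 7919, the 1000th prime
  }
}
```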

Just remember that a lazy val is like a def that is evaluated only once, on first access, and only use it when you actually need it; otherwise it will confuse the next developer, who will ask: why is it lazy?
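A small sketch of the "def evaluated once" comparison, using a hypothetical compute() helper that counts how often it runs: the strict val runs it during construction, the lazy val on first access only, and the def on every call.

```scala
// Sketch: how often each kind of member evaluates its right-hand side.
object DefVsLazyVal {
  var evaluations = 0
  def compute(): Int = { evaluations += 1; 42 }

  class Holder {
    val strict: Int = compute()    // once, while the object is constructed
    lazy val memo: Int = compute() // once, on first access
    def everyTime: Int = compute() // on every call
  }

  def main(args: Array[String]): Unit = {
    evaluations = 0
    val h = new Holder
    println(evaluations)               // 1: only strict so far
    println(h.memo + h.memo)           // 84
    println(evaluations)               // 2: memo was computed once
    println(h.everyTime + h.everyTime) // 84
    println(evaluations)               // 4: the def ran twice
  }
}
```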
