Scala: lazy vals, call by name, closures and memory leaks

Question

I have a Scala procedure that creates a large data structure, using an even larger index in the process. Because I want to do it in one pass and not get bogged down in complicated precedence resolution, I'm using lazy vals in the result, initialized with expressions which might not evaluate to a correct value (or to any value at all) at the moment the component is created, but will do so once the whole build process is completed. This means that each component of the final result has a synthetic reference to a closure holding my whole index, and potentially, as long as any of them is still in memory, my index cannot be garbage collected. Obviously, I don't want that. Ideally I'd like to be able to make a second pass over the structure to initialize the values if needed (and to ensure that any errors are caught at this point), and then let the index be garbage collected. Currently I pass the initialization expression by name through several functions and use it in a lazy val declaration, equivalent to this:

class Component(init : =>Component) {
   lazy val property = init
}
...
new Component(index.get(parameters))

Is this sound? Will the synthetic init field be dereferenced once the lazy val is accessed? What if I want to use it in an initialization function, like this:

class Component(init : =>Component) {
   private def evaluate = init
   lazy val property = evaluate
}

Are there any rules to keep in mind in general when programming with closures?

Could you go slightly heavier on the real code? If `index.get` is already a function that returns a `Component`, what is the wrapping good for? Also, which types are mutable? (Some must be, otherwise there is no initialization order problem.) — Martijn, May 26 '15 at 15:51
The real code is rather huge, but the underlying question is very basic: is the synthetic closure garbage collected after a lazy val is initialized if it's used only in that initializer (the special case), and are there any rules governing this (the general case)? In my case the only mutable structure is the index used during the building process; all other forward/cyclic references are resolved by lazy vals initialized with expressions returning correct values from the index. The closure exists because the initializing expression may be complex and takes arguments that the components shouldn't care about — Turin, May 27 '15 at 00:47
It doesn't have to be all of the real code - or even any of the real code, just enough to be a self-contained example that compiles, and shows the problematic behaviour — Martijn, May 27 '15 at 08:35

score 4 · Answer 1 · answered May 26 '15 at 17:03

The main problem you're describing--that the index can't be garbage collected--is solved by putting the index into a mutable box that you empty (null out) once the object is created.
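A minimal sketch of the box idea, assuming an index that maps string keys to string values. The names `Index`, `IndexBox`, `Component` and `BoxDemo` are made up for illustration and are not from the question's real code. The closures capture the box rather than the index, so emptying the box removes the path from the components to the index:

```scala
// Hypothetical stand-in for the large index from the question.
final class Index(entries: Map[String, String]) {
  def get(key: String): String = entries(key)
}

// Mutable box: by-name arguments capture the box, not the index itself.
final class IndexBox(private var index: Index) {
  def get(key: String): String = {
    val current = index
    if (current == null) throw new IllegalStateException("index already released")
    current.get(key)
  }

  // Null out the index once the build (and the second pass) is finished.
  def release(): Unit = { index = null }

  def isReleased: Boolean = index == null
}

// Same shape as the Component in the question, with a String property.
final class Component(init: => String) {
  lazy val property: String = init
}

object BoxDemo {
  def run(): (String, Boolean) = {
    val box = new IndexBox(new Index(Map("a" -> "alpha")))
    val component = new Component(box.get("a"))
    val value = component.property // second pass: force the lazy val
    box.release()                  // the box no longer references the index
    (value, box.isReleased)
  }
}
```

Any lazy val that is first accessed after `release()` fails with the `IllegalStateException`, which is why the second pass has to force every value before the box is emptied.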

However, if you don't know when your object is created, and need the program to tell you (e.g. by knowing that all lazy vals have been populated), then you're out of luck. Barring poking around in memory with sun.misc.Unsafe, you're not supposed to know those sorts of details. (That's kind of the point of lazy vals.)

You could cook up a reference-counting scheme that would help you somewhat in detecting for yourself when you can clear the box. Increment a counter on the box when you enter the constructor, and keep a private count of how many lazy vals the object has. Every time a lazy val is initialized, decrement that count (atomically!). When it hits zero, decrement the counter on the box, and null the box out if the box counter has also reached zero.
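A rough sketch of that counting scheme, under stated assumptions: `CountedBox`, `CountedComponent` and `CountingDemo` are hypothetical names, the index is modelled as a plain `Map`, and each component has exactly two lazy vals. It is only meant to show the bookkeeping, not a complete solution:

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical box that drops its payload when the last user releases it.
final class CountedBox[A <: AnyRef](initial: A) {
  @volatile private var payload: A = initial
  private val users = new AtomicInteger(0)

  def acquire(): Unit = { users.incrementAndGet() }

  def release(): Unit =
    if (users.decrementAndGet() == 0) payload = null.asInstanceOf[A]

  def value: A = payload
  def isCleared: Boolean = payload == null
}

final class CountedComponent(box: CountedBox[Map[String, String]], k1: String, k2: String) {
  box.acquire() // entering the constructor: one more user of the box

  // Number of lazy vals in this class that have not been initialized yet.
  private val pending = new AtomicInteger(2)

  private def initialized(): Unit =
    if (pending.decrementAndGet() == 0) box.release()

  // The count is decremented only after a successful lookup, so a failed
  // (and later retried) initializer is not counted twice.
  lazy val first: String = { val v = box.value(k1); initialized(); v }
  lazy val second: String = { val v = box.value(k2); initialized(); v }
}

object CountingDemo {
  def run(): (Boolean, Boolean) = {
    val box = new CountedBox(Map("a" -> "alpha", "b" -> "beta"))
    val component = new CountedComponent(box, "a", "b")
    val firstValue = component.first
    val clearedAfterOne = box.isCleared // one lazy val is still pending
    val secondValue = component.second
    (clearedAfterOne, box.isCleared)
  }
}
```

One caveat with this sketch: if a component finishes initializing before the next one is constructed, the box counter reaches zero too early. The builder would probably need to call `acquire()` itself before the build starts and `release()` when it ends.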

If I needed to go to that extent to force dereferencing the index, I could just use one var for the final 'lazy' value and another var to store an explicit init function, to be set to null after it has been evaluated. But that's too much boilerplate, and I hoped that this is exactly the code the compiler produces in this example. — Turin, May 27 '15 at 00:51
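For reference, the two-var alternative described in this comment could look roughly like the sketch below. `ManualLazy` and `ManualLazyDemo` are hypothetical names, and nothing here claims to match what the compiler generates for a lazy val:

```scala
// Hand-rolled lazy cell: one var for the value, one for the init function.
final class ManualLazy[A](private var thunk: () => A) {
  private var cached: Option[A] = None

  def value: A = synchronized {
    cached match {
      case Some(v) => v
      case None =>
        val v = thunk()
        cached = Some(v)
        thunk = null // drop the closure so whatever it captured can be collected
        v
    }
  }

  def isEvaluated: Boolean = synchronized(cached.isDefined)
}

object ManualLazyDemo {
  def run(): (Int, Int, Boolean) = {
    var evaluations = 0
    val cell = new ManualLazy(() => { evaluations += 1; 42 })
    val a = cell.value
    val b = cell.value // served from the cache, the function is not called again
    (a + b, evaluations, cell.isEvaluated)
  }
}
```

Here the release of the closure is explicit and does not depend on compiler behaviour, at the cost of the boilerplate the comment complains about.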
