Caching Scala Case Class Instances

Question

Suppose we have the following case classes:

abstract sealed class Tree
case class Leaf(i: Int) extends Tree
case class Node(left: Tree, right: Tree) extends Tree

Every time we call a case class constructor, a new object is created in memory. For instance, in the code below:

val a = Leaf(0)
val b = Leaf(0)

a and b point to distinct objects in memory:

a == b // true
a eq b // false

I would like to override the "apply" method of the case classes, to make them return a cached object, in case it already exists, so that, in the minimal example above, "a eq b" would return true.

I found these two related answers in Stackoverflow:

How to override apply in a case class companion (shows how to override "apply" method)
Why do each new instance of case classes evaluate lazy vals again in Scala? (shows a simple way to cache case class instances)

I am planning to implement my overriding "apply" method with caching in a way that combines the two approaches linked above. But I am wondering if there are alternative ways that I should consider. If you know any, could you please share your solution here?

Caching instances of case classes seems to be a very useful and natural thing to do to reduce memory consumption. And yet, the solution I am planning to implement (based on the two answers linked above) seems quite convoluted, requiring a lot of boilerplate code that will compromise the elegance and succinctness of case classes. Does anyone know if future versions of the Scala language might allow us to achieve case class instance caching by writing something simple like this:

abstract sealed class Tree
cached case class Leaf(i: Int) extends Tree
cached case class Node(left: Tree, right: Tree) extends Tree

??

Just a random question, have you seen this "caching" as a core feature in any other language ? At least I have not encountered it yet. — sarveshseri, Nov 14 '16 at 07:37
Me neither. And this makes me wonder if there are reasons for that. @Alexey Romanov's answer below points out some interesting reasons. — Bruno, Nov 14 '16 at 08:29

Alexey Romanov · Accepted Answer · 2016-11-14T07:54:01.543

2

Caching instances of case classes seems to be a very useful and natural thing to do to reduce memory consumption.

Note that this isn't even remotely an automatic improvement, and very much depends on usage pattern of the case class (not just yours, but anybody who uses your library):

You need to take into account the memory cache needs and inability to garbage collect instances referenced from the cache (note that using a WeakHashMap won't help: it requires "that value objects do not strongly refer to their own keys, either directly or indirectly").
If the keys are primitives (as in Leaf), they need to be boxed before lookup which will often already be a constructor call.
Lookup in a map is significantly slower than a trivial constructor call.
Escape analysis will often ensure the objects aren't actually constructed, while making sure your program works as if they were. Of course, caching will ensure that objects do escape.

But neglecting all that, you can write a macro annotation which will allow you @cached case class Leaf(i: Int) extends Tree and generate the code you want (or at least @cachedcase class; I am not sure if you'll be able to override apply otherwise). Because of the above I just wouldn't expect it to be a part of the language any time soon.

edited Nov 14 '16 at 07:54

answered Nov 14 '16 at 07:34

Alexey Romanov

167,066
35
309
487

These are good points in general, even though some of them do not apply to my particular usage pattern. Could you elaborate your point 4? How do I know when a new object will not be constructed? Where can I learn more about this? – Bruno Nov 14 '16 at 08:48
Could you provide a minimal example with the @cachedcase macro annotation? – Bruno Nov 14 '16 at 08:49
1

For basic explanation see http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html#escapeAnalysis. The idea is that if an object is only used during a method and doesn't "escape", it can be eliminated by replacing all fields by local variables. But checking whether any particular allocation is eliminated is far from trivial (and at any rate requires JIT to kick in, so trying it once won't work). See e.g. http://psy-lob-saw.blogspot.se/2014/12/the-escape-of-arraylistiterator.html. – Alexey Romanov Nov 14 '16 at 11:02
For a `@cachedcase` example, do you mean usage or implementation? Usage would be `@cachedcase class Leaf(i: Int) extends Tree`. Implementing would take time I don't currently have, especially to do it right. – Alexey Romanov Nov 14 '16 at 11:06
Thanks. I meant implementation, not usage. But don't worry. – Bruno Nov 15 '16 at 04:23

Caching Scala Case Class Instances

1 Answers1