Caching strategy for small immutable objects in Java?

Question

I am developing an app that creates a large number of small, immutable Java objects. An example might be:

public class Point {
  final int x;
  final int y;
  final int z;
  .....
}

Where it is likely that many instances of Point will need to refer to the same (x,y,z) location.

To what extent does it make sense to try to cache and re-use such objects during the lifetime of the application? Any special tricks to handle this kind of situation?

Jeremy · Answer 1 · 2011-04-25T23:18:14.387

11

When it becomes a problem. Otherwise you're just creating a useless layer of abstraction.

Either way, you could easily implement this with a PointFactory that you call to get a Point, which always returns the same object instance for any given x, y and z. But then you have to manage when the points should be removed from cache because they wont be garbage collected.

I say forget about it unless it's an actual issue. Your application shouldn't depend on such a caching mechanism, which would allow you to add it in later if necessary. So maybe just use a factory that returns a new point instance very time for now.

public class PointFactory{
    public static Point get(int x, int y, int z){
        return new Point(x, y, z);
    }
}

edited Apr 25 '11 at 23:18

answered Apr 25 '11 at 22:15

Jeremy

22,188
4
68
81

3

This does not prevent from new point instances being created each time. Ideally, we would like a Map>, and maintain it -- where Coordinate could be some unique identifier of x, y - even x + " " + y would suffice here. – kuriouscoder Apr 25 '11 at 22:23
5

@kuriouscoder: I didn't attempt to describe how to implement a cache. My point (heh) is that a cache shouldn't exist unless there's a memory issue or something. My suggestion was to use a factory to get a point instance *in case* a cache becomes necessary. – Jeremy Apr 25 '11 at 22:28

score 8 · Accepted Answer · answered Apr 26 '11 at 08:30

The problem you are likely to have is making the object pool light weight enough to be cheaper than just creating the objects. You want to the pool to be large enough that you get a fairly high hit rate.

In my experience, you are likely to have problems micro-benchmarking this. When you are creating a single object type repeatedly in a micro-benchmark, you get much better results than when creating a variety of objects in a real/complex application.

The problem with many object pool aproaches is that they a) require a key object, which costs as much or more than creating a simple object, b) involve some synchromization/locking which again can cost as much as creating an object c) require an extra object when adding to the cache (e.g. a Map.Entry), meaning your hit rate has to be much better for the cache to be worth while.

The most light weight, but dumb caching strategy I know is to use an array with a hashcode.

e.g.

private static final int N_POINTS = 10191; // or some large prime.
private static final Point[] POINTS = new Point[N_POINTS];

public static Point of(int x, int y, int z) {
    int h = hash(x,y,z); // a simple hash function of x,y,z
    int index = (h & 0x7fffffff) % N_POINTS;
    Point p = POINTS[index];
    if (p != null && p.x == x && p.y == y && p.z == z)
       return p;
    return POINTS[index] = new Point(x,y,z);
}

Note: the array is not thread safe, but since the Point is immutable, this doesn't matter. The cache works on a best effort basis, and is naturally limited in size with a very simple eviction strategy.

For testing purposes, you can add hit/miss counters to determine the caches effectiveness for you data set.

nice - I also like the fact that the array bounds the total amount of cached data so that memory leaks are not a problem! — mikera, Apr 26 '11 at 14:00
@mikera, it is naturally bounded with a minimum of resource overhead. ;) — Peter Lawrey, Apr 26 '11 at 15:28
Cool, thanks! I accepted this answer because it provides a neat and lightweight practical solution - although I also appreciate the point that it's tricky to benchmark these things so may not be the optimal solution in all circumstances. — mikera, May 13 '11 at 10:06

score 4 · Answer 3 · answered Apr 25 '11 at 22:18

4

It sounds almost like a textbook example of the Flyweight pattern.

answered Apr 25 '11 at 22:18

regularfry

3,248
2
21
27

I don't understand -- doesn't the pattern only apply when their is memory savings by sharing part of the internals of the outward facing object? In this case, the shared data is comprised of 3 ints -- 12 bytes, which is smaller than the true cost of a shared object underlying each Point... – Dilum Ranatunga Apr 26 '11 at 03:23
There's a limited choice here. Either the shared data stays where it is, with each instance maintaining a copy of its own, or the shared data gets factored out into its own class (which is where the Flyweight pattern could be used) and each Point instance just maintains a reference to its current coordinates. The memory savings there are going to be slight - 12 bytes against 8 (or 4, depending on the JVM) plus the number of distinct coordinates - but depending on how the coordinates are going to be used and compared, there could be major algorithmic savings down the line. – regularfry Apr 26 '11 at 09:28

score 3 · Answer 4 · answered Apr 25 '11 at 22:19

3

How many instances will share the same coordinates, how many will exist at the same time, and how many will be discarded?

Reusing the objects only has benefits if a significant percentage of live objects at one time are duplicates (at least 20%, I'd say) and overall memory usage is problematic. And if objects are discarded frequently, you have to construct the cache in a way that prevents it from becoming a memory leak (probably using soft/weak references).

answered Apr 25 '11 at 22:19

Michael Borgwardt

342,105
78
482
720

Reusing the objects may also be helpful if they will be frequently compared against each other, since two references to the same object may be recognized as equal more quickly than would be two references to different objects that have identical content. – supercat Nov 12 '13 at 20:33

score 1 · Answer 5 · answered Apr 25 '11 at 22:26

1

Remember that caching these objects will influence concurrency and garbage collection in (most likely) a bad way. I wouldn't do it unless the other objects that refer to the points are long lived too.

answered Apr 25 '11 at 22:26

Dilum Ranatunga

13,254
3
41
52

The objects are immutable, so concurrency is probably not an issue. GC definitely is. – Jeremy Apr 25 '11 at 22:31
2

The concurrency problem is not with the objects themselves. The contention is with the cache. You can minimize it with a ConcurrentHashMap etc, but there is always concurrency overhead with a cache if it is shared across threads. – Dilum Ranatunga Apr 26 '11 at 03:17

score -1 · Answer 6 · answered Apr 25 '11 at 22:23

-1

As for most cases: it depends.

If your object is rather complex (takes a lot of time to instatiate) put can be expressed in a string, it makes sense to create and load them through a static factory method.

This also makes sense if some representations of the object are used more often than others (in your case maybe Point(0,0,0))

e.g

private static final HashMap<String, Point> hash = new HashMap<String, Point>();

public static Point createPoint(int x, int y, int z) {
 String key = getKey(x,y,z);
 Point created = hash.get(key)
 if (created == null) {
  created = new Point(x,y,z);
  hash.put(key,created);
 }
 return created;
}

private static String createKey(int x, int y, int z) {
 StringBuffer buffer = new StringBuffer();
 buffer.append("x:");
 buffer.append(x);
 buffer.append("y:");
 buffer.append(y);
 buffer.append("z:");
 buffer.append(z);
 return buffer.toString()
}

answered Apr 25 '11 at 22:23

leifg

8,668
13
53
79

2

If you are having a large number of these, using a String as the key is a very poor choice. The string itself will cost a whole lot more to create and store than the points themselves. – Dilum Ranatunga Apr 25 '11 at 22:26
I do not agree entirely - But for sake of arguing, how about Char[String.value(x).length() + 1 + String.value(y).length()] - i.e., x + ' ' + y? – kuriouscoder Apr 25 '11 at 22:30
In this case, the key probably ought to be the Point itself. The immutable point's construction will be significantly more performant than string building. – Dilum Ranatunga Apr 25 '11 at 22:43
As a sidenote, use StringBuilder instead of StringBuffer if you insist on this approach. – Dilum Ranatunga Apr 25 '11 at 22:43

Caching strategy for small immutable objects in Java?

6 Answers6

Linked