My problem is much like the previous post Optimal HashSet Initialization (Scala | Java): I want to use HashSet for speedups (currently I am using Set), but the HashSet does not exhibit its constant-time advantages.
The solution suggested there was:
You can minimize the cost of equals by interning. This means that you acquire new objects of the class through a factory method, which checks whether the requested new object already exists, and if so, returns a reference to the existing object. If you assert that every object of this type is constructed in this way you know that there is only one instance of each distinct object and equals becomes equivalent to object identity, which is a cheap reference comparison (eq in Scala).
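To make the quoted idea concrete, here is how I currently picture it in Java. This is only a minimal sketch: the class `Label` and its `POOL` map are hypothetical names I made up for illustration, not code from the quoted answer. The constructor is private, so every instance comes from the factory, and `equals`/`hashCode` are left as the identity-based defaults inherited from `Object`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable class, used only for illustration.
final class Label {
    // Canonical instances, keyed by the value that defines equality.
    private static final Map<String, Label> POOL = new HashMap<>();

    private final String name;

    // Private constructor: instances can only come from the factory below.
    private Label(String name) {
        this.name = name;
    }

    // Factory method: returns the existing instance for this name,
    // or creates and registers a new one if none exists yet.
    static synchronized Label of(String name) {
        return POOL.computeIfAbsent(name, Label::new);
    }

    String name() {
        return name;
    }

    // equals and hashCode are deliberately NOT overridden. The inherited
    // Object implementations use reference identity, which is sufficient
    // because the factory guarantees one instance per distinct name.
}
```

With this, two calls to `Label.of` with equal names return the same reference, so a `HashSet<Label>` only needs identity hashing and a reference comparison (the counterpart of `eq` in Scala).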
However, I am not sure what an efficient way is to check "whether the requested new object already exists" for large objects (e.g. instances of a case class whose parameters include a HashMap or other object structures).
Comparing each of those complicated fields would not give much of a performance advantage, would it? Or if it would, are there other ways?
In addition, I am also unsure how to make "equals becomes equivalent to object identity, which is a cheap reference comparison (eq in Scala)" happen in code.
The interning technique mentioned above is, I think, basically an object cache. Therefore, I looked at the technique described in the post Caching strategy for small immutable objects in Java?. However, I still do not see an efficient way to apply it to large objects.
For convenience, I quote the caching technique (in Java) from that post, with /// denoting my thoughts and questions:
    private static final int N_POINTS = 10191;
    private static final Point[] POINTS = new Point[N_POINTS];

    public static Point of(int x, int y, int z) {
        int h = hash(x, y, z); /// I can use hash code of each complicated field to construct the value
        int index = (h & 0x7fffffff) % N_POINTS;
        Point p = POINTS[index];
        if (p != null && p.x == x && p.y == y && p.z == z) /// Not working for large objects?
            return p;
        return POINTS[index] = new Point(x, y, z);
    }
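To show where I get stuck with large objects, here is a rough sketch of how I imagine carrying the same idea over. Everything in it is an assumption of mine: `Settings`, its inner `Key` wrapper, and the `Map<String, Integer>` field are hypothetical stand-ins for a class with a big map parameter. The wrapper computes the structural hash once, and the full field comparison runs only when two hashes match.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical "large" immutable class, used only for illustration.
final class Settings {
    // Lookup key with structural equality and a hash computed once.
    private static final class Key {
        final Map<String, Integer> fields;
        final int hash;

        Key(Map<String, Integer> fields) {
            this.fields = fields;
            this.hash = fields.hashCode(); // structural hash, computed once per key
        }

        @Override
        public int hashCode() {
            return hash;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            // Cheap hash check first; full structural comparison only on a match.
            return hash == other.hash && fields.equals(other.fields);
        }
    }

    private static final Map<Key, Settings> POOL = new HashMap<>();

    private final Map<String, Integer> fields;

    private Settings(Map<String, Integer> fields) {
        this.fields = fields;
    }

    // Factory: the structural check happens here, once per request.
    // Afterwards, interned instances can be compared by reference.
    static synchronized Settings of(Map<String, Integer> fields) {
        // Defensive copy so later changes to the caller's map cannot affect the pool.
        Map<String, Integer> copy = Collections.unmodifiableMap(new HashMap<>(fields));
        return POOL.computeIfAbsent(new Key(copy), k -> new Settings(k.fields));
    }

    Map<String, Integer> fields() {
        return fields;
    }
}
```

Even in this sketch, every call to `of` still pays for hashing the whole map, plus a full comparison on a hit, so the cost is proportional to the size of the object. That is exactly the part I do not know how to make cheap.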
To summarize, what is the best practice for implementing an efficient caching strategy for large objects, so that I can take advantage of HashSet in Scala?
Thanks,