Consider this interface:
public interface CoordinateSet {
boolean contains(int x, int y);
default boolean contains(Coordinate coord) {
return contains(coord.x, coord.y);
}
}
It represents a set of 2-dimensional integer coordinates, and each possible coordinate may be either inside the set (contains
returns true
) or outside (contains
returns false
).
There are many ways we can implement such an interface. The most computationally efficient one would be the implementation backed up by an array:
public class ArrayCoordinateSet implements CoordinateSet {
private final boolean[][] coords = new boolean[SIZE][SIZE];
// ...
@Override
public boolean contains(int x, int y) {
return coords[x][y];
}
public void add(int x, int y) {
coords[x][y] = true;
}
// ...
}
However, if SIZE
is something large, say, 1000, and there are only, say, 4 cordinates that belong to the set, right in the four angles of a 1000×10000 rectangle, that means the absolute majority of cells
space is consumed by false
values. For such a sparse CoordinateSet we'd better be using a HashSet
-based CoordinateSet
:
public final class Coordinate {
public final int x;
public final int y;
public Coordinate(int x, int y) {
this.x = x;
this.y = y;
}
// .equals() and hashCode()
}
public class HashBasedCoordinateSet implements CoordinateSet {
private final Set<Coordinate> coords = new HashSet<>();
@Override
public boolean contains(int x, int y) {
return coords.contains(new Coordinate(x, y));
}
@Override
public boolean contains(Coordinate coord) {
return coords.contains(coord);
}
public void add(Coordinate coord) {
coords.add(coord);
}
}
However, with the HashBasedCoordinateSet
we have such an issue:
for (int x=0; x<1000; x++) {
for (int y=0; y<1000; y++) {
hashBasedCoordinateSet.contains(x, y);
}
}
When we have values x
and y
and want to check if hashBasedCoordinateSet.contains(x, y)
, then that would require creating a new object at each method call (since we always need an object to search in a HashSet
, it is not enough to just have object's data). And that would be a real waste of CPU time (it'd need to create all those Coordinate
objects and then grabage-collect them, since seemngly no escape-analysis optimisation can be performed on this code).
So finally, my question is:
What would be the data structure to store a sparse set of coordinates that:
- Has O(1)
contains(int x, int y)
operation; - Efficiently uses space (unlike the array-based implementation );
- Does not have to create extra objects during
contains(int x, int y)
?