I would like to de-duplicate a list of Floats
while specifying an appropriate epsilon for precision.
Here is the solution I started writing before I realized how many pitfalls there are. For example, it suffers from binning problems: if I set epsilon to 1.0 and my input is {0.9, 1.0, 2.0, 3.0}, I get {0.9, 2.0}. However, if my input is {1.0, 2.0, 3.0}, I get {1.0, 3.0}. In other words, whether 2.0 survives depends on which value happens to start the bin.
Another issue is that it is unclear how best to handle special values such as NaN, infinity, and -0.0f so that this function works across general use cases (perhaps an optional parameter should customize this behavior?).
I am sure there are other corner cases as well.
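To make the special-value concern concrete, here is a small sketch of how NaN, the infinities, and signed zero behave under `Collections.sort` and under a distance test of the kind used in my code. The class name `SpecialFloatDemo` and the helpers `sortedCopy` and `isDistinct` are made up for this illustration. The outcomes shown are the same whether the test uses `>` or `>=`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SpecialFloatDemo {
    /** Hypothetical helper: sorts a copy with Collections.sort (i.e. Float.compareTo ordering). */
    public static List<Float> sortedCopy(List<Float> in) {
        List<Float> out = new ArrayList<>(in);
        Collections.sort(out);
        return out;
    }

    /** Hypothetical helper: true means "far enough from the previous kept value, so keep it". */
    public static boolean isDistinct(float previous, float current, float epsilon) {
        return Math.abs(previous - current) > epsilon;
    }

    public static void main(String[] args) {
        // Float.compareTo puts -0.0f before 0.0f and NaN after +Infinity.
        System.out.println(sortedCopy(Arrays.asList(
                Float.NaN, 0.0f, Float.POSITIVE_INFINITY, -0.0f, Float.NEGATIVE_INFINITY)));
        // prints [-Infinity, -0.0, 0.0, Infinity, NaN]

        // Any comparison involving NaN is false, so a NaN that follows
        // a kept value is never considered distinct and gets dropped.
        System.out.println(isDistinct(1.0f, Float.NaN, 1.0f)); // false

        // Infinity - Infinity is NaN, so repeated infinities collapse,
        // while an infinity after a finite value is kept.
        System.out.println(isDistinct(Float.POSITIVE_INFINITY, Float.POSITIVE_INFINITY, 1.0f)); // false
        System.out.println(isDistinct(1.0f, Float.POSITIVE_INFINITY, 1.0f)); // true

        // -0.0f and 0.0f are equal under ==, but not under Float.equals.
        System.out.println(-0.0f == 0.0f);                      // true
        System.out.println(Float.valueOf(-0.0f).equals(0.0f));  // false
    }
}
```

So with the sort-then-compare approach, NaN values end up last and are dropped unless the list contains nothing but NaN, and a merged zero comes out as -0.0f because it sorts first. Whether that is desirable is exactly what I am unsure about.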
// Suffers from binning issues
// Requires: java.util.ArrayList, java.util.Collections, java.util.List
public static List<Float> dedup(List<Float> floats, float epsilon) {
    // Sort a copy so the caller's list is left untouched
    List<Float> sortedFloats = new ArrayList<>(floats);
    Collections.sort(sortedFloats);
    List<Float> dedupedList = new ArrayList<>();
    for (Float f : sortedFloats) {
        if (dedupedList.isEmpty()) {
            dedupedList.add(f);
        } else {
            // Compare against the last value kept, not the last value seen
            Float previousValue = dedupedList.get(dedupedList.size() - 1);
            // Values within epsilon (inclusive) of the kept value count as duplicates
            if (Math.abs(previousValue - f) > epsilon) {
                dedupedList.add(f);
            }
        }
    }
    return dedupedList;
}
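For anyone who wants to reproduce the binning behaviour, here is a minimal self-contained harness. It assumes that two values exactly epsilon apart count as duplicates (a strict `>` comparison), which is what the outputs quoted above imply. The class name `BinningDemo` is just a placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BinningDemo {
    /** Sort-then-scan dedup: keeps a value only if it is more than epsilon from the last kept value. */
    public static List<Float> dedup(List<Float> floats, float epsilon) {
        List<Float> sorted = new ArrayList<>(floats);
        Collections.sort(sorted);
        List<Float> kept = new ArrayList<>();
        for (Float f : sorted) {
            if (kept.isEmpty() || Math.abs(kept.get(kept.size() - 1) - f) > epsilon) {
                kept.add(f);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // 0.9 starts the first bin and absorbs 1.0; 2.0 starts the next bin and absorbs 3.0.
        System.out.println(dedup(Arrays.asList(0.9f, 1.0f, 2.0f, 3.0f), 1.0f)); // prints [0.9, 2.0]

        // Without 0.9, the first bin starts at 1.0 and absorbs 2.0, so 3.0 is kept instead.
        System.out.println(dedup(Arrays.asList(1.0f, 2.0f, 3.0f), 1.0f)); // prints [1.0, 3.0]
    }
}
```

The two inputs differ only by the extra 0.9, yet 2.0 is kept in one result and dropped in the other, because each bin is anchored at whichever value happens to come first.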