5

I wish to create a HashSet for real numbers (at present Doubles) using a defined tolerance (epsilon), (cf Assert.assertEquals(double, double, double)
Since using Double.equals() only works for exact equality and Double is a final class I can't use it. My initial idea is to extend HashSet (e.g. to DoubleHashSet), with a setEpsilon(double) method and create a new class ComparableDouble where equals() uses this value from DoubleHashSet. However I'd like to check whether there are existing solutions already and existing F/OSS libraries.

(In the future I shall want to extend this to tuples of real numbers - e.g. rectangles and cubes - so a generic approach is preferable

NOTE: @NPE has suggested it's impossible. Unfortunately I suspect this is formally correct :-) So I'm wondering if there are approximate methods ... Others must have had this problem and solved it approximately. (I already regularly use a tool Real.isEqual(a, b, epsilon) and it's very useful.) I am prepared to accept some infrequent errors of transitivity.

NOTE: I shall use a TreeSet as that solves the problem of "nearly equals()". Later I shall be comparing complexNumbers, rectangles (and more complex objects) and it's really useful to be able to set a limit within which 2 things are equal. There is no simple natural ordering of complexNumbers (perhaps a Cantor approach would work), but we can tell whether they are nearly equal.

Community
  • 1
  • 1
peter.murray.rust
  • 37,407
  • 44
  • 153
  • 217

3 Answers3

5

There are some fundamental flaws in this approach.

HashSet uses equals() to check two elements for equality. The contract on equals() has the following among its requirements:

It is transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.

Now consider the following example:

x = 0.0
y = 0.9 * epsilon
z = 1.8 * epsilon

It is clear that your proposed comparison scheme would break the transitivity requirement (x equals y and y equals z, yet x doesn't equal z). In these circumstances, HashSet cannot function correctly.

Furthermore, hashCode() will produce additional challenges, due to the following requirement:

If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

The hashCode() requirement can be sidestepped by using a TreeSet instead of HashSet.

NPE
  • 486,780
  • 108
  • 951
  • 1,012
  • I have a sinking feeling you are right. But I am prepared to have a small degree of failure in this. For example I can create the nearest integer to 10*(-log(epsilon)) as the hashCode(). If it occasionally fails it won't be the end of the world. – peter.murray.rust Apr 14 '13 at 09:54
  • The `hashCode()` problem can be circumvented by using a `TreeSet` instead, but the transitivity requirement stays the same. – Philipp Wendler Apr 14 '13 at 09:55
  • @Philipp Sounds promising. Can you give me an example? – peter.murray.rust Apr 14 '13 at 09:58
  • @peter.murray.rust: You could use `TreeSet` with epsilon-based comparisons, and then just accept that it'll behave *strangely* (i.e. the elements that it'll contain will depend on the *order* in which the elements are added to the set). – NPE Apr 14 '13 at 09:59
  • @NPE I'd be very happy for that. I know my data and the effective tolerances and there are worse fuzzinesses in the overall problem! Presumably I can add a comparator to the TreeSet. – peter.murray.rust Apr 14 '13 at 10:06
  • I have accepted this as the answer because it works for my problem and is simple. I have added my own answer to show my implementation – peter.murray.rust Apr 14 '13 at 12:28
2

What I would do is round the doubles before using them (assuming this is appropriate)

e.g.

public static double roundByFactor(double d, long factor) {
    return (double) Math.round(d * factor) / factor;
}

TDoubleHashSet set = new TDoubleHashSet(); // more efficient than HashSet<Double>
set.add(roundByFactor(1.001, 100));
set.add(roundByFactor(1.005, 100));
set.add(roundByFactor(1.01, 100));
// set has two elements.

You can wrap this behaviour in your own DoubleHashSet. If you want to reserve the original value you can use HashMap or TDoubleDoubleHashMap where the key is the rounded value and the value is the original.

Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
  • Many thanks. Do I assume this is `gnu.trove.set.hash.TDoubleHashSet`? If so it looks (at first glance) what I have been looking for as long as it doesn't have a GPL licence. [see http://trove4j.sourceforge.net/javadocs/gnu/trove/set/hash/TDoubleHashSet.html ] – peter.murray.rust Apr 14 '13 at 10:52
  • 1
    It's LGPL which is fine for my community and most distribution strategies – peter.murray.rust Apr 14 '13 at 10:59
0

I have implemented @NPE's approach (I have accepted his/her answer so s/he gets the points :-) and give the code here

//Create a comparator:
public class RealComparator implements Comparator<Double> {

    private double epsilon = 0.0d;

    public RealComparator(double eps) {
        this.setEpsilon(eps);
    }

    /**
     * if Math.abs(d0-d1) <= epsilon  
     * return -1 if either arg is null
     */
    public int compare(Double d0, Double d1) {
        if (d0 == null || d1 == null) {
            return -1;
        }
        double delta = Math.abs(d0 - d1);
        if (delta <= epsilon) {
            return 0;
        }
        return (d0 < d1) ? -1 : 1;
    }

    /** set the tolerance
     * negative values are converted to positive
     * @param epsilon
     */
    public void setEpsilon(double epsilon) {
        this.epsilon = Math.abs(epsilon);
    }

and test it

public final static Double ONE = 1.0;
public final static Double THREE = 3.0;

@Test
public void testTreeSet(){
    RealComparator comparator = new RealComparator(0.0);
    Set<Double> set = new TreeSet<Double>(comparator);
    set.add(ONE);
    set.add(ONE);
    set.add(THREE);
    Assert.assertEquals(2, set.size());
}
@Test
public void testTreeSet1(){
    RealComparator comparator = new RealComparator(0.0);
    Set<Double> set = new TreeSet<Double>(comparator);
    set.add(ONE);
    set.add(ONE-0.001);
    set.add(THREE);
    Assert.assertEquals(3, set.size());
}
@Test
public void testTreeSet2(){
    RealComparator comparator = new RealComparator(0.01);
    Set<Double> set = new TreeSet<Double>(comparator);
    set.add(ONE);
    set.add(ONE - 0.001);
    set.add(THREE);
    Assert.assertEquals(2, set.size());
}
@Test
public void testTreeSet3(){
    RealComparator comparator = new RealComparator(0.01);
    Set<Double> set = new TreeSet<Double>(comparator);
    set.add(ONE - 0.001);
    set.add(ONE);
    set.add(THREE);
    Assert.assertEquals(2, set.size());
}
peter.murray.rust
  • 37,407
  • 44
  • 153
  • 217
  • NOTE: I have implemented this for tuples of numbers. Since there is no natural ordering it "behaves strangely" - it may create several subsets of the intended set. – peter.murray.rust Apr 16 '13 at 08:27