I have a set of immutable objects whose equals() strictly compares the member variables they hold. Two objects are equal if and only if their member variables are equal. hashCode() is consistent with this.

I want to be able to aggregate these objects according to looser definitions of similarity. I was thinking of using Comparator for this so that I could define a bunch of ad hoc similarity rules, but the Javadoc states that a Comparator should be consistent with equals(). Is it OK to break the Comparator's consistency with equals() to achieve this, or is there some better method/pattern/strategy for aggregating objects according to similarity rules?

Examples may include:

  • Locations: equals() returns true if LatLng and place name are exactly equal, but the comparator returns 0 if the LatLngs are within, say, 25m/50m/100m of each other regardless of place name, or if only the place names are equal regardless of LatLng.
  • Dates: equals() returns true if the long millis are equal, but the comparator returns 0 if the dates fall on the same day/month/year etc.
  • Strings: equals() returns true if equalsIgnoreCase() is true, but the comparator may remove spaces/special characters to reduce both strings to some canonical form and then run equals().
smci
chdryra
  • Related: (to justify why the 'inconsistent' part means this mustn't be done by overriding `equals()`) [What issues should be considered when overriding equals and hashCode in Java?](https://stackoverflow.com/questions/27581/what-issues-should-be-considered-when-overriding-equals-and-hashcode-in-java) – smci May 08 '21 at 00:18

2 Answers

"Similarity" is generally not a transitive relation: if a is similar to b, and b is similar to c, then a may still not be similar to c. For that reason I would suggest not using Comparable/Comparator here, because their contract implies transitivity.
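A minimal sketch of that failure of transitivity, assuming a hypothetical rule where two positions on a line (in metres) count as similar when they are at most 25 m apart. The class name, method name and threshold are illustrative only:

```java
public class ThresholdSimilarity {
    // Hypothetical rule: positions (metres along a line) are "similar"
    // when they are at most 25 m apart.
    static boolean similar(double a, double b) {
        return Math.abs(a - b) <= 25.0;
    }

    public static void main(String[] args) {
        double a = 0.0, b = 20.0, c = 40.0;
        System.out.println(similar(a, b)); // true  (20 m apart)
        System.out.println(similar(b, c)); // true  (20 m apart)
        System.out.println(similar(a, c)); // false (40 m apart)
    }
}
```

A Comparator that returned 0 for "similar" pairs would claim a == b and b == c but a != c, which sorting algorithms and sorted collections are not designed to cope with.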

A custom interface that suits your needs should be a good option:

interface SimilarityComparable<T, D> {
    // D - object that represents similarity level
    D getSimilarityLevel(T other);
}

However, with this approach you won't be able to define multiple kinds of similarity for the same type, because a Java class cannot implement the same generic interface twice with different type arguments:

class Location implements SimilarityComparable<Location, Distance>, SimilarityComparable<Location, NameDifference> {
    // won't compile - can't use two generic interfaces of the same type simultaneously
}

In this case you can fall back to comparator-style classes:

interface SimilarityComparator<T, D> {
    D getSimilarityLevel(T a, T b);
}

class LocationDistanceSimilarityComparator implements SimilarityComparator<Location, Distance> { 
    ... 
}

class LocationNameSimilarityComparator implements SimilarityComparator<Location, NameDifference> { 
    ... 
}
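For illustration, here is one way such a comparator might be filled in and used with a cut-off. This is only a sketch under assumptions: the `Location` class, the use of `Double` (metres) as the similarity level, the haversine distance and the `withinThreshold` helper are all hypothetical and not part of the design above.

```java
public class SimilarityDemo {
    interface SimilarityComparator<T, D> {
        D getSimilarityLevel(T a, T b);
    }

    // Hypothetical minimal Location: a name plus latitude/longitude in degrees.
    static final class Location {
        final String name;
        final double lat;
        final double lng;

        Location(String name, double lat, double lng) {
            this.name = name;
            this.lat = lat;
            this.lng = lng;
        }
    }

    // Similarity level is the great-circle distance in metres (haversine formula).
    static final class DistanceSimilarity implements SimilarityComparator<Location, Double> {
        private static final double EARTH_RADIUS_M = 6_371_000.0;

        @Override
        public Double getSimilarityLevel(Location a, Location b) {
            double dLat = Math.toRadians(b.lat - a.lat);
            double dLng = Math.toRadians(b.lng - a.lng);
            double h = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                    + Math.cos(Math.toRadians(a.lat)) * Math.cos(Math.toRadians(b.lat))
                    * Math.sin(dLng / 2) * Math.sin(dLng / 2);
            return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
        }
    }

    // Two items count as "the same" for aggregation when their level is at or below the threshold.
    static <T, D extends Comparable<D>> boolean withinThreshold(
            SimilarityComparator<T, D> cmp, T a, T b, D threshold) {
        return cmp.getSimilarityLevel(a, b).compareTo(threshold) <= 0;
    }

    public static void main(String[] args) {
        Location cafe = new Location("Cafe", 51.5000, -0.1000);
        Location shop = new Location("Shop", 51.5001, -0.1000);  // roughly 11 m north
        Location park = new Location("Park", 51.5010, -0.1000);  // roughly 111 m north
        DistanceSimilarity byDistance = new DistanceSimilarity();
        System.out.println(withinThreshold(byDistance, cafe, shop, 25.0)); // true
        System.out.println(withinThreshold(byDistance, cafe, park, 25.0)); // false
    }
}
```

Because the level type only needs to be `Comparable`, the same helper would work for any other measure, such as an edit distance between place names.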
user3707125
  • Great thanks! Looks nice and clean. I guess if defined correctly I could use D as something that is Comparable to another D? Therefore use a canonical D_0 as the 'coarseness' of my similarity? e.g. if D is the Distance similarity measure in the example above, I could have some logic that states that if D_i is less than some predefined D_0, then the locations that produced D_i should be considered the same for aggregation purposes? – chdryra Jul 01 '15 at 13:53
  • @chdryra, yes - something like that, or you can remove that second parameter and simply use `boolean` - depends on your case. – user3707125 Jul 01 '15 at 14:00
  • I don't see why similarity (as implemented by most methods) is not transitive, can you give an example where *a* is similar to *b*, and *b* is similar to *c*, but *a* is not similar to *c*? Other than say LatLng(a,b) is < 25m and so is LatLng(b,c), but LatLng(a,c) == 40m, thus failing similarity. I think similarity is only not transitive when we start involving distances or thresholds. – smci May 07 '21 at 23:39
  • @smci, I believe one can illustrate that using geometrical representation. Equality means that two points are in the same spot. Similarity means that two points are close within a certain threshold. Also similarity is bidirectional. Which means that geometrically it can be represented as a distance. Therefore I would rather ask what concept of similarity doesn't resemble distance. – user3707125 May 28 '21 at 11:57

The answer will depend on how you intend to use the comparison.

However, I agree with @user3707125: do not under any circumstances implement Comparable for this purpose. You have no control over which code will call compareTo(), and that code is entitled to assume the standard contract holds.

There are circumstances where you must use Comparator, such as when you use the standard Java library to sort or filter your objects. In that case you are still fairly safe, because you control where the Comparator is used and where the stricter equals() is used. Just be aware that if your relaxed comparison doesn't meet the standard contract, sort and filter operations will produce results that are correspondingly relaxed and inconsistent.
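As a small illustration of what "relaxed" results look like, the sketch below puts the same strings into a HashSet (which uses equals()) and into a TreeSet built with the JDK's `String.CASE_INSENSITIVE_ORDER` comparator. The class and method names are made up for the example:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class RelaxedComparatorDemo {
    // Uses equals()/hashCode(): every distinct spelling is kept.
    static Set<String> strictSet(List<String> items) {
        return new HashSet<>(items);
    }

    // Uses the comparator only: elements that compare as 0 are treated as duplicates.
    static Set<String> relaxedSet(List<String> items) {
        Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(items);
        return set;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Paris", "PARIS", "paris");
        System.out.println(strictSet(names).size());             // 3
        System.out.println(relaxedSet(names));                   // [Paris]
        System.out.println(relaxedSet(names).contains("pArIs")); // true
    }
}
```

The TreeSet silently keeps only the first spelling it saw and reports that it contains strings it never stored, which is fine if that is what you want and surprising if it is not.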

If you don't have to use Comparator then don't. You're free to define your own methods and interfaces to do whatever you want in whatever way you like. Knock yourself out!
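One possible shape for such a method, offered as a sketch rather than a recommendation: when a similarity rule can be expressed as "both objects reduce to the same canonical key", aggregation becomes a plain group-by, and the rule is automatically transitive. The `bucket` helper and the `canonicalName` rule below are hypothetical names chosen for the example.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CanonicalBuckets {
    // Groups items whose canonical keys are equal; each rule is just a key function.
    static <T, K> Map<K, List<T>> bucket(Collection<T> items,
                                         Function<? super T, ? extends K> canonicalKey) {
        return items.stream().collect(Collectors.groupingBy(canonicalKey));
    }

    // Hypothetical rule: ignore case and anything that is not an ASCII letter or digit.
    static String canonicalName(String s) {
        return s.replaceAll("[^A-Za-z0-9]", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        List<String> places = Arrays.asList("New York", "new-york", "Boston");
        Map<String, List<String>> buckets = bucket(places, CanonicalBuckets::canonicalName);
        System.out.println(buckets.get("newyork")); // [New York, new-york]
        System.out.println(buckets.get("boston"));  // [Boston]
    }
}
```

This fits rules like "same day" or "same normalised name". Threshold rules such as "within 25m" do not reduce to a key this way unless positions are first snapped to something like a grid cell, which changes the rule slightly.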

Terrible Tadpole
  • My aim is to be able to aggregate data into buckets of similarity to help organise some documents. So if documents A and B contain locations L_a and L_b, then if L_a and L_b are considered the same under some rule (say equivalent to a canonical L_0), the documents can get filed under L_0. Looks like defining my own interface is the way to go as you point out. Thanks! – chdryra Jul 01 '15 at 13:57