6

If I have the following List:

List<String> list = Arrays.asList("hello", "world", "hello");

And I apply the following (Java8):

list.stream().distinct().collect(Collectors.toString());

Then I would get a list with "hello" and "world".

However, in my case, I have a list of a type (from an external api) where I want to "bypass" the equals Method, ideally with a comparator, as it doesn't cover what I need.

Assume this class looks like this:

public class Point {
    float x;
    float y;
    //getters and setters omitted
}

In this case, I would like two points that cover a certain criteria to be defined as equal, for instance (30, 20) and (30.0001, 19.999).

A custom comparator could do the trick, but I have found no API that does what the distinct() in Java8 Stream does, but with a comparator (or similar pattern).

Any thoughts? I know I could write such a function, but I would rather like the elegant way of using existing apis... I have no restriction with external libraries (guava, apache-commons, etc. are welcome if they have a comfortable way of doing what I need).

glts
  • 21,808
  • 12
  • 73
  • 94
Martin
  • 3,018
  • 1
  • 26
  • 45
  • Here is a similar discussion but trying to achieve that with a Set: http://stackoverflow.com/questions/14880450/java-hashset-with-a-custom-equality-criteria. I think the approach I am following goes in the right direction. Is there any API that can do such a thing? – Martin Jan 11 '16 at 17:23
  • from doc: distinct() Returns a stream consisting of the distinct elements (according to Object.equals(Object)) of this stream. You can override equals method in Point class for your particular case. In addition, distinct is a stateful pipeline operation. You can try to use filter instead: [example](http://stackoverflow.com/questions/23699371/java-8-distinct-by-property) – algor Jan 11 '16 at 17:25
  • A comparator (or an equals() method) that would implement the equality you're describing would break the contract of Comparator (resp. equals). Indeed, if two points are equal if they're close to each other (let's say by at most 0.1, in both directions), then it means that `(1, 1)` is equal to `(0.9, 0.9)`, and that `(0.9, 0.9)` is equal to `(0.8, 0.8)`. But then, by transitivity (which is mandated by the Comparator contract), `(1, 1)` must also be equal to `(0.8, 0.8)`, which is not what you want. – JB Nizet Jan 11 '16 at 17:36
  • Unless what you want is for both coordinates to be rounded to the closest integer. In that case, you could simply map all points to the closest 'integer' one before calling distinct(). – JB Nizet Jan 11 '16 at 17:38
  • @JBNizet, you have a really good point. with the contract issue. I really believe it wouldn't be an issue because I have like 2-3 duplicate points around each point, and so it is like groups of points that are close to each other. In this case, any of them is good enoough for my purpose. Maybe I will have to look for another algorithmic approach... I just want to keep it as simple as possible... – Martin Jan 11 '16 at 17:45
  • The trick is to use `filter` to produce a `distinct` stream. See http://stackoverflow.com/questions/27870136/java-lambda-stream-distinct-on-arbitrary-key/27872086#27872086 – Leet-Falcon Jan 11 '16 at 18:35

1 Answers1

4

HashingStrategy is the concept you're looking for. It's a strategy interface that allows you to define custom implementations of equals and hashcode.

public interface HashingStrategy<E>
{
    int computeHashCode(E object);
    boolean equals(E object1, E object2);
}

Streams don't support hashing strategies but Eclipse Collections does. It has sets and maps that support hashing strategies as well as overloads of methods like distinct() that take hashing strategies.

This would work well for Strings. For example, here's how we could get all distinct Strings ignoring case.

MutableList<String> strings = Lists.mutable.with("Hello", "world", "HELLO", "World");
assertThat(
    strings.distinct(HashingStrategies.fromFunction(String::toLowerCase)),
    is(equalTo(Lists.immutable.with("Hello", "world"))));

Or you can write the hashing strategy by hand to avoid garbage creation.

HashingStrategy<String> caseInsensitive = new HashingStrategy<String>()
{
    @Override
    public int computeHashCode(String string)
    {
        int hashCode = 0;
        for (int i = 0; i < string.length(); i++)
        {
            hashCode = 31 * hashCode + Character.toLowerCase(string.charAt(i));
        }
        return hashCode;
    }

    @Override
    public boolean equals(String string1, String string2)
    {
        return string1.equalsIgnoreCase(string2);
    }
};

assertThat(
    strings.distinct(caseInsensitive),
    is(equalTo(Lists.immutable.with("Hello", "world"))));

This could work for Points too, but only if you can group all points within non-overlapping regions to have the same hashcode. If you're using a Comparator defined to return 0 when two Points are close enough, then you can run into transitivity problems. For example, Points A, B, and C can fall along a line with A and C both close to B but far from each other. Still, if this is a useful concept to you, we'd welcome a pull request adding ListIterable.distinct(Comparator) to the API.

Note: I am a committer for Eclipse Collections.

Craig P. Motlin
  • 26,452
  • 17
  • 99
  • 126
  • 2
    I will give this a try... I can indeed give all points the same hash code, as I am now able to identify them. I did the trick with a nested if running through the collection and applying my custom comparator and deleting all elements that are detected the same... Computing the hash code should be possible, I think it is only a matter of rounding... – Martin Jan 11 '16 at 22:20