3

I have a list(!) of items:

  • A
  • B
  • C
  • D
  • E
  • ...

and I want to group them:

  • [A, C, D]
  • [B, E]
  • ...

Groups are defined by:

  • all items in the group are equal according to a custom function f(a, b) -> boolean
  • f(a, b) = f(b, a)

Question: is there ready API to do so?

<T> List<List<T>> group(Collection<T> collection, BiFunction<T, T, Boolean> eqF);

UPDATE. This question is totally not for a scenario when you can define some quality to group by! In this case Java 8 Collectors.groupingBy is the simplest answer.

I am working with multidimensional vectors and equality function looks like:

  • metrics(a, b) < threshold

For this case defining a hash is equal to solving the initial task :)

Denis Kulagin
  • 8,472
  • 17
  • 60
  • 129
  • How can you have more than 2 groups if the custom function used to group the elements returns a boolean? – Tunaki Sep 04 '16 at 19:17
  • 3
    @Tunaki - That's called partitioning into equivalence classes. Suppose the objects are integers and equality (true/false) is computed modulo 3 (i.e., they are equal if they have the same remainder). Then the integers from 1 to 100 will end up in three buckets, even though it's a binary equality test. – Ted Hopp Sep 04 '16 at 19:18
  • E.g. there is an *F* element and it isn't equal to any other. It makes a group of one element. – Denis Kulagin Sep 04 '16 at 19:18
  • I think the streams API in Java 8 could do this pretty easily. Do you have access to streams or are you on an earlier version? – markspace Sep 04 '16 at 19:36
  • @markspace Got Java 8 and streams at my disposal. – Denis Kulagin Sep 04 '16 at 19:38
  • It would be interesting to see what this function actually tests for. – Jorn Vernee Sep 04 '16 at 19:38
  • `groupingBy()` certainly does solve the question you actually asked. If your question is different, you should close this one out and ask a new question with a better description. Why can't you use `groupingBy()` on a matrix? – markspace Sep 04 '16 at 20:27
  • 3
    Responding to your update: are you sure that you have an equivalence relation here? Usually, `metrics(a,b) < threshold` is not transitive unless your space has a structure that satisfies certain unusual constraints. – Ted Hopp Sep 04 '16 at 20:33
  • @TedHopp My fault calling it *equivalence something*. It's not transitive and not even symmetrical in my particular task. – Denis Kulagin Sep 04 '16 at 20:40
  • 1
    If the test isn't even symmetric, then you don't have a metric. How is grouping items (1) relevant to what you're really trying to accomplish and (2) even possible theoretically? It sounds like you are asking for a solution to a problem Y that isn't actually relevant to your underlying problem. You might want to delete this question and post a different one that describes what you're actually after. – Ted Hopp Sep 04 '16 at 20:53

5 Answers5

3

Your scenario sounds like a good use case for the groupingBy collector. Normally, instead of supplying an equality function, you supply a function that extracts a qualifier. The elements are then mapped to these qualifiers in lists.

i.e.

Map<Qualifier, List<T>> map = list.stream()
    .collect(Collectors.groupingBy(T::getQualifier));

Collection<List<T>> result = map.values();

In the case the identity of T is your qualifier, you could use Function.identity() as an argument.

But this becomes a problem when your qualifier is more than 1 field of T. You could use a tuple type, to create an alternate identity for T but this only goes so far, as there needs to be a separate tuple class for each number of fields.


If you want to use groupingBy you really need to create a temperate alternate identity for T, so you don't have to change T's equals and hashCode methods.

To create a proper identity, you need to implement equals and hashCode (or always return 0 for a hash code, with performance downsides). There is no API class for this, that I know of, but I have made a simple implementation:

interface AlternateIdentity<T> {    
    public static <T> Function<T, AlternateIdentity<T>> mapper(
            BiPredicate<? super T, Object> equality, ToIntFunction<? super T> hasher) {
        return t -> new AlternateIdentity<T>() {
            @Override
            public boolean equals(Object other) {
                return equality.test(t, other);
            }

            @Override
            public int hashCode() {
                return hasher.applyAsInt(t);
            }
        };
    }
}

Which you could use like:

Collection<List<T>> result
    = list.stream()
        .collect(Collectors.groupingBy(
            AlternateIdentity.mapper(eqF, hashF)
        ))
        .values();

Where eqF is your function, and hashF is a hash code function that hashes the same fields as eqF tests. (Again, you could also just return 0 in hashF, but having a proper implementation would speed things up.)

Jorn Vernee
  • 31,735
  • 4
  • 76
  • 93
1

You can use hashing to do this in linear time.

To do this, you need to first implement the hashCode() function in your object, so it returns an equal hash value for equal elements (for example by XOR-ing the hash codes of its instance properties). Then you can use a hash table of sets to group your elements.

Map<Integer, Set<T>> hashMap = new HashMap<>();
for (T element : collection) {
    if (!hashMap.containsKey(element.hashCode())
         hashMap.put(element.hashCode(), new HashSet<T>());
    hashMap.get(element.hashCode()).add(element);
}

As equal elements produce the same hash, they will be inserted into the same equivalence class.

Now, you can obtain a collection of all equivalence classes (as sets) by using hashMap.values();

Palle
  • 11,511
  • 2
  • 40
  • 61
  • 1
    Writing an appropriate hash code isn't going to be easy. (Your suggestion wouldn't work, as items with different fields need to come out with the same hash code if the custom equality function says so.) Also, this isn't going to work if OP needs to process the same item list with different equality tests. – Ted Hopp Sep 04 '16 at 20:04
  • Forcing to rely on a specific `hashCode` function is bad, really bad. The equals/hashCode should be externalized. – Olivier Grégoire Sep 04 '16 at 20:24
1

I'm pretty sure there's nothing in the standard API for this. You might try a third-party collection class, like Trove's TCustomHashSet. (It's interesting that, according to a comment in this related thread, the Guava group has (for now) rejected a similar class. See the discussion here.)

The alternative is to roll your own solution. If you don't have too many items, I'd suggest a brute-force approach: keep a list of item lists and, for each new item, go through the list of lists and see if it is equal to the first element of the list. If so, add the new item to the matching list and, if not, add a new list to the list of lists with that item as the only member. The computation complexity is not very good, which is why I would only recommend this where the number of items is small or execution time performance is not an issue at all.

A second approach is to modify your item class to implement the custom equality function. But to use that with the hash-based collection classes, you'll need to override hashcode() as well. (If you don't use a hash-based collection, you might as well go with the brute force approach.) If you don't want to (or can't) modify the item class (e.g., you want to use various equality tests), I'd suggest creating a wrapper class that can be parameterized with the equality (and hash code) strategy to use. (This is kind of half way between modifying your item class and using the Trove class.)

Community
  • 1
  • 1
Ted Hopp
  • 232,168
  • 48
  • 399
  • 521
  • I think you should go deeper in Guava: it has the `Equivalence` class/interface and can probably really help in this scenario. Just because Guava didn't accept it (so far), their API can't be used to fulfil OP's use case. – Olivier Grégoire Sep 04 '16 at 20:27
  • @OlivierGrégoire - Yeah, I don't know Guava all that well. You should write up your idea as an answer. But based on OP's latest edit to the question, I don't think any of these approaches is going to work. (See my second comment to the question.) I suspect this is an [XY problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). – Ted Hopp Sep 04 '16 at 20:35
  • Yes, it smells the XY problem, but I think the solution can work well. Plus, the Guava solution is basically Jon's answer. I was writing what he wrote and posted right after him and deleted my answer given his was better. – Olivier Grégoire Sep 04 '16 at 20:41
  • @OlivierGrégoire - From what OP has just described, I don't think any of these solutions are going to work, neither mine nor Jom's. Consider a one-dimensional case with the test that two numbers are "the same" if they are within 0.5 of one another. (OP says the function isn't even symmetric, which means this is already an oversimplification.) Then 0.0 and 0.3 belong in the same bucket; likewise 0.3 and 0.6. But 0.0 and 0.6 belong in different buckets. So either 0.3 goes in two buckets (and we aren't partitioning) or the bucket constraint is violated. – Ted Hopp Sep 04 '16 at 20:49
1

Here's a simple example grouping strings. You'll need to supply a different function other than identity() if your objects you want to group are more complex.

public class StreamGroupingBy
{

   public static void main( String[] args )
   {
      List<String> items = Arrays.asList(  
              "a", "b", "c", "d", 
              "a", "b", "c",
              "a", "b", 
              "a", "x" );

      Map<String,List<String>> result = items.stream().collect(
              Collectors.groupingBy( Function.identity() ) );
      System.out.println( result );
   }
}

Output:

{a=[a, a, a, a], b=[b, b, b], c=[c, c], d=[d], x=[x]}
markspace
  • 10,621
  • 3
  • 25
  • 39
0

I would also recommend to implement a hashing mechanism. You can do something similar with Guava FluentIterable:

FluentIterable.from(collection)
    .index(new Function<T, K>() {
        K apply(T input) {
            //transform T to K hash
        }
    })//that would return ImmutableListMultimap<K, T>
    .asMap()//that would return Map<K, Collection<T>>
    .values();//Collection<Collection<T>>
Nailgun
  • 3,999
  • 4
  • 31
  • 46