Comparing iterables for same content, but not regarding order

Question

I'm attempting to compare two Iterables in Java of same size. I only need to know that the contents are the same. However, something like [1, 2] and [1, 2, 2] should not be equal, while [1, 2, 2, 4] should equal [1, 2, 4, 2].

boolean functionName() {
    boolean pvk;
    ... setup ...
    for(Edge e : pMST.edges()) {
      pvk = false;
      for(Edge f : kMST.edges()) {
        if(e == f) {
          pvk = true;
          System.out.println("True.");
        }
      }
      if(!pvk) return false;
    }
return true;
}

There's my initial lousy attempt, but not only does this always return false, it doesn't account for duplicates properly.

Are you expecting larger numbers of elements? Because regarding memory footprint, I can't think of a solution that does not use a temporary container of `n` elements (at least at some point). — Izruo, Jun 11 '19 at 22:58
An in place solution would obviously be best, but that extra memory might be fine with my input sizes. — SolidSnackDrive, Jun 11 '19 at 23:01
Actually, I found [this](https://stackoverflow.com/a/13502032/7525132) answer, that is just slightly more memory efficient for iterables with duplicates. Nevertheless it should already help you out. — Izruo, Jun 11 '19 at 23:07
You can model this with a `Map`, where the key is the number and the value is the number of times that occurs. — Andy Turner, Jun 11 '19 at 23:07

Stuart Marks · Answer 1 · 2019-06-12T06:48:05.757

You could sort the items and compare the resulting lists, but this is potentially slow O(n lg n) and it relies on the items either being Comparable or having a total order imposed on them by a Comparator. This might be infeasible.

This other answer suggests using a Guava Multiset. This makes sense, as it keeps track of the elements and the count of occurrences, which is significant for your question. It should be O(n) for reasonable implementations such as a HashMultiset. Other libraries such as Apache Commons (MultiSet) and Eclipse Collections (Bag) have collection implementations that are functionally equivalent to Guava’s Multiset.

If you don't want to include a dependency on any of these libraries, you can do this in the JDK by itself. Unfortunately Java doesn't have a Bag implementation, but for this purpose it's easy to emulate it using a Map from your item type to a count, either Integer or Long.

If you have Lists, you can do this:

boolean unorderedEquals(List<Item> list1, List<Item> list2) {
    Map<Item, Long> freq1 = list1.stream().collect(groupingBy(i -> i, counting()));
    Map<Item, Long> freq2 = list2.stream().collect(groupingBy(i -> i, counting()));
    return freq1.equals(freq2);
}

If you have Iterables, you need to build up the maps using forEach instead:

boolean unorderedEquals(Iterable<Item> iter1, Iterable<Item> iter2) {
    Map<Item, Integer> freq1 = new HashMap<>();
    iter1.forEach(it -> freq1.merge(it, 1, (a, b) -> a + b));
    Map<Item, Integer> freq2 = new HashMap<>();
    iter2.forEach(it -> freq2.merge(it, 1, (a, b) -> a + b));
    return freq1.equals(freq2);
}

Good answer, but I would rather not use external libraries for this one. — SolidSnackDrive, Jun 12 '19 at 08:23
@SolidSnackDrive that’s why this answer shows a solution without using 3rd party libraries. — Holger, Jun 12 '19 at 10:16

score 3 · Answer 2 · edited Jun 20 '20 at 09:12

Combining this answer with ideas from this thread, notably this answer to create an efficient but readable solution, you may use

static boolean unorderedEquals(Collection<?> coll1, Collection<?> coll2) {
    if(coll1.size() != coll2.size()) return false;
    Map<Object, Integer> freq = new HashMap<>();
    for(Object o: coll1) freq.merge(o, 1, Integer::sum);
    for(Object o: coll2)
        if(freq.merge(o, -1, Integer::sum) < 0) return false;
    return true;
}

The first loop creates a frequency map like in the linked answer, but instead of building a second map, to perform an expensive comparison, the second loop decreases the counts on each occurrence, returning immediately, if a count became negative. The merge method smoothly handles the case of absent keys.

Since it has been checked right at the beginning of the method that both lists have the same size, after increasing and decreasing, the total count must be zero. Since we have proven that there are no negative numbers, as we returned immediately for them, there can’t be positive non-zero values either. So we can return true after the second loop without further checks.

Supporting arbitrary Iterables, which differ from Collection in not necessarily having a size() method, is a bit trickier, as we can’t do the pre-check then and hence, have to maintain the count:

static boolean unorderedEquals(Iterable<?> iter1, Iterable<?> iter2) {
    Map<Object, Integer> freq = new HashMap<>();
    int size = 0;
    for(Object o: iter1) {
        freq.merge(o, 1, Integer::sum);
        size++;
    }
    for(Object o: iter2)
        if(--size < 0 || freq.merge(o, -1, Integer::sum) < 0) return false;
    return size == 0;
}

If we want avoid the boxing overhead, we have to resort to a mutable value for the map, e.g.

static boolean unorderedEquals(Collection<?> coll1, Collection<?> coll2) {
    if(coll1.size() != coll2.size()) return false;
    Map<Object, int[]> freq = new HashMap<>();
    for(Object o: coll1) freq.computeIfAbsent(o, x -> new int[1])[0]++;
    int[] absent = { 0 };
    for(Object o: coll2) if(freq.getOrDefault(o, absent)[0]-- == 0) return false;
    return true;
}

But I don’t think that his will pay off. For small numbers of occurrences, boxing will reuse the Integer instances whereas we need a distinct int[] object for each distinct element when using mutable values.

But using compute might be interesting for the Iterable solution, when using it like

static boolean unorderedEquals(Iterable<?> coll1, Iterable<?> coll2) {
    Map<Object, int[]> freq = new HashMap<>();
    for(Object o: coll1) freq.computeIfAbsent(o, x -> new int[1])[0]++;
    int[] absent = {};
    for(Object o: coll2)
        if(freq.compute(o, (key,c) -> c == null || c[0] == 0? absent:
                                      --c[0] == 0? null: c) == absent) return false;
    return freq.isEmpty();
}

which removes entries from the map when their count reaches zero, so we only have to check the map for emptiness at the end.

WJS · Answer 3 · 2019-06-12T12:42:33.340

I would sort them. But first I would compare the sizes before doing the sort. You would need to provide a Comparator<T> to be used by the sort method. If you're sorting Integers, you could use:

      List<Integer> a = new ArrayList<>(List.of(1, 2, 3, 3, 3, 3, 4, 5, 6));
      List<Integer> b = new ArrayList<>(List.of(2, 3, 1, 3, 4, 5, 6, 3, 3));
      System.out.println(compareLists(a, b, Comparator.naturalOrder()));

   public static <T> boolean compareList(List<T> list1, List<T> list2,
         Comparator<T> comp) {

      if (list1 == list2) {
          return true;
      }
      if (list1.size() != list2.size()) {
         return false;
      }
      Collections.sort(list1, comp);
      Collections.sort(list2, comp);

      return list1.equals(list2);
   }

@Holger You are correct. And the sad thing is that just a few days ago I used this feature and pondered on the requirement that there must be a positional one-to-one mapping. — WJS, Jun 12 '19 at 12:44

Comparing iterables for same content, but not regarding order

3 Answers3

Linked