I was using Multiset
to have easy access to the freq of elements, but I realize there is Collections#frequency(Collection<?>, Object)
that does the same for any collection. What is the point of using Multiset
then? Is performance an issue here?

- 27,415
- 11
- 90
- 112

- 10,118
- 14
- 61
- 120
1 Answers
Guava documentation for Multiset#count() has to say:
Note that for an Object.equals(java.lang.Object)-based multiset, this gives the same result as Collections.frequency(java.util.Collection, java.lang.Object) (which would presumably perform more poorly).
So, yes, I suspect that performance is the issue here.
I think Multiset#count
is more efficient because Collections#frequency
iterates through the entire collection. For an object o whose frequency you're checking, it goes through all elements e in the collection and checks (o == null ? e == null : o.equals(e))
.
For Multiset (which is an interface), the exact implementation of count
depends on the class. If it is a HashMultiset
, for example, then it is backed by a HashMap
. For details about how that is more efficient than iterating through the whole collection, take a look at this answer: How does a Java HashMap handle different objects with the same hash code?.
The Guava code is as follows
public int count(@Nullable Object element) {
Count frequency = Maps.safeGet(backingMap, element);
return (frequency == null) ? 0 : frequency.get();
}
Similarly, for a TreeMultiset
, which maintains the ordering of its elements and is backed by an AVL tree, count
can be obtained in O(log(n)) steps instead of O(n), where n is the size of the collection. The Guava code is as follows:
public int count(@Nullable Object element) {
try {
@SuppressWarnings("unchecked")
E e = (E) element;
AvlNode<E> root = rootReference.get();
if (!range.contains(e) || root == null) {
return 0;
}
return root.count(comparator(), e);
} catch (ClassCastException e) {
return 0;
} catch (NullPointerException e) {
return 0;
}
}

- 1
- 1

- 8,216
- 1
- 43
- 92
-
I'm missing one simple point in your answer: If a `Multiset` contains one element a million times, `Collections.frequency` will iterate it a million times while `Multiset` simply finds the element (your Big-O-notation applies) and says "one million". – maaartinus Dec 09 '14 at 16:27
-
1For Multiset, `n` typically refers to the number of _distinct_ elements, whereas for e.g. a `List` it's going to refer to the total number of elements, distinct or not. – Louis Wasserman Dec 09 '14 at 20:56