A large number of elements are stored in a HashBag (from the Eclipse Collections framework). All elements with fewer than k occurrences should now be removed.

This could be done with:

bag.removeAll(bag.selectByOccurrences(n->n<k));

The downside is that this creates a temporary bag instance, which in our case consumes a lot of memory.

So I'm looking for an in-place removal approach, e.g. with an iterator. The iterator returned by iterator() visits an element with n occurrences n times, which is wasteful CPU-wise. It would be better to iterate over the distinct keys of the underlying ObjectIntMap. The source code has a method AbstractHashBag.getKeysView(), but it is protected. Is there a way to access it through the public API, or are there other ways to remove such elements in place?

pakat

1 Answer

If you can replace the original bag instead of mutating it, you can use selectByOccurrences with the inverse predicate (n -> n >= k) and keep the result.

If that won't work, the following solution still creates a temporary bag, but should be more efficient than removeAll(Collection).

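import org.eclipse.collections.api.bag.MutableBag;
import org.eclipse.collections.impl.list.Interval;

MutableBag<Integer> bag = Interval.oneTo(10).toBag()
        .withAll(Interval.oneTo(10))
        .withAll(Interval.evensFromTo(1, 10));

// Odd numbers occur twice and even numbers three times,
// so n < 3 selects (and then removes) all odd numbers
bag.selectByOccurrences(n -> n < 3).forEachWithOccurrences(bag::removeOccurrences);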

For this use case, it seems like it would be useful to add a new method on MutableBag called removeIfOccurrences(IntPredicate). I think this would make sense to add as an API if you'd like to create an issue and/or make a contribution to the library.
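To illustrate what such a method might do, here is a minimal sketch that models a bag as a plain java.util.Map from element to occurrence count. This is not the Eclipse Collections implementation: the class CountingBagSketch and the helper removeIfOccurrences are hypothetical names used only for illustration. The point is that removal goes through the map's values view, so each distinct element is visited once and no temporary collection is created.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

// Hypothetical stand-in for a bag: element -> occurrence count.
class CountingBagSketch {

    // Hypothetical helper: removes every distinct element whose count
    // satisfies the predicate. Returns true if anything was removed.
    static <T> boolean removeIfOccurrences(Map<T, Integer> counts, IntPredicate predicate) {
        // removeIf on the values view deletes the matching entries from
        // the backing map in place, visiting each distinct key once.
        return counts.values().removeIf(count -> predicate.test(count));
    }

    public static void main(String[] args) {
        Map<Integer, Integer> counts = new HashMap<>();
        // Same data as above: 1..10 twice, plus the even numbers once more.
        for (int i = 1; i <= 10; i++) {
            counts.merge(i, 2, Integer::sum);
        }
        for (int i = 2; i <= 10; i += 2) {
            counts.merge(i, 1, Integer::sum);
        }

        removeIfOccurrences(counts, n -> n < 3);
        System.out.println(counts.keySet());
    }
}
```

After the call, only the even numbers remain, each with a count of 3. A real MutableBag method would also have to keep the bag's cached total size consistent, which this sketch does not model.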

Donald Raab