3

First of all I hope this question has not been asked before. I've looked a bit and could not find an appropriate answer :s

I'm looking for an efficient way of moving some objects from one collection to an other, when a specific condition is true.

Currently, I would do it in a pretty straightforward way, but I'm afraid this might not be optimal:

Collection<Object> myFirstCollection;  //let's consider it instanciated and populated
Collection<Object> mySecondCollection; //same for this one

myFirstCollection.stream().forEach(o -> { 
    if ( conditionReturningTrue(o) ) {
        mySecondCollection.add(o);
        myFirstCollection.remove(o);
    }
});

Do you know any better way / more efficient of doing that ?

  • 2
    What ist your Definition of *efficient*? BTW: streams API has the method `filter()` for this. – Timothy Truckle Sep 12 '18 at 12:25
  • By efficient, I mean fastest. `filter()` would work too, however I don't know if it's faster or not. – Thibault Beziers la fosse Sep 12 '18 at 12:33
  • 2
    I repeat my comment below (at an answer): Don't think about such micro-optimizations in terms of performance. Let code readability be always your maxim. This is thousand times more important! Often good and readable code performs well as a side effect. – Seelenvirtuose Sep 12 '18 at 12:34
  • 2
    *"By efficient, I mean fastest."* - How fast do you need it? When in doubt always choose the form that is most *readable*. Care for performance only when you experience a performance problem and change the code in question only if you have **proven by measurement** that this particular code causes the problem and the alternative form really solves it. – Timothy Truckle Sep 12 '18 at 12:37
  • I understand your point about readability, and completely agree with you. I cannot quantify "how fast" I want this piece of code to be, however I want to be sure that I'm not missing any obvious "faster" solution to this small case :) – Thibault Beziers la fosse Sep 12 '18 at 12:48
  • 2
    @ThibaultBezierslafosse The lesson to learn here: Don't think about "missing any obvious faster solution"! Look for the most readable solution, and if this is slower than any other ... still go for it! – Seelenvirtuose Sep 12 '18 at 12:50

4 Answers4

7

To make it more readable, there are Collection::addAll and Collection::removeAll to use in this situation, your code can be :

// create a new Collection where you use filter to search only the Object you want
Collection<Object> filterdCollection = myFirstCollection.stream()
        .filter(o -> conditionReturningTrue(o))
        .collect(Collectors.toCollection(LinkedHashSet::new));

// use allAll to add all the filtered Object to the second collection
mySecondCollection.addAll(filterdCollection);
// use removeAll to remove all the filtered Object from the first collection
myFirstCollection.removeAll(filterdCollection);
Youcef LAIDANI
  • 55,661
  • 15
  • 90
  • 140
  • is this more efficient than one foreach? since you use addAll and removeAll? – Scorix Sep 12 '18 at 12:27
  • @Scorix you can take a look at this [JAVA : Set addAll() vs adding using loop](https://stackoverflow.com/questions/24203950/java-set-addall-vs-adding-using-loop) – Youcef LAIDANI Sep 12 '18 at 12:29
  • 4
    @Scorix Depending on the underlying list implementation, an `addAll` could be faster then multiple `add` operations. But this performance gain (or loss) is negligible. Code readability is not! – Seelenvirtuose Sep 12 '18 at 12:31
  • Thanks for your answer, I'm going to try this out. Is a specific implementation of collection more efficient for this situation? So far I'm using ArrayLists by default but this [article](https://dzone.com/articles/java-collection-performance) shows slightly faster results on removeAll using linkedLists. – Thibault Beziers la fosse Sep 12 '18 at 12:38
  • 3
    @ThibaultBezierslafosse If you want to know which one is more performance than the other you have to create a benchmark to check this as well take a look at this [When to use LinkedList over ArrayList in Java?](https://stackoverflow.com/questions/322715/when-to-use-linkedlist-over-arraylist-in-java) maybe it can answer your question – Youcef LAIDANI Sep 12 '18 at 12:46
  • 3
    When both collections are `ArrayList`s, the `removeAll` can be quiet expensive. But here, we can control it. Just change `Collectors.toList()` to `Collectors.toCollection(LinkedHashSet::new)`. Then, `filterdCollection` will still maintain the order (relevant for the `addAll`), but support a fast lookup (which will make the `removeAll` a linear operation). – Holger Sep 12 '18 at 16:56
  • 1
    thank you @Holger for the information, a new one again :) – Youcef LAIDANI Sep 12 '18 at 20:03
  • @Holger But using a `Set` will miss duplicates when adding to the second collection – fps Sep 13 '18 at 14:59
  • 2
    @FedericoPeraltaSchaffner if handling duplicates is an issue, then you’re probably better off collecting into a `List` first, to be used with `mySecondCollection.addAll(filterdCollection);`, followed by `myFirstCollection.removeAll(new HashSet<>(filterdCollection));` – Holger Sep 13 '18 at 15:03
5

First of all, you should strive for correctness. For most collections, it is forbidden to modify the source collection while iterating over it. You may get a ConcurrentModificationException while trying, but even if it happen to run without an exception, the code still isn’t correct. It’s just that this error is not always detected (it’s a best-effort check, trying to avoid wasting too much performance). This applies to forEach(…), as well as stream().forEach(…), and the for-each loop (for(variable declaration: collection))

The only support for removing elements while iterating is via manual Iterator usage:

for(Iterator<Object> it = myFirstCollection.iterator(); it.hasNext(); ) {
    Object o = it.next();
    if(conditionReturningTrue(o)) {
        it.remove();
        mySecondCollection.add(o);
    }
}

The alternative are bulk methods.

First, like shown in this and that answer, creating a copy of all elements to be transferred first.

Second, you can use

myFirstCollection.removeIf(o -> conditionReturningTrue(o) && mySecondCollection.add(o));

The default implementation of removeIf uses an Iterator in a loop similar to the one above. However, collections like ArrayList provide their own implementation of removeIf, to overcome the quadratic time complexity of the Iterator loop.

Holger
  • 285,553
  • 42
  • 434
  • 765
  • It's worth mentioning that, if the second collection is a `Set`, the approach with `Iterator` will remove all elements that match the condition from the first collection *including duplicates* (that is, if the 1st collection allows duplicates, i.e. if it's not a `Set`), while the approach that uses `removeIf` will remove only the first occurrence of the element that matches the condition. My point is that the approaches shown here are not identical and they might behave differently, depending on the concrete type of the `Collection`s being used. – fps Sep 13 '18 at 15:16
  • 1
    @FedericoPeraltaSchaffner yes, I though about it and concluded that when you intend to *transfer* elements, it is correct to keep duplicates in the old collection if the target collection is not capable of accepting them. You could also say that the idea of transferring is only meaningful if source and target have the same type, i.e. are both a `List` or both a `Set`. – Holger Sep 13 '18 at 15:20
  • Then I think you should change the `if` condition in the `Iterator` variant to `if (conditionReturningTrue(o) && mySecondCollection.add(o)) {`. Anyways, I admit I'm being a little bit nitpicking regarding all this stuff... :) – fps Sep 13 '18 at 15:23
4

You might be able to improve performance by avoiding removeAll (which might require quadratic time for some Collections in which object lookup requires linear search, such as Lists), by using Collectors.partitioningBy to split the original Collection into two Lists:

Collection<Object> myFirstCollection;  //let's consider it instanciated and populated
Collection<Object> mySecondCollection; //same for this one

Map<Boolean,List<Object>> partition = 
    myFirstCollection.stream()
                     .collect(Collectors.partitioningBy(o -> conditionReturningTrue(o)));
myFirstCollection.clear();
myFirstCollections.addAll(partition.get(false));
mySecondCollection.addAll(partition.get(true));

On the other hand, this solution may be less efficient if only few elements should be moved from myFirstCollection to mySecondCollection.

Eran
  • 387,369
  • 54
  • 702
  • 768
  • This approach is interesting too. I took the liberty of comparing it with the approach using `filter()`, and it seems to provide better results when the first collection starts getting large (thousands of elements), probably because, as you said it, `removeAll()` is quadratic. However it is way slower for smaller collections. – Thibault Beziers la fosse Sep 12 '18 at 13:22
  • 2
    `removeAll` is quadratic because [the other answer](https://stackoverflow.com/a/52295144/2711488) uses `toList()`. As said in [this comment](https://stackoverflow.com/questions/52295058/how-can-i-efficiently-transfer-objects-from-one-java-collection-to-an-other-acco/52295508#comment91547937_52295144), it doesn’t have to be. Just using `toCollection(LinkedHashSet::new)` solves this. Then, it’s almost unpredictable, which variant will be faster, as too many environmental factors are involved. Btw, your variant could use `myFirstCollection = partition.get(false);` instead of `clear` and `addAll`… – Holger Sep 12 '18 at 16:59
  • @Holger all good points, though I was under the assumption the collection instances referenced by the 2 variables shouldn't be replaced by new instances. That's why I used clear and addAll. – Eran Sep 12 '18 at 17:06
  • 1
    Yes, that’s underspecified. So it’s worth pointing out that this could be a viable alternative, if the application can be designed to work with the new collection instead of requiring modifying the old one. Otherwise, `clear()` + `addAll` is the simplest solution (and perhaps most efficient as long as we only work on the `Collection` interface and don’t assume a particular implementation). – Holger Sep 12 '18 at 17:13
1

You already got a good answer from YCF_L here: https://stackoverflow.com/a/52295144/9568238

But I would just like to add that if you do go for the forEach method, as you describe in your question, then the stream() is redundant. You can just do myFirstCollection.forEach(...)

Either way, I'd go with the answer mentioned.

  • 2
    It doesn’t matter whether you use `stream().forEach(…)` or just `forEach(…)`; when you modify the source collection from within the consumer, both variants are broken. – Holger Sep 12 '18 at 16:33