12

For example, I need something like:

Collection<String> collection = /* ... */;
Stream<Object> stream = /* ... */;
boolean containsAll = stream.map(Object::toString).containsAll(collection);

Of course, I could accumulate all elements of the stream into another Collection using collect() method and the call Collection.containsAll(), but what if the stream is too big and it is inefficient to process all its elements?

ETO
  • 6,970
  • 1
  • 20
  • 37
Nolequen
  • 3,032
  • 6
  • 36
  • 55
  • 1
    Personally I think it's actually better to go with `Set temp = source.stream().map(Object::toString).collect(toSet()); boolean containsAll = temp.containsAll(collection);` – Ousmane D. Oct 07 '19 at 12:49
  • @OusmaneD. What if *the stream is too big* as OP assumes? Imagine the case when the `stream` is lazily generated and not all elements are persisted in the memory. E.g. using `Files::lines` you can process *huge* files even though they could not fit in your memory. Collecting into a set will cause `OutOfMemoryError` in such a case. – ETO Oct 07 '19 at 17:41

3 Answers3

10

This should do the trick:

Set<String> set = new HashSet<>(collection);
boolean containsAll = set.isEmpty() || stream.map(Object::toString)
                                             .anyMatch(s -> set.remove(s) && set.isEmpty());

The solution might look confusing, but the idea is straightforward:

  1. In order to prevent multiple iterations over collection we wrap it into a HashSet. (In case your stream is a parallel one, then you will have to use a concurrent hash set. See this post for more details)
  2. If the collection (or set) is empty then we return true without processing the stream
  3. For each entry of stream we try to remove it from set. In case the result of Set::remove is true (hence it was contained by set) and the set is empty after removal, we can conclude that stream contained all the elements of initial collection.
  4. The terminal operation Stream::anyMatch is a short-circuiting one. So it will stop iterating over stream once the set is empty. In worst case we will process the entire stream.

Perhaps this is a bit more readable form:

Set<String> set = new HashSet<>(collection);
boolean containsAll = set.isEmpty() || stream.map(Object::toString)
                                             .filter(set::remove)
                                             .anyMatch(__ -> set.isEmpty());

If the collection can contain duplicates and there is a requirement to check if stream contains all of them, then we will need to maintain a concurrent map of counters.

Map<String, AtomicLong> map = new ConcurrentHashMap<>();
collection.forEach(s -> map.computeIfAbsent(s, __ -> new AtomicLong()).incrementAndGet());
boolean containsAll = map.isEmpty() || stream.map(Object::toString)
                                             .filter(map::containsKey)
                                             .filter(s -> map.get(s).decrementAndGet() == 0)
                                             .filter(s -> map.remove(s) != null)
                                             .anyMatch(__ -> map.isEmpty());

The code slightly changed but the idea is the same.

ETO
  • 6,970
  • 1
  • 20
  • 37
  • Seems like you could also wrap it in a `List` to support checking if the stream contains multiple instances of an element. `List.remove` would return true until all copies are removed. – Sean Van Gorder Oct 07 '19 at 17:53
  • @SeanVanGorder Actually, this is a very interesting case. For the sake of simplicity I ignored that in my solution. But using `List` we'll loose the performance boost `HashSet` offers. – ETO Oct 07 '19 at 18:00
  • You could just check if the original collection is a `Set` and wrap it in `HashSet` instead of `ArrayList` in that case. This map count preprocessing seems like a trade-off, it might be faster for huge non-`Set` collections but probably worse for small ones. – Sean Van Gorder Oct 07 '19 at 18:48
  • This makes sense. Programming is all about trade-offs. Different data structures are suitable for different cases. It is still up to software engineer to pick the one which fits better the task he/she tries to resolve. – ETO Oct 07 '19 at 18:55
  • This blew my mind. I never thought if using stream like this. Thanks a lot. From a college student – RukaDo Jun 09 '20 at 06:18
4

Regardless of how big the Stream is, you will have to process all of its elements if it does not contain all the elements of the Collection.

You could save processing time if a small prefix of the Stream contains all the elements of Collection, and the Collection is much smaller than the Stream.

boolean containsAll = 
    stream.map(Object::toString)
          .filter(s -> collection.contains(s)) // it would be wise to convert collection to a Set
          .limit(collection.size())
          .count() == collection.size();

Note that if the Stream may contain multiple copies of the same element of the Collection, you may have to add a .distinct() operation after the filter().

Eran
  • 387,369
  • 54
  • 702
  • 768
  • I don't really understand the benefit of not using `collect` over any other operation(`count`) in this case. Since the stream processing for the complete elements would anyway take place(worst case), is there a possible optimization to get away with that much of memory being used? – Naman Oct 07 '19 at 12:49
  • @Naman would it process all the elements if, for example, the stream has 1000000 elements but all the elements of the Collection appear in the first 50 elements of the Stream? Of course, if not all the elements of the Collection appear in the Stream (or if the last of them appears near the end of the Stream), this solution wouldn't help. – Eran Oct 07 '19 at 12:52
  • What if the stream has some null entries? If the stream and collection are sorted, will it be faster to process _containsAll_? – Paul Oct 07 '19 at 13:27
  • @Eran By _containsAll_ I meant your method, finding if a collection contains all elements of a stream. Thanks for clarification. – Paul Oct 07 '19 at 13:37
  • @Paul I misunderstood your question. If the Stream and Collection are sorted, it should be possible to make an efficient solution (that doesn't process the entire Stream) even if the Stream doesn't contain all elements of the Collection, since we can identify the point in the Stream after which there can be no more elements that belong to the Collection. This can be done with `takeWhile`. – Eran Oct 07 '19 at 13:42
3

Create a Set from Collection<String> to make search operation faster O(1)

Set<String> set = new HashSet<>(collection);

Then use allMatch to check every item in stream contains in set or not

boolean containsAll = stream.map(Object::toString)
                            .allMatch(s -> set.contains(s));

Another way :

Filter by not contained in set and use limit(1) for optimization

boolean isContains = stream.map(Object::toString)
                           .filter(s -> !set.contains(s))
                           .limit(1)
                           .count() > 0;
Eklavya
  • 17,618
  • 4
  • 28
  • 57
  • This is not what OP asked for. If set contains ("foo", "baa"), and your stream contains 100 x "foo", your function returns the wrong solution. – kaiser Feb 09 '23 at 13:50