30

I have a method that has to remove any element listed in a (small) Set<K> keysToRemove from some (potentially large) Map<K,V> from. But removeAll() doesn't do, as I need to return all keys that were actually removed, since the map might or might not contain keys that require removal.

Old-school code is straight forward:

public Set<K> removeEntries(Map<K, V> from) {
    Set<K> fromKeys = from.keySet();
    Set<K> removedKeys = new HashSet<>();
    for (K keyToRemove : keysToRemove) {
        if (fromKeys.contains(keyToRemove)) {
            fromKeys.remove(keyToRemove);
            removedKeys.add(keyToRemove);
        }
    }
    return removedKeys;
}

The same, written using streams:

Set<K> fromKeys = from.keySet();
return keysToRemove.stream()
        .filter(fromKeys::contains)
        .map(k -> {
            fromKeys.remove(k);
            return k;
        })
        .collect(Collectors.toSet());

I find that a bit more concise, but I also find that lambda too clunky.

Any suggestions how to achieve the same result in less clumsy ways?

GhostCat
  • 137,827
  • 25
  • 176
  • 248
  • 2
    How about just collecting all keys that can be removed and then call `removeAll()` on that filtered set? Or how about "filtering" on `fromKeys::remove`? – Thomas Jun 13 '19 at 14:28
  • 1
    I believe and inferring from the answers here, the improvement that comes mostly out of any change is by using `if (fromKeys.remove(keyToRemove)) { removedKeys.add(keyToRemove); }` instead of using both contains and remove in `if (fromKeys.contains(keyToRemove)) { fromKeys.remove(keyToRemove); removedKeys.add(keyToRemove); }` – Naman Jun 13 '19 at 16:51

6 Answers6

26

The “old-school code” should rather be

public Set<K> removeEntries(Map<K, ?> from) {
    Set<K> fromKeys = from.keySet(), removedKeys = new HashSet<>(keysToRemove);
    removedKeys.retainAll(fromKeys);
    fromKeys.removeAll(removedKeys);
    return removedKeys;
}

Since you said that keysToRemove is rather small, the copying overhead likely doesn’t matter. Otherwise, use the loop, but don’t do the hash lookup twice:

public Set<K> removeEntries(Map<K, ?> from) {
    Set<K> fromKeys = from.keySet();
    Set<K> removedKeys = new HashSet<>();
    for(K keyToRemove : keysToRemove)
        if(fromKeys.remove(keyToRemove)) removedKeys.add(keyToRemove);
    return removedKeys;
}

You can express the same logic as a stream as

public Set<K> removeEntries(Map<K, ?> from) {
    return keysToRemove.stream()
        .filter(from.keySet()::remove)
        .collect(Collectors.toSet());
}

but since this is a stateful filter, it is highly discouraged. A cleaner variant would be

public Set<K> removeEntries(Map<K, ?> from) {
    Set<K> result = keysToRemove.stream()
        .filter(from.keySet()::contains)
        .collect(Collectors.toSet());
    from.keySet().removeAll(result);
    return result;
}

and if you want to maximize the “streamy” usage, you can replace from.keySet().removeAll(result); with from.keySet().removeIf(result::contains), which is quiet expensive, as it is iterating over the larger map, or with result.forEach(from.keySet()::remove), which doesn’t have that disadvantage, but still, isn’t more readable than removeAll.

All in all, the “old-school code” is much better than that.

Holger
  • 285,553
  • 42
  • 434
  • 765
  • Was working around with the existing code. Wouldn't `public Set removeEntries(Map from) { Set removedKeys = new HashSet<>(); for (K keyToRemove : keysToRemove) { if (from.keySet().remove(keyToRemove)) { removedKeys.add(keyToRemove); } } return removedKeys; }` be a better old school code than the `retainAll`, `removeAll` solution. I meant to be focussing(nitpicking) on the single iteration versus two iterations with their implementation. (I might just be wrong though in inferring so.) – Naman Jun 13 '19 at 16:47
  • 1
    @Naman that’s what I posted as second variant, right for the case that the iteration matters. However, the `retainAll`/`removeAll` combo will iterate over the set that has been specified by the OP as being rather small. – Holger Jun 13 '19 at 17:38
  • Oh, yes the second variant indeed is the same. But wouldn't the `removeAll` iterate the `fromKeys` instead of `keysToRemove` there? About the first stream variant, how about using the predicate for reduction using `partitioningBy`, wrote a [sample code](https://stackoverflow.com/a/56585387/1746118)? – Naman Jun 13 '19 at 17:44
  • 2
    @Naman that’s hitting implementation details, but I assume that it works like [`AbstractSet.removeAll(…)`](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/AbstractSet.html#removeAll(java.util.Collection)), if not even inheriting right that method: “*This implementation determines which is the smaller of this set and the specified collection, by invoking the size method on each. …[etc]*”. Using a stateful predicate for `partitioningBy` is as discouraged as for `filter`, but with the latter, you’re collection another set of actually unwanted elements… – Holger Jun 13 '19 at 17:50
  • Slightly off topic, but how would do you verify the correctness of your solution? Do you actually write everything from scratch in IntelliJ (`public static void main` and all) and verify? Asking because I'm not familiar with streams or the java tag. – cs95 Jun 14 '19 at 04:11
  • 3
    @cs95 well yes, for most SO answers, I write some test code, either from scratch or using the question’s code as starting point, if any. Depending on the context, it might be in Netbeans, Eclipse or command line. When it comes to compiler-dependent behavior, I also have batch files to compile&run the same source code with different JDKs. – Holger Jun 14 '19 at 07:08
  • 1
    @Naman my last sentence was written in a hurry. I wanted to say, that `partitioningBy` does more work when necessary, when only one of the two sets is needed. Besides that, it’s like the `filter` approach. – Holger Jun 14 '19 at 07:17
  • @Holger I got what you meant by another set anyway. Thanks – Naman Jun 14 '19 at 07:21
  • When you're already creating an [mcve] for your tests, why not post it so that others can check it out more quickly? At least, I *usually* try to do this, along with some "dummy/test" data and syso's that show the results... – Marco13 Jun 14 '19 at 12:57
  • 2
    @Marco13 I often do this, especially for questions containing an example, but not every answer would benefit from examples. Further, not all of my test code is a minimal example. And sometimes, it gets edited for other tests, without having all test in the code at a time, so it would require significant cleanup before getting published. – Holger Jun 14 '19 at 13:44
13

More concise solution, but still with unwanted side effect in the filter call:

Set<K> removedKeys =
    keysToRemove.stream()
                .filter(fromKeys::remove)
                .collect(Collectors.toSet());

Set.remove already returns true if the set contained the specified element.

 P.S. In the end, I would probably stick with the "old-school code".

Oleksandr Pyrohov
  • 14,685
  • 6
  • 61
  • 90
  • 4
    Exactly my thought ;) - It just feels a little hacky because we're "filtering" on a method that actually represents a side effect. – Thomas Jun 13 '19 at 14:32
5

I wouldn’t use Streams for this. I would take advantage of retainAll:

public Set<K> removeEntries(Map<K, V> from) {
    Set<K> matchingKeys = new HashSet<>(from.keySet());
    matchingKeys.retainAll(keysToRemove);

    from.keySet().removeAll(matchingKeys);

    return matchingKeys;
}
VGR
  • 40,506
  • 4
  • 48
  • 63
  • 3
    That’s pointing into the right direction, but you are copying the “potentially large” `from` map’s keyset whereas you can copy the “small” `keysToRemove` instead, as the intersection of a and b is the same as of b and a. Further, `matchingKeys` is potentially smaller than `keysToRemove`, so `removeAll(matchingKeys)` is preferable. – Holger Jun 13 '19 at 15:43
  • @Holger I see your point, but the Set is merely copying references, which seems benign to me, unless the Map’s size is truly huge. You’re right about removeAll(matchingKeys) though. Updated. – VGR Jun 13 '19 at 15:49
  • 3
    It’s not just copying references, but hashing. And since the OP stated the expected sizes and swapping the two is trivial, I’d do this. In fact, [I did](https://stackoverflow.com/a/56584064/2711488). – Holger Jun 13 '19 at 15:52
4

You can use the stream and the removeAll

Set<K> fromKeys = from.keySet();
Set<K> removedKeys = keysToRemove.stream()
    .filter(fromKeys::contains)
    .collect(Collectors.toSet());
fromKeys.removeAll(removedKeys);
return removedKeys;
MaanooAk
  • 2,418
  • 17
  • 28
4

You can use this:

Set<K> removedKeys = keysToRemove.stream()
        .filter(from::containsKey)
        .collect(Collectors.toSet());
removedKeys.forEach(from::remove);

It's similar to Oleksandr's answer, but avoiding the side effect. But I would stick with that answer, if you are looking for performance.

Alternatively you could use Stream.peek() for the remove, but be careful with other side effects (see the comments). So I would not recommend that.

Set<K> removedKeys = keysToRemove.stream()
        .filter(from::containsKey)
        .peek(from::remove)
        .collect(Collectors.toSet());
Samuel Philipp
  • 10,631
  • 12
  • 36
  • 56
  • 5
    Never use peek for anything other than debugging! See https://stackoverflow.com/questions/47356992/possible-side-effect-of-stream-peek-changing-state-and-why-not-use-it-like-this and the questions linked in it – Michael A. Schaffrath Jun 13 '19 at 15:02
  • 2
    @MichaelA.Schaffrath Makes perfect sense. Fun fact: I tried my initial lambda again, with map. IntelliJ suggests to replace that call to map() to peek() ;-) – GhostCat Jun 13 '19 at 15:26
  • 4
    Or more general general: [In Java streams is peek really only for debugging?](https://stackoverflow.com/q/33635717/2711488) – Holger Jun 13 '19 at 15:59
3

To add another variant to the approaches, one could also partition the keys and return the required Set as:

public Set<K> removeEntries(Map<K, ?> from) {
    Map<Boolean, Set<K>> partitioned = keysToRemove.stream()
            .collect(Collectors.partitioningBy(k -> from.keySet().remove(k),
                    Collectors.toSet()));
    return partitioned.get(Boolean.TRUE);
}
Naman
  • 27,789
  • 26
  • 218
  • 353
  • Also leaves a choice of using the keys that were not a part of the keySet of the map. (just in case) – Naman Jun 13 '19 at 17:04