13

I am trying to understand the following piece of code, copied from Stuart Marks's answer:

public static <T> Predicate<T> distinctByKey(Function<? super T,Object> keyExtractor) {
    Map<Object,Boolean> seen = new ConcurrentHashMap<>();
    return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}

BigDecimal totalShare = orders.stream()
    .filter(distinctByKey(o -> o.getCompany().getId()))
    .map(Order::getShare)
    .reduce(BigDecimal.ZERO, BigDecimal::add);

My question: won't distinctByKey be called for every element, creating a new ConcurrentHashMap each time? If so, how does it maintain state across elements when it does new ConcurrentHashMap<>()?

Santanu Sahoo

2 Answers

14

Since this is a capturing lambda, a new Predicate instance is indeed returned on each call to distinctByKey; but that call happens once per stream pipeline, not once per element.

If you are willing to run your example with:

-Djdk.internal.lambda.dumpProxyClasses=/Your/Path/Here

you would see that a class is generated for the implementation of the Predicate. Because this is a stateful lambda (it captures the CHM and the Function), the generated class will have a private constructor and a static factory method that returns an instance.
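To make that shape concrete, here is a hand-written approximation of such a class. It is only a sketch: the names HandWrittenLambdaSketch, DistinctPredicate, create and demo are hypothetical, and the class the JVM really generates is synthetic and differs in its details. The point it illustrates is that the captured map and function become fields of one object, and that object's test method is what runs for each element.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class HandWrittenLambdaSketch {

    // Hypothetical stand-in for the class generated at run time for the lambda.
    static final class DistinctPredicate<T> implements Predicate<T> {
        private final Function<? super T, Object> keyExtractor; // captured variable
        private final Map<Object, Boolean> seen;                // captured variable

        // Private constructor: instances are only created through the factory.
        private DistinctPredicate(Function<? super T, Object> keyExtractor,
                                  Map<Object, Boolean> seen) {
            this.keyExtractor = keyExtractor;
            this.seen = seen;
        }

        // Static factory, invoked once each time the lambda expression is evaluated.
        static <T> Predicate<T> create(Function<? super T, Object> keyExtractor,
                                       Map<Object, Boolean> seen) {
            return new DistinctPredicate<>(keyExtractor, seen);
        }

        // Invoked once per stream element, always on the same instance.
        @Override
        public boolean test(T t) {
            return seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
        }
    }

    public static <T> Predicate<T> distinctByKey(Function<? super T, Object> keyExtractor) {
        Map<Object, Boolean> seen = new ConcurrentHashMap<>();
        return DistinctPredicate.create(keyExtractor, seen);
    }

    // Keeps the first word seen for each initial letter.
    static List<String> demo() {
        return Arrays.asList("apple", "avocado", "banana").stream()
                .filter(distinctByKey(s -> s.substring(0, 1)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [apple, banana]
    }
}
```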


Each call of distinctByKey will produce a different instance, but that instance will be reused for each element of the Stream. Things might be a bit more obvious if you run this example:

public static <T> Predicate<T> distinctByKey(Function<? super T,Object> keyExtractor) {
    Map<Object,Boolean> seen = new ConcurrentHashMap<>();
    Predicate<T> predicate =  t -> {
        Object obj = keyExtractor.apply(t);
        System.out.println("stream element : " + obj);
        return seen.putIfAbsent(obj, Boolean.TRUE) == null;
    };

    System.out.println("Predicate with hash :" + predicate.hashCode());
    return predicate;
}

@Getter
@AllArgsConstructor
static class User {
    private final String name;
}

public static void main(String[] args) {
    Stream.of(new User("a"), new User("b"))
          .filter(distinctByKey(User::getName))
          .collect(Collectors.toList());

}

This will output:

Predicate with hash :1259475182
stream element : a
stream element : b

A single Predicate for both elements of the Stream.

If you add another filter:

Stream.of(new User("a"), new User("b"))
      .filter(distinctByKey(User::getName))
      .filter(distinctByKey(User::getName))
      .collect(Collectors.toList());

There will be two Predicates:

Predicate with hash :1259475182
Predicate with hash :1072591677
stream element : a
stream element : a
stream element : b
stream element : b
Eugene
  • A different instance of the Predicate for each stream element? No, there's one CHM and one Predicate per call to `distinctByKey`. – Stuart Marks Nov 21 '17 at 04:49
  • @StuartMarks wouldn't `get$Lambda` return a new Predicate every time it is called? – Eugene Nov 21 '17 at 04:55
  • Sure, but `get$Lambda()` should only be called when the lambda expression is evaluated, that is, from within the call to `distinctByKey`. Each stream element that's passed through filter will call the same lambda instance's `test()` method. It's that instance that holds the captured CHM and keyExtractor function. – Stuart Marks Nov 21 '17 at 05:42
6

It seems pretty confusing, but it's quite simple. The distinctByKey method is called only once, while the pipeline is being built, so there is only one ConcurrentHashMap for that pipeline, and it is captured by the lambda expression. The Predicate object that distinctByKey returns is then applied to every element of the stream.
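One way to convince yourself of this is to pull the call out into a local variable, which is equivalent to passing it inline. The sketch below uses a hypothetical class name, CapturedStateDemo, and plain strings as elements. It also shows a consequence of the state living in the returned Predicate: reusing the same instance on a second stream filters everything out, because its map already contains the keys.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class CapturedStateDemo {

    public static <T> Predicate<T> distinctByKey(Function<? super T, Object> keyExtractor) {
        Map<Object, Boolean> seen = new ConcurrentHashMap<>();
        return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }

    static List<List<String>> demo() {
        List<String> names = Arrays.asList("a", "b", "a");

        // One call to distinctByKey: one map, one Predicate.
        Predicate<String> distinct = distinctByKey(s -> s);

        // The same Predicate instance is tested against every element.
        List<String> first = names.stream().filter(distinct).collect(Collectors.toList());

        // Reusing the instance: its map still holds "a" and "b", so nothing passes.
        List<String> second = names.stream().filter(distinct).collect(Collectors.toList());

        // A fresh call creates a fresh map, so the filter starts from scratch.
        List<String> third = names.stream()
                .filter(distinctByKey(s -> s))
                .collect(Collectors.toList());

        return Arrays.asList(first, second, third);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [[a, b], [], [a, b]]
    }
}
```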

Ousmane D.
  • Is it true for parallelStream() ? – Santanu Sahoo Nov 20 '17 at 09:39
  • 4
    @xyz: don’t get confused by the fact that there’s a stream use. Just look at the *code structure*. You have `a().b(c()).d().e()`, so the methods are invoked in the order `a`, `c`, `b` (receiving the result of `c`), `d`, `e`. The `a` method is `stream`, `c` is `distinctByKey`, `b` is `filter`, etc., but the actual stream operation is only started with `e`, which is `reduce`. The other four methods have been invoked and completed before that. – Holger Nov 20 '17 at 09:50
  • @xyz the same applies. – Ousmane D. Nov 20 '17 at 10:54
  • 1
    @Aominè indeed that method will be called only once, but there will be *multiple* instances of the `Predicate` returned – Eugene Nov 20 '17 at 11:58
  • 2
    @Aominè no worries - it gets even funnier when you think of the bootstrap method that gets called only once, a `CallSite` that wraps a `MethodHandle` to the `get$Lambda` method built by `ASM`, linked only once; it's crazy how many details there are – Eugene Nov 20 '17 at 14:52
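The invocation order described in the comments above (the pipeline methods run first, and elements flow only once the terminal operation starts) can be checked with a small sequential experiment. This is a sketch with hypothetical names (EvaluationOrderDemo, events, run): it records when distinctByKey is called and when the returned Predicate is tested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class EvaluationOrderDemo {

    // Event log; a plain ArrayList is enough because the stream is sequential.
    static final List<String> events = new ArrayList<>();

    static <T> Predicate<T> distinctByKey(Function<? super T, Object> keyExtractor) {
        events.add("distinctByKey called");
        Map<Object, Boolean> seen = new ConcurrentHashMap<>();
        return t -> {
            events.add("test(" + t + ")");
            return seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
        };
    }

    static List<String> run() {
        events.clear();
        List<String> source = Arrays.asList("a", "b", "a");

        // stream(), distinctByKey(...) and filter(...) all complete here;
        // no element has been examined yet.
        Stream<String> pipeline = source.stream().filter(distinctByKey(s -> s));
        events.add("pipeline built");

        // Only the terminal operation pulls elements through the Predicate.
        List<String> result = pipeline.collect(Collectors.toList());
        events.add("result " + result);
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Run sequentially, the log shows a single "distinctByKey called" entry, then "pipeline built", then one "test(...)" entry per element, and finally "result [a, b]".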