22

When a duplicate key entry is found during Collectors.toMap(), the merge function (o1, o2) is called.

Question: how can I get the key that caused the duplication?

String keyvalp = "test=one\ntest2=two\ntest2=three";

Pattern.compile("\n")
    .splitAsStream(keyvalp)
    .map(entry -> entry.split("="))
    .collect(Collectors.toMap(
        split -> split[0],
        split -> split[1],
        (o1, o2) -> {
            //TODO how to access the key that caused the duplicate? o1 and o2 are the values only
            //split[0]; //which is the key, cannot be accessed here
        },
    HashMap::new));

Inside the merge function I want to decide, based on the key, whether to cancel the mapping or to continue and take one of those values.

membersound
  • You can filter the merged values afterwards; is it really necessary to filter them while merging? – Vitaliy Moskalyuk Jun 07 '17 at 08:15
  • Could you give an example? During the merge I have to decide which of the values to take (o1 or o2). The decision must be made based on the **key**. But the key does not always occur twice; only when it does must the merge be decided. – membersound Jun 07 '17 at 08:15
  • OK, I see. You can create a separate Map, run `.map(entry -> entry.split("=")).forEach()`, and check for each entry whether its key is already in the Map. If not, add it; otherwise decide whether to replace the value. – Vitaliy Moskalyuk Jun 07 '17 at 08:19

3 Answers

6

You need to use a custom collector or use a different approach.

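Map<String, String> map = new HashMap<>();
Pattern.compile("\n")
    .splitAsStream(keyvalp)
    .map(entry -> entry.split("="))
    // arr[0] (the key) is in scope inside the merge function; returning o1 is only a placeholder
    .forEach(arr -> map.merge(arr[0], arr[1], (o1, o2) -> /* decide using arr[0] */ o1));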

Writing a custom collector is more complicated. You would need a TriConsumer (key and two values) or something similar, which is not in the JDK; that is why I am fairly sure there is no built-in function that uses one. ;)
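For illustration, here is a minimal runnable sketch of the forEach/Map.merge approach. The class name MergeWithKeyDemo and the resolve rule (keys ending in "2" keep the newer value) are assumptions made up for this example and are not part of the question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class MergeWithKeyDemo {

    // Hypothetical rule, for illustration only: keys ending in "2" keep the
    // newer value, all other keys keep the older one.
    static String resolve(String key, String older, String newer) {
        return key.endsWith("2") ? newer : older;
    }

    static Map<String, String> parse(String input) {
        Map<String, String> map = new HashMap<>();
        Pattern.compile("\n")
            .splitAsStream(input)
            .map(entry -> entry.split("="))
            // arr[0] is the key; the lambda passed to merge captures it
            .forEach(arr -> map.merge(arr[0], arr[1],
                (older, newer) -> resolve(arr[0], older, newer)));
        return map;
    }

    public static void main(String[] args) {
        System.out.println(parse("test=one\ntest2=two\ntest2=three"));
    }
}
```

Returning null from the function passed to Map.merge removes the mapping, which may be one way to "cancel" an entry, although a later occurrence of the same key would then be inserted again. Because forEach mutates a shared HashMap, this sketch is meant for sequential streams only.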

Peter Lawrey
5

The merge function has no way to get the key, which is the same issue the built-in function has when you omit the merge function.

The solution is to use a different toMap implementation, which does not rely on Map.merge:

public static <T, K, V> Collector<T, ?, Map<K,V>>
    toMap(Function<? super T, ? extends K> keyMapper,
          Function<? super T, ? extends V> valueMapper) {
    return Collector.of(HashMap::new,
        (m, t) -> {
            K k = keyMapper.apply(t);
            V v = Objects.requireNonNull(valueMapper.apply(t));
            if(m.putIfAbsent(k, v) != null) throw duplicateKey(k, m.get(k), v);
        },
        (m1, m2) -> {
            m2.forEach((k,v) -> {
                if(m1.putIfAbsent(k, v)!=null) throw duplicateKey(k, m1.get(k), v);
            });
            return m1;
        });
}
private static IllegalStateException duplicateKey(Object k, Object v1, Object v2) {
    return new IllegalStateException("Duplicate key "+k+" (values "+v1+" and "+v2+')');
}

(This is basically what Java 9’s implementation of toMap without a merge function will do)

So all you need to do in your code is redirect the toMap call and omit the merge function:

String keyvalp = "test=one\ntest2=two\ntest2=three";

Map<String, String> map = Pattern.compile("\n")
        .splitAsStream(keyvalp)
        .map(entry -> entry.split("="))
        .collect(toMap(split -> split[0], split -> split[1]));

(or ContainingClass.toMap if it is neither in the same class nor statically imported)

The collector supports parallel processing like the original toMap collector, though it’s not very likely to get a benefit from parallel processing here, even with more elements to process.

If I understand you correctly and you only want to pick either the older or the newer value in the merge function, based on the actual key, you could do it with a key Predicate like this:

public static <T, K, V> Collector<T, ?, Map<K,V>>
    toMap(Function<? super T, ? extends K> keyMapper,
          Function<? super T, ? extends V> valueMapper,
          Predicate<? super K> useOlder) {
    return Collector.of(HashMap::new,
        (m, t) -> {
            K k = keyMapper.apply(t);
            m.merge(k, valueMapper.apply(t), (a,b) -> useOlder.test(k)? a: b);
        },
        (m1, m2) -> {
            m2.forEach((k,v) -> m1.merge(k, v, (a,b) -> useOlder.test(k)? a: b));
            return m1;
        });
}

Map<String, String> map = Pattern.compile("\n")
        .splitAsStream(keyvalp)
        .map(entry -> entry.split("="))
        .collect(toMap(split -> split[0], split -> split[1], key -> condition));

There are several ways to customize this collector…
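One such customization, sketched here under assumptions, passes the key to a three-argument merge callback instead of a Predicate. The KeyedMerger interface and the KeyAwareToMap class are hypothetical names introduced for this example; nothing like them exists in the JDK.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collector;

public class KeyAwareToMap {

    /** Hypothetical three-argument merge callback; not part of the JDK. */
    @FunctionalInterface
    public interface KeyedMerger<K, V> {
        V merge(K key, V older, V newer);
    }

    public static <T, K, V> Collector<T, ?, Map<K, V>> toMap(
            Function<? super T, ? extends K> keyMapper,
            Function<? super T, ? extends V> valueMapper,
            KeyedMerger<? super K, V> merger) {
        return Collector.of(HashMap::new,
            (m, t) -> {
                K k = keyMapper.apply(t);
                // the lambda captures k, so the callback sees the key
                m.merge(k, valueMapper.apply(t), (a, b) -> merger.merge(k, a, b));
            },
            (m1, m2) -> {
                // combining partial maps: forEach supplies the key here
                m2.forEach((k, v) -> m1.merge(k, v, (a, b) -> merger.merge(k, a, b)));
                return m1;
            });
    }

    // Example rule (an assumption): "test2" takes the newer value, others the older.
    static Map<String, String> parse(String input) {
        return Pattern.compile("\n")
            .splitAsStream(input)
            .map(entry -> entry.split("="))
            .collect(toMap(split -> split[0], split -> split[1],
                (key, older, newer) -> key.equals("test2") ? newer : older));
    }

    public static void main(String[] args) {
        System.out.println(parse("test=one\ntest2=two\ntest2=three"));
    }
}
```

Since the callback's result goes straight to Map.merge, returning null from it removes the mapping for that key.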

Holger
  • amazing, I came to this 1/2 later (I have already upvoted apparently :) ). I did not even realize I might need the `Key` when merging... this has helped quite a lot! – Eugene May 22 '18 at 11:56
2

There is, of course, a simple trick: save the key in the key mapper function and read it back in the merge function. The code may look like the following (assuming the key is an Integer):

final AtomicInteger key = new AtomicInteger();
...collect( Collectors.toMap(
   item -> { key.set(item.getKey()); return item.getKey(); }, // key mapper
   item -> ..., // value mapper
   (v1, v2) -> { log(key.get(), v1, v2); return v1; } // merge function
));

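Note: this is not safe for parallel processing. Several threads would write to the same holder, and the merge function also runs when partial maps are combined, at which point the stored key no longer matches the values being merged.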
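One way to avoid the shared mutable holder, sketched under assumptions, is to let each value carry its own key: collect the whole entry as the map value, choose between the two entries in the merge function, and unwrap the values afterwards. The class name KeyCarryingMerge, the keepFirst rule, and the sample data are made up for illustration.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class KeyCarryingMerge {

    // Hypothetical rule: even keys keep the first value, odd keys keep the later one.
    static boolean keepFirst(int key) {
        return key % 2 == 0;
    }

    static Map<Integer, String> collect(List<Map.Entry<Integer, String>> items) {
        // The map value is the whole entry, so the merge function can read the key.
        Map<Integer, Map.Entry<Integer, String>> byKey = items.parallelStream()
            .collect(Collectors.toMap(
                item -> item.getKey(),
                item -> item,
                (e1, e2) -> keepFirst(e1.getKey()) ? e1 : e2));

        // Unwrap the entries into plain values.
        Map<Integer, String> result = new HashMap<>();
        byKey.forEach((k, e) -> result.put(k, e.getValue()));
        return result;
    }

    static List<Map.Entry<Integer, String>> sample() {
        List<Map.Entry<Integer, String>> items = new ArrayList<>();
        items.add(new SimpleImmutableEntry<>(1, "a"));
        items.add(new SimpleImmutableEntry<>(2, "b"));
        items.add(new SimpleImmutableEntry<>(1, "c"));
        items.add(new SimpleImmutableEntry<>(2, "d"));
        return items;
    }

    public static void main(String[] args) {
        System.out.println(collect(sample()));
    }
}
```

No state is shared between the mapper and the merge function, so the same code works on sequential and parallel streams. The cost is a second pass to unwrap the entries.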

levko