I have a hashmap: `Map<String, Set<String>> myMap`

and I want to split it into a list of maps:

List<Map<String, Set<String>>> listofMaps;

where each map has at most 100 keys. I know how to do it the regular way (a foreach over the entry set, creating a new map every 100 items). Is there any option to do it with a Java 8 lambda or something similar (something like `Lists.partition()`)?
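For context, the "regular way" referred to here is something like the following sketch (the method and class names are just illustrative):

```java
import java.util.*;

public class PartitionLoop {
    // Split a map into a list of maps with at most `limit` keys each,
    // using a plain loop over the entry set.
    static <K, V> List<Map<K, V>> partition(Map<K, V> source, int limit) {
        List<Map<K, V>> result = new ArrayList<>();
        Map<K, V> current = new HashMap<>();
        for (Map.Entry<K, V> e : source.entrySet()) {
            if (current.size() == limit) {
                result.add(current);
                current = new HashMap<>();
            }
            current.put(e.getKey(), e.getValue());
        }
        if (!current.isEmpty()) {
            result.add(current); // last, possibly smaller, map
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> myMap = new HashMap<>();
        for (int i = 0; i < 13; i++) {
            myMap.put("k" + i, Collections.singleton("v" + i));
        }
        List<Map<String, Set<String>>> listofMaps = partition(myMap, 5);
        System.out.println(listofMaps.size()); // 13 entries in chunks of 5 -> 3 maps
    }
}
```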

Tal Eden
    Not particularly. That's not the sort of operation easily (or even at all) expressible with Java 8 streams. – Louis Wasserman Dec 21 '15 at 22:41
    If you're referring to Guava's `Lists.partition`, that is a viable approach, possibly combined with Java 8, though you might use `Iterables.partition` instead. – Louis Wasserman Dec 21 '15 at 23:10

2 Answers


Using my unorderedBatches() collector from this answer:

Collector<Entry<String, Set<String>>, ?, List<Map<String, Set<String>>>> batchesCollector = 
    unorderedBatches(100, 
        Collectors.toMap(Entry::getKey, Entry::getValue), Collectors.toList());
List<Map<String, Set<String>>> listofMaps = myMap.entrySet().stream()
        .collect(batchesCollector);
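`unorderedBatches` itself is defined in the linked answer. As a rough, self-contained stand-in for this specific use, a simplified batching collector (here called `simpleBatches`, a hypothetical name) could look like this. Note that, unlike the original, its naive combiner can leave undersized batches in the middle of the list when run in parallel:

```java
import java.util.*;
import java.util.function.Supplier;
import java.util.stream.*;

public class SimpleBatches {
    // Collect stream elements into batches of at most `size` elements,
    // then apply a downstream collector to each finished batch.
    static <T, R> Collector<T, ?, List<R>> simpleBatches(int size,
            Collector<T, ?, R> batchCollector) {
        return Collector.of(
                (Supplier<List<List<T>>>) ArrayList::new,
                // accumulator: open a new batch when needed, then add the element
                (acc, t) -> {
                    if (acc.isEmpty() || acc.get(acc.size() - 1).size() == size) {
                        acc.add(new ArrayList<>());
                    }
                    acc.get(acc.size() - 1).add(t);
                },
                // naive combiner: just concatenates partial batch lists
                (a, b) -> { a.addAll(b); return a; },
                // finisher: run the downstream collector over each batch
                acc -> acc.stream()
                        .map(batch -> batch.stream().collect(batchCollector))
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> myMap = new HashMap<>();
        for (int i = 0; i < 13; i++) myMap.put("k" + i, Collections.singleton("v" + i));
        List<Map<String, Set<String>>> listofMaps = myMap.entrySet().stream()
                .collect(simpleBatches(5,
                        Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue)));
        System.out.println(listofMaps.size()); // 13 entries in batches of 5 -> 3 maps
    }
}
```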
Tagir Valeev

Splitting the stream into ordered fixed-size chunks (as `Lists.partition` does) is not possible, because in a parallel execution each chunk would have to wait for the chunk to its left to be fully processed.

If, however, you don't care about the order of the keys in the resulting sub-maps (the order in which the map's iterator would have returned them), you can roll a custom collector.

private static <K, V> Collector<Map.Entry<K, V>, ?, List<Map<K, V>>> mapSize(int limit) {
    return Collector.of(ArrayList::new,
            // accumulator: start a new map when the list is empty or the
            // last map is full, then add the entry to the last map
            (l, e) -> {
                if (l.isEmpty() || l.get(l.size() - 1).size() == limit) {
                    l.add(new HashMap<>());
                }
                l.get(l.size() - 1).put(e.getKey(), e.getValue());
            },
            // combiner: top up the last map of the first list with entries
            // drained from the tail of the second list, then concatenate
            (l1, l2) -> {
                if (l1.isEmpty()) {
                    return l2;
                }
                if (l2.isEmpty()) {
                    return l1;
                }
                if (l1.get(l1.size() - 1).size() < limit) {
                    Map<K, V> map = l1.get(l1.size() - 1);
                    ListIterator<Map<K, V>> mapsIte = l2.listIterator(l2.size());
                    while (mapsIte.hasPrevious() && map.size() < limit) {
                        Iterator<Map.Entry<K, V>> ite = mapsIte.previous().entrySet().iterator();
                        while (ite.hasNext() && map.size() < limit) {
                            Map.Entry<K, V> entry = ite.next();
                            map.put(entry.getKey(), entry.getValue());
                            ite.remove();
                        }
                        if (!ite.hasNext()) {
                            mapsIte.remove(); // drop a map fully drained above
                        }
                    }
                }
                l1.addAll(l2);
                return l1;
            }
    );
}

This collector takes map entries as input and puts them into a List<Map<K, V>>.

The accumulator checks whether the current list is empty or the last map has reached the limit. If so, it adds a new map. Then the mapping from the entry being processed is added to that map.

The combiner needs to merge two lists that have been built in parallel. If one of the lists is empty, it returns the other. Otherwise, it checks whether the last map of the first list has reached the required size. If it hasn't, it takes entries from the maps at the end of the second list and adds them to the last map of the first list, stopping when either the limit is reached or there are no more elements to take from the second list. A map from the second list is removed once all its elements have been consumed; finally, the remaining maps of the second list are appended to the first.

One usage of such a collector would be:

List<Map<String, Set<String>>> listofMaps =
                myMap.entrySet().stream().collect(mapSize(2));

Some examples (with both parallel and sequential streams) with an initial map consisting of 13 key-value mappings:

Size of maps 2
{11=[11a, 11b], 12=[12a, 12b]}
{13=[13b, 13a], 8=[8a, 8b]}
{1=[1a, 1b], 2=[2b, 2a]}
{3=[3a, 3b], 6=[6a, 6b]}
{4=[4a, 4b], 5=[5a, 5b]}
{7=[7a, 7b], 10=[10a, 10b]}
{9=[9a, 9b]}
=============================
Size of maps 5
{11=[11a, 11b], 12=[12a, 12b], 13=[13b, 13a], 6=[6a, 6b], 7=[7a, 7b]}
{1=[1a, 1b], 2=[2b, 2a], 3=[3a, 3b], 4=[4a, 4b], 5=[5a, 5b]}
{8=[8a, 8b], 9=[9a, 9b], 10=[10a, 10b]}
=============================
Size of maps 12
{11=[11a, 11b], 12=[12a, 12b], 1=[1a, 1b], 13=[13b, 13a], 2=[2b, 2a], 3=[3a, 3b], 4=[4a, 4b], 5=[5a, 5b], 6=[6a, 6b], 7=[7a, 7b], 8=[8a, 8b], 9=[9a, 9b]}
{10=[10a, 10b]}
=============================
Size of maps 15
{11=[11a, 11b], 12=[12a, 12b], 13=[13b, 13a], 1=[1a, 1b], 2=[2b, 2a], 3=[3a, 3b], 4=[4a, 4b], 5=[5a, 5b], 6=[6a, 6b], 7=[7a, 7b], 8=[8a, 8b], 9=[9a, 9b], 10=[10a, 10b]}

I haven't tested it extensively. Also, I think you could modify it to be even more generic.

For instance, you might accept arbitrary objects instead and use two functions to produce a key and a value for each element you're processing.

private static <T, K, V> Collector<T, ?, List<Map<K, V>>> mapSize(Function<T, K> keyFunc, Function<T, V> mapFunc, int limit) {

with

l.get(l.size() - 1).put(keyFunc.apply(e), mapFunc.apply(e));

and call it like:

.collect(mapSize(Map.Entry::getKey, Map.Entry::getValue, size));
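Put together, the generic variant could read as follows (a sketch only, reusing the accumulator and combiner from `mapSize` above; the class name and the usage in `main` are illustrative):

```java
import java.util.*;
import java.util.function.Function;
import java.util.stream.Collector;

public class GenericMapSize {
    // Generic variant: accepts arbitrary elements plus key/value extractor
    // functions; accumulator and combiner mirror the mapSize collector above.
    static <T, K, V> Collector<T, ?, List<Map<K, V>>> mapSize(
            Function<T, K> keyFunc, Function<T, V> mapFunc, int limit) {
        return Collector.of(ArrayList::new,
                // accumulator: open a new map when needed, then store the
                // extracted key/value pair in the last map
                (List<Map<K, V>> l, T e) -> {
                    if (l.isEmpty() || l.get(l.size() - 1).size() == limit) {
                        l.add(new HashMap<>());
                    }
                    l.get(l.size() - 1).put(keyFunc.apply(e), mapFunc.apply(e));
                },
                // combiner: identical to mapSize, since it only moves
                // already-extracted key/value pairs between maps
                (l1, l2) -> {
                    if (l1.isEmpty()) return l2;
                    if (l2.isEmpty()) return l1;
                    if (l1.get(l1.size() - 1).size() < limit) {
                        Map<K, V> map = l1.get(l1.size() - 1);
                        ListIterator<Map<K, V>> mapsIte = l2.listIterator(l2.size());
                        while (mapsIte.hasPrevious() && map.size() < limit) {
                            Iterator<Map.Entry<K, V>> ite =
                                    mapsIte.previous().entrySet().iterator();
                            while (ite.hasNext() && map.size() < limit) {
                                Map.Entry<K, V> entry = ite.next();
                                map.put(entry.getKey(), entry.getValue());
                                ite.remove();
                            }
                            if (!ite.hasNext()) {
                                mapsIte.remove();
                            }
                        }
                    }
                    l1.addAll(l2);
                    return l1;
                });
    }

    public static void main(String[] args) {
        // Illustrative use with non-entry elements: key = the word itself,
        // value = its length, at most 2 keys per map.
        List<String> words = Arrays.asList("a", "bb", "ccc", "dddd", "ee");
        List<Map<String, Integer>> chunks = words.stream()
                .collect(GenericMapSize.<String, String, Integer>mapSize(
                        w -> w, String::length, 2));
        System.out.println(chunks.size()); // 5 elements, limit 2 -> 3 maps
    }
}
```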
Alexis C.