32

I'm looking for a function equivalent to Groovy's collate which would partition a large List into batches for processing. I did see subList which could be adapted into a similar function but wanted to check and make sure I wasn't missing an in-built or crazy simple alternative to rolling my own.

Dave Maple
  • 8,102
  • 4
  • 45
  • 64
  • 5
    This feature is proposed for a future version of Kotlin. https://github.com/Kotlin/KEEP/blob/master/proposals/stdlib/window-sliding.md – succcubbus Jul 07 '16 at 05:11

6 Answers6

53

With Kotlin 1.3, according to your needs, you may choose one of the following ways to solve your problem.


#1. Using chunked

fun main() {
    val list = listOf(2, 4, 3, 10, 8, 7, 9)
    val newList = list.chunked(2)
    //val newList = list.chunked(size = 2) // also works
    print(newList)
}

/*
prints:
[[2, 4], [3, 10], [8, 7], [9]]
*/

#2. Using windowed

fun main() {
    val list = listOf(2, 4, 3, 10, 8, 7, 9)
    val newList = list.windowed(2, 2, true)
    //val newList = list.windowed(size = 2, step = 2, partialWindows = true) // also works
    println(newList)
}

/*
prints:
[[2, 4], [3, 10], [8, 7], [9]]
*/
Imanou Petit
  • 89,880
  • 29
  • 256
  • 218
42

NOTE: For Kotlin 1.2 and newer, please see the chunked and windowed functions that are now in the standard library. There is no need for a custom solution.


Here is an implementation of a lazy batching extension function which will take a collection, or anything that can become a Sequence and return a Sequence of List each of that size, with the last one being that size or smaller.

Example usage to iterate a list as batches:

myList.asSequence().batch(5).forEach { group ->
   // receive a Sequence of size 5 (or less for final)
}

Example to convert batches of List to Set:

myList.asSequence().batch(5).map { it.toSet() }

See the first test case below for showing the output given specific input.

Code for the function Sequence<T>.batch(groupSize):

public fun <T> Sequence<T>.batch(n: Int): Sequence<List<T>> {
    return BatchingSequence(this, n)
}

private class BatchingSequence<T>(val source: Sequence<T>, val batchSize: Int) : Sequence<List<T>> {
    override fun iterator(): Iterator<List<T>> = object : AbstractIterator<List<T>>() {
        val iterate = if (batchSize > 0) source.iterator() else emptyList<T>().iterator()
        override fun computeNext() {
            if (iterate.hasNext()) setNext(iterate.asSequence().take(batchSize).toList())
            else done() 
        }
    }
}

Unit tests proving it works:

class TestGroupingStream {

    @Test fun testConvertToListOfGroupsWithoutConsumingGroup() {
        val listOfGroups = listOf(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).asSequence().batch(2).toList()
        assertEquals(5, listOfGroups.size)
        assertEquals(listOf(1,2), listOfGroups[0].toList())
        assertEquals(listOf(3,4), listOfGroups[1].toList())
        assertEquals(listOf(5,6), listOfGroups[2].toList())
        assertEquals(listOf(7,8), listOfGroups[3].toList())
        assertEquals(listOf(9,10), listOfGroups[4].toList())
    }

    @Test fun testSpecificCase() {
        val originalStream = listOf(1,2,3,4,5,6,7,8,9,10)

        val results = originalStream.asSequence().batch(3).map { group ->
            group.toList()
        }.toList()

        assertEquals(listOf(1,2,3), results[0])
        assertEquals(listOf(4,5,6), results[1])
        assertEquals(listOf(7,8,9), results[2])
        assertEquals(listOf(10), results[3])
    }


    fun testStream(testList: List<Int>, batchSize: Int, expectedGroups: Int) {
        var groupSeenCount = 0
        var itemsSeen = ArrayList<Int>()

        testList.asSequence().batch(batchSize).forEach { groupStream ->
            groupSeenCount++
            groupStream.forEach { item ->
                itemsSeen.add(item)
            }
        }

        assertEquals(testList, itemsSeen)
        assertEquals(groupSeenCount, expectedGroups)
    }

    @Test fun groupsOfExactSize() {
        testStream(listOf(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15), 5, 3)
    }

    @Test fun groupsOfOddSize() {
        testStream(listOf(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18), 5, 4)
        testStream(listOf(1,2,3,4), 3, 2)
    }

    @Test fun groupsOfLessThanBatchSize() {
        testStream(listOf(1,2,3), 5, 1)
        testStream(listOf(1), 5, 1)
    }

    @Test fun groupsOfSize1() {
        testStream(listOf(1,2,3), 1, 3)
    }

    @Test fun groupsOfSize0() {
        val testList = listOf(1,2,3)

        val groupCountZero =   testList.asSequence().batch(0).toList().size
        assertEquals(0, groupCountZero)

        val groupCountNeg =  testList.asSequence().batch(-1).toList().size
        assertEquals(0, groupCountNeg)

    }

    @Test fun emptySource() {
        listOf<Int>().asSequence().batch(1).forEach { groupStream ->
            fail()
        }

    }
}
Jayson Minard
  • 84,842
  • 38
  • 184
  • 227
  • Obviously this could be made to NOT be lazy, such as the example that maps to `it.toList()` or to work from a `Collection` instead of a sequence. But since it is easy to turn them into a `Sequence` it is a good starting point. – Jayson Minard Dec 29 '15 at 04:08
  • 2
    FYI: I think you've got a bug in your `batch` code. The following gets stuck in an endless loop `listOf(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).asSequence().batch(2).toList()`. – mfulton26 Dec 29 '15 at 13:58
  • Yeah, I see it now, if the elements aren't consumed and you convert to a list, the iterator stays still. I have to advance the iterator and create something to hold the items not consumed. – Jayson Minard Dec 29 '15 at 14:40
  • 1
    Fixed, easiest answer is to make each batch a `List` created at the time the next group is iterated. If you write code as you have `.batch(2).toList()` you will materialize all groups into memory at the same time as separate `List` otherwise only one group at a time will materialize. Code is fixed, tests updated for this case. – Jayson Minard Dec 29 '15 at 14:54
  • This is great! Thank you both for your time. Love the syntax. Might be a nice addition to the stdlib. – Dave Maple Dec 29 '15 at 16:09
  • 1
    This is a very nice solution, thanks @JaysonMinard!! – Sonofblip Apr 20 '17 at 16:08
  • 1
    Awesome answer. It´s been very useful. Thanks! – reixa Oct 05 '18 at 08:26
4

In Kotlin 1.2 M2 and later you can use chunked and windowed (see Kotlin 1.2 M2 is out | Kotlin Blog). Note that there are Sequence variances too (see kotlin.sequences - Kotlin Programming Language).

For versions of Kotlin prior to 1.2 M2 I recommend using Lists.partition(List, int) from google-guava (it uses java.util.List.subList(int, int)):

If you are unfamiliar with Guava see CollectionUtilitiesExplained · google/guava Wiki for more details.

You can create your own Kotlin extension function for it if you want:

fun <T> List<T>.collate(size: Int): List<List<T>> = Lists.partition(this, size)

If you want an extension function for mutable lists then in a separate Kotlin file (to avoid platform declaration clashes):

fun <T> MutableList<T>.collate(size: Int): List<MutableList<T>> = Lists.partition(this, size)

If you want something lazy loaded like in Jayson Minard's answer you can use Iterables.partition(Iterable, int). You might also be interested in Iterables.paddedPartition(Iterable, int) if you want to pad the last sublist if it is smaller than the specified size. These return Iterable<List<T>> (I don't see much point in making it Iterable<Iterable<T>> as subList returns an efficient view).

If for some reason you don't want to depend on Guava you can roll your own pretty easily using the subList function you mentioned:

fun <T> List<T>.collate(size: Int): List<List<T>> {
    require(size > 0)
    return if (isEmpty()) {
        emptyList()
    } else {
        (0..lastIndex / size).map {
            val fromIndex = it * size
            val toIndex = Math.min(fromIndex + size, this.size)
            subList(fromIndex, toIndex)
        }
    }
}

or

fun <T> List<T>.collate(size: Int): Sequence<List<T>> {
    require(size > 0)
    return if (isEmpty()) {
        emptySequence()
    } else {
        (0..lastIndex / size).asSequence().map {
            val fromIndex = it * size
            val toIndex = Math.min(fromIndex + size, this.size)
            subList(fromIndex, toIndex)
        }
    }
}
mfulton26
  • 29,956
  • 6
  • 64
  • 88
  • I suppose instead of separate files you can also use `@JvmName`. See [Handling signature clashes with @JvmName](https://kotlinlang.org/docs/reference/java-to-kotlin-interop.html#handling-signature-clashes-with-jvmname) for details. – mfulton26 Dec 29 '15 at 14:34
  • I like the last short answer, which works nicely for ArrayLists and a variation would work nicely for arrays. Other lists maybe less performant (i.e. LinkedList). And for other collections not possible. – Jayson Minard Dec 29 '15 at 15:13
  • 1
    Can you change your answer to protect for behavior when `size` is 0 (div by zero) or negative. – Jayson Minard Dec 29 '15 at 15:20
  • 1
    Good idea @JaysonMinard. Thanks! Added `require(size > 0)` (similar to Guava's partition methods). – mfulton26 Dec 29 '15 at 15:27
  • 1
    Great, two nice different answers + Guava if people want. – Jayson Minard Dec 29 '15 at 15:28
4

A more simplistic/functional-style solution would be

val items = (1..100).map { "foo_${it}" }

fun <T> Iterable<T>.batch(chunkSize: Int) =
   withIndex().                        // create index value pairs
   groupBy { it.index / chunkSize }.   // create grouping index
   map { it.value.map { it.value } }   // split into different partitions


items.batch(3)

Note 1: Personally I'd prefer partition as a method name here, but it's already present in Kotlin's stdlib to separate a lists into 2 parts given a predicate.

Note 2: The the iterator solution from Jayson may scale better than this solution for large collections.

Holger Brandl
  • 10,634
  • 3
  • 64
  • 63
1

Dummy Array

 for (i in 0..49){
             var  data="java"
            }
            array.add(data)

Used:

  var data=array?.chunked(15)

kotlin's method

Bipin Bharti
  • 1,034
  • 2
  • 14
  • 27
0

There is unfortunately no built-in function for that yet and while functional and Sequence-based implementations from other answers look nice, if you just need is List of Lists, I'd suggest writing a little bit of ugly, imperative, but performant code.

This is my final result:

fun <T> List<T>.batch(chunkSize: Int): List<List<T>> {
    if (chunkSize <= 0) {
        throw IllegalArgumentException("chunkSize must be greater than 0")
    }
    val capacity = (this.size + chunkSize - 1) / chunkSize
    val list = ArrayList<ArrayList<T>>(capacity)
    for (i in 0 until this.size) {
        if (i % chunkSize == 0) {
            list.add(ArrayList(chunkSize))
        }
        list.last().add(this.get(i))
    }
    return list
}
MaciejGórski
  • 22,187
  • 7
  • 70
  • 94