136

How to split a List of elements into lists with at most N items?

ex: Given a list with 7 elements, create groups of 4, leaving the last group possibly with less elements.

split(List(1,2,3,4,5,6,"seven"),4)

=> List(List(1,2,3,4), List(5,6,"seven"))
Johnny Everson
  • 8,343
  • 7
  • 39
  • 75

5 Answers5

248

I think you're looking for grouped. It returns an iterator, but you can convert the result to a list,

scala> List(1,2,3,4,5,6,"seven").grouped(4).toList
res0: List[List[Any]] = List(List(1, 2, 3, 4), List(5, 6, seven))
Kipton Barros
  • 21,002
  • 4
  • 67
  • 80
  • 37
    Scala lists have something for everything. – J Atkin Sep 26 '15 at 21:37
  • I have a weird question. For the same case if i convert the data to a sequence, I get a Stream Object. Why is that? – Rakshith Jun 23 '16 at 10:26
  • 3
    @Rakshith That sounds like a separate question. Scala has a mysterious gnome that chooses a data structure, and it chose a Stream for you. If you want a List, you should request a List, but you can also just trust the gnome's judgment. – Ion Freeman Jun 21 '17 at 21:09
17

There is much easier way to do the task using sliding method. It works this way:

val numbers = List(1, 2, 3, 4, 5, 6 ,7)

Lets say you want to break the list into smaller lists of size 3.

numbers.sliding(3, 3).toList

will give you

List(List(1, 2, 3), List(4, 5, 6), List(7))
Paul Roub
  • 36,322
  • 27
  • 84
  • 93
Dorjee
  • 1,533
  • 1
  • 12
  • 9
9

Or if you want to make your own:

def split[A](xs: List[A], n: Int): List[List[A]] = {
  if (xs.size <= n) xs :: Nil
  else (xs take n) :: split(xs drop n, n)
}

Use:

scala> split(List(1,2,3,4,5,6,"seven"), 4)
res15: List[List[Any]] = List(List(1, 2, 3, 4), List(5, 6, seven))

edit: upon reviewing this 2 years later, I wouldn't recommend this implementation since size is O(n), and hence this method is O(n^2), which would explain why the built-in method becomes faster for large lists, as noted in comments below. You could implement efficiently as follows:

def split[A](xs: List[A], n: Int): List[List[A]] =
  if (xs.isEmpty) Nil 
  else (xs take n) :: split(xs drop n, n)

or even (slightly) more efficiently using splitAt:

def split[A](xs: List[A], n: Int): List[List[A]] =
  if (xs.isEmpty) Nil 
  else {
    val (ys, zs) = xs.splitAt(n)   
    ys :: split(zs, n)
  }
Luigi Plinge
  • 50,650
  • 20
  • 113
  • 180
  • 4
    `xs splitAt n` is an alternative to the combination `xs take n` and `xs drop n` – Kipton Barros Sep 18 '11 at 06:05
  • 1
    this will explode the stack, consider a recursive implementation – Jed Wesley-Smith Sep 18 '11 at 21:26
  • @Kipton, true, but you need to extract the results to temporary vals so it adds a couple of lines to a method. I did a quick benchmark and it seems using `splitAt` instead of `take`/`drop`improves performance on average around 4%; both are 700-1000% quicker than `.grouped(n).toList`! – Luigi Plinge Sep 18 '11 at 21:29
  • @Luigi, Wow. Any thoughts about why `grouped-toList` is so slow? That sounds like a bug. – Kipton Barros Sep 18 '11 at 21:33
  • @Jed You're right in extreme cases, but your implementation depends on what you're using it for. For OP's use-case (if `grouped` didn't exist :)), simplicity is the overriding factor. For the standard library, stability and performance should trump elegance. But there are plenty of examples both in *Programming in Scala* and the standard libraries of normal-recursive (rather than tail-recursive) calls; it's a standard and important weapon in the FP toolbox. – Luigi Plinge Sep 18 '11 at 21:39
  • @Kipton Not really; I looked briefly at the source which shows that `grouped` creates something called a `GroupedIterator` that it iterates over within a for-expression; maybe the VM is less able to optimize this than a simple recursive function, but really I have no idea. I imagine there is lots of scope for performance measurement and tuning within the collections library! – Luigi Plinge Sep 18 '11 at 22:03
  • @Kipton a bit more investigation shows that actaully the built-in `grouped` is much quicker for larger lists (splitting into hundreds of groups). The crossover seems to come when lists reach around 200 in size. No bug! – Luigi Plinge Sep 18 '11 at 22:40
  • Interesting discussion guys. Thanks for the answer. Right now I am using grouped and it's fine. – Johnny Everson Sep 19 '11 at 02:29
  • You don't actually need the check `if (zs.isEmpty)`. It's already covered by your first case. So you could just write `ys :: split(zs, n)` – Konstantin Milyutin Nov 20 '13 at 15:43
  • Now compare your first solution and the updated one :) You removed too much: `splitAt` was good. Now you call `take` and `drop` again. Sorry for picking. – Konstantin Milyutin Nov 21 '13 at 10:01
  • @damluar I've now shown the `splitAt` version too, although I'm not convinced that this is better. As mentioned above, the difference between using that and `take` + `drop` is a lot smaller than you might think: `splitAt`, `take`, and `drop` all being O(n), so we're just talking constant factors... `splitAt` and `take` are basically the same, and the contstant factor of `drop` is tiny in comparison (just dereferencing a variable). – Luigi Plinge Nov 21 '13 at 16:41
  • I agree with you about constant factor. That's why I didn't understand you updated code, why do you think it's much more efficient. In my opinion, there is not much difference from your first code. – Konstantin Milyutin Nov 21 '13 at 16:51
  • @damluar For a given `n`, `take`, `splitAt` and `drop` take a constant time no matter how big `xs` is. Whereas `size` takes time proportional to the length of `xs`. Since these are performed for each element, execution time increases quadratically if you use `size`. We remove this problem by using `isEmpty`, which is constant time. – Luigi Plinge Nov 21 '13 at 19:19
  • It is not an extreme case, it will happen for sure at some point in time. You have to do it in a tail recursive fashion using an accumulator perhaps – Filippo De Luca Dec 04 '14 at 20:44
5

I am adding a tail recursive version of the split method since there was some discussion of tail-recursion versus recursion. I have used the tailrec annotation to force the compiler to complain in case the implementation is not indeed tail-recusive. Tail-recursion I believe turns into a loop under the hood and thus will not cause problems even for a large list as the stack will not grow indefinitely.

import scala.annotation.tailrec


object ListSplitter {

  def split[A](xs: List[A], n: Int): List[List[A]] = {
    @tailrec
    def splitInner[A](res: List[List[A]], lst: List[A], n: Int) : List[List[A]] = {
      if(lst.isEmpty) res
      else {
        val headList: List[A] = lst.take(n)
        val tailList : List[A]= lst.drop(n)
        splitInner(headList :: res, tailList, n)
      }
    }

    splitInner(Nil, xs, n).reverse
  }

}

object ListSplitterTest extends App {
  val res = ListSplitter.split(List(1,2,3,4,5,6,7), 2)
  println(res)
}
ches
  • 6,382
  • 2
  • 35
  • 32
Mike
  • 137
  • 1
  • 7
  • 1
    This answer could be improved by adding some explanation. Given that the accepted answer seems to be the canonical, intended way to do this, you should explain why someone would prefer this answer. – Jeffrey Bosboom Feb 14 '16 at 06:25
1

I think this is the implementation using splitAt instead of take/drop

def split [X] (n:Int, xs:List[X]) : List[List[X]] =
    if (xs.size <= n) xs :: Nil
    else   (xs.splitAt(n)._1) :: split(n,xs.splitAt(n)._2)
Hydrosan
  • 95
  • 2
  • 6