
I'm looking for the best way to partition a list (or seq) so that the groups have a given size. For example, let's say I want groups of size 2 (though it could be any other number):

let xs = [(a,b,c); (a,b,d); (y,z,y); (w,y,z); (n,y,z)]
let grouped = partitionBySize 2 xs
// => [[(a,b,c);(a,b,d)]; [(y,z,y);(w,y,z)]; [(n,y,z)]]

The obvious way to implement partitionBySize would be by adding the position to every tuple in the input list so that it becomes

[(0,a,b,c); (1,a,b,d); (2,y,z,y); (3,w,y,z); (4,n,y,z)]

and then use GroupBy with

xs |> Seq.ofList |> Seq.groupBy (function | (i,_,_,_) -> i - (i % n))
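
Written out in full, that approach looks roughly like this (a sketch that pairs each element with its index instead of widening the tuples):

let partitionBySize n xs =
    xs
    |> Seq.mapi (fun i x -> i, x)                   // attach the position
    |> Seq.groupBy (fun (i, _) -> i - (i % n))      // same key for every block of n elements
    |> Seq.map (fun (_, g) -> g |> Seq.map snd |> List.ofSeq)
    |> List.ofSeq

// partitionBySize 2 [1; 2; 3; 4; 5]  // => [[1; 2]; [3; 4]; [5]]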

However, this solution doesn't look very elegant to me. Is there a better way to implement this function (maybe with a built-in function)?

Francesco De Vittori
  • In your example the group inside the list (the commented line) is modeled as a tuple. I guess you want it to be a list, i.e. a list inside a list rather than a tuple, as the tuple version would not be possible with static typing – Ankur Nov 09 '11 at 11:33
  • Agree with Ankur. What's more, the result you are expecting is not a valid F# value, because the last tuple is different from the rest. All items in a list must have the same type. – Gus Nov 09 '11 at 12:57
  • yup, sorry, that was what I meant, thanks for pointing out the mistake. I edited the question accordingly. – Francesco De Vittori Nov 09 '11 at 15:07

6 Answers


This seems to be a repeating pattern that's not captured by any function in the F# core library. When solving similar problems earlier, I defined a function Seq.groupWhen (see F# snippets) that turns a sequence into groups. A new group is started when the predicate holds.

You could solve the problem using Seq.groupWhen similarly to your Seq.groupBy approach (by starting a new group at every even index). Unlike the Seq.groupBy version, this is efficient, because Seq.groupWhen iterates over the input sequence just once:

[3;3;2;4;1;2;8] 
|> Seq.mapi (fun i v -> i, v) // Add indices to the values (as first tuple element)
|> Seq.groupWhen (fun (i, v) -> i%2 = 0) // Start new group after every 2nd element
|> Seq.map (Seq.map snd) // Remove indices from the values

Implementing the function directly using recursion is probably easier - the solution from John does exactly what you need - but if you wanted to see a more general approach then Seq.groupWhen may be interesting.
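
If you're curious, an eager approximation of the groupWhen behaviour looks roughly like this (the real version is lazy and streams the input, but the grouping idea is the same):

// rough, eager sketch: a new group starts whenever the predicate f holds
let groupWhen f input =
  let folder groups x =
    match groups with
    | current :: rest when not (f x) -> (x :: current) :: rest  // extend the current group
    | _ -> [x] :: groups                                        // start a new group
  input
  |> Seq.fold folder []
  |> List.rev            // groups were accumulated in reverse order
  |> List.map List.rev   // and so were the elements inside each group

[3; 3; 2; 4; 1; 2; 8]
|> Seq.mapi (fun i v -> i, v)
|> groupWhen (fun (i, _) -> i % 2 = 0)
|> List.map (List.map snd)
// => [[3; 3]; [2; 4]; [1; 2]; [8]]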

Tomas Petricek

List.chunkBySize (hat tip: Scott Wlaschin) is now available and does exactly what you're talking about. It was introduced in F# 4.0.

let grouped = [1..10] |> List.chunkBySize 3
// val grouped : int list list = 
//   [[1; 2; 3]; [4; 5; 6]; [7; 8; 9]; [10]]

Seq.chunkBySize and Array.chunkBySize are also now available.
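
For example (note that the Seq and Array versions return their chunks as arrays):

seq { 1..10 } |> Seq.chunkBySize 3
// chunks: [|1; 2; 3|]; [|4; 5; 6|]; [|7; 8; 9|]; [|10|]

[| 1..10 |] |> Array.chunkBySize 3
// [|[|1; 2; 3|]; [|4; 5; 6|]; [|7; 8; 9|]; [|10|]|]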

Brad Collins

Here's a tail-recursive function that traverses the list once.

let chunksOf n items =
  let rec loop i acc items = 
    seq {
      match i, items, acc with
      //exit if chunk size is zero or input list is empty
      | _, [], [] | 0, _, [] -> ()
      //counter=0 so yield group and continue looping
      | 0, _, _::_ -> yield List.rev acc; yield! loop n [] items 
      //decrement counter, add head to group, and loop through tail
      | _, h::t, _ -> yield! loop (i-1) (h::acc) t
      //reached the end of input list, yield accumulated elements
      //handles items.Length % n <> 0
      | _, [], _ -> yield List.rev acc
    }
  loop n [] items

Usage

[1; 2; 3; 4; 5]
|> chunksOf 2 
|> Seq.toList //[[1; 2]; [3; 4]; [5]]

I like the elegance of Tomas' approach, but I benchmarked both our functions using an input list of 10 million elements. This one clocked in at 9 secs vs 22 for his. Of course, as he admitted, the most efficient method would probably involve arrays/loops.
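
For anyone who wants to reproduce the comparison, a rough harness along these lines should do (not necessarily the exact setup behind the numbers above):

let input = List.init 10000000 id

let time label f =
  let sw = System.Diagnostics.Stopwatch.StartNew()
  f () |> ignore
  printfn "%s: %O" label sw.Elapsed

// counting the chunks forces the lazy sequence to be fully evaluated
time "chunksOf" (fun () -> input |> chunksOf 2 |> Seq.length)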

Daniel

What about a recursive approach? It only requires a single pass over the list.

let rec partitionBySize length inp dummy = 
    match inp with
    |h::t ->
        if dummy |> List.length < length then
            partitionBySize length t (h::dummy)
        else dummy::(partitionBySize length t (h::[]))
    |[] -> dummy::[]

Then invoke it with partitionBySize 2 xs []
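
For example (note that, as written, the elements inside each group come out in reverse order, something pad's answer below points out):

partitionBySize 2 [1; 2; 3; 4; 5] []
// => [[2; 1]; [4; 3]; [5]]
// an empty input yields [[]] rather than []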

John Palmer

let partitionBySize size xs =
  let sq = ref (seq xs)
  seq {
    while (Seq.length !sq >= size) do
      yield Seq.take size !sq
      sq := Seq.skip size !sq
    if not (Seq.isEmpty !sq) then yield !sq
  }
  // result to list, if you want
  |> Seq.map (Seq.toList)
  |> Seq.toList

UPDATE

let partitionBySize size (sq: seq<_>) =
  seq {
    let e = sq.GetEnumerator()
    // true as long as the last MoveNext call succeeded
    let hasNext = ref true
    while !hasNext do
      // the chunks share one enumerator, so they must be consumed in order;
      // when the length is an exact multiple of size (or the input is empty),
      // a trailing empty chunk is produced
      yield seq { for i = 1 to size do
                    hasNext := e.MoveNext()
                    if !hasNext then yield e.Current }
  }

array slice version:

let partitionBySize size xs =
  let xa = Array.ofList xs
  let len = xa.Length
  [
    for i in 0..size..(len-1) do
      yield ( if i + size >= len then xa.[i..] else xa.[i..(i+size-1)] ) |> Array.toList 
  ]
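
For example, using the array slice version:

partitionBySize 2 [1; 2; 3; 4; 5]
// => [[1; 2]; [3; 4]; [5]]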
BLUEPIXY
  • I don't really understand why, but this is quite slow with my large dataset (much slower than Tomas' groupWhen) – Francesco De Vittori Nov 09 '11 at 16:15
  • This is slow because `Seq.skip` is inefficient, especially when you call it on a sequence that is already the result of `Seq.skip`. The returned sequence iterates over the original from the beginning each time it is accessed. – Tomas Petricek Nov 09 '11 at 16:47

Well, I was late to the party. The code below is a tail-recursive version using higher-order functions on List:

let partitionBySize size xs =
    // start the counter so that the leftover (shorter) chunk ends up last
    let i = size - (List.length xs - 1) % size
    let xss, _, _ =
        List.foldBack (fun x (acc, ls, j) ->
            if j = size then ((x::ls)::acc, [], 1)
            else (acc, x::ls, j+1)) xs ([], [], i)
    xss
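
For example:

partitionBySize 2 [1; 2; 3; 4; 5]
// => [[1; 2]; [3; 4]; [5]], with the inner lists in their original order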

I ran the same benchmark as Daniel. This function is efficient: it is about 2x faster than his approach on my machine. I also compared it with an array/loop version; they are comparable in terms of performance.

Moreover, unlike John's answer, this version preserves the order of elements in the inner lists.

pad
  • I'd like to see your benchmark. In my test of 10M elements, mine is still a second faster. – Daniel Nov 10 '11 at 17:56
  • I'm getting different results for initial and subsequent iterations. From a cold start, mine is faster. On subsequent runs yours is faster (by about 3 secs--still not 2x). Not sure what to attribute it to. This is in FSI. – Daniel Nov 10 '11 at 17:59
  • Any idea why this function drops from 10 sec on a fresh FSI session to ~6 sec on subsequent runs? My function remains pretty much constant (~9 sec). Just wondering what could be different. – Daniel Nov 10 '11 at 18:02
  • The results I got are from the benchmark with 1M elements. Don't know why there are some gaps between cold start and subsequent runs. Maybe there are some tricks behind implementation of List.foldBack? – pad Nov 10 '11 at 18:18
  • Using 1M vs my 10M explains the discrepancy. The gap will narrow with larger inputs, and eventually tilt in my favor, since yours returns a list (the results are accumulated in memory) and mine returns a sequence (giving constant memory usage). – Daniel Nov 10 '11 at 18:53
  • it will be difficult to choose a single answer. Excellent input by everybody! – Francesco De Vittori Nov 12 '11 at 15:03