0

I have a some Seq<'T> and I need a function:

Seq.block : int -> Seq<'T> -> Seq<Seq<'T>>

where the resulting sequence is the original sequence, split up in blocks of fixed length.

So e.g. Seq.block 2 {1;2;3;4;5} would have to return {{1;2};{3;4};{5}}

The catch is that this will have to work both for finite and infinite sequences. I have various undertaken various attempts and the last one is working, but I'm hoping this can be done cleaner. I've left my various non-working attempts in here in hope that that might be of help to somebody too.

My first attempt was:

let block size = Seq.mapi (fun i x -> (i/size,x))
                 >> Seq.groupBy fst
                 >> Seq.map (fun (_,res) -> Seq.map snd res)

This works fine for finite sequences, but not for infinite sequences (the groupBy doesn't work lazy, which in retrospect is kind of obvious).

As a second attempt I tried (hoping that fold would operate lazily)

let block size =  Seq.mapi (fun i x -> (i/size,x))
                  >> Seq.fold (fun (accIx,fullSeq,buildSeq) (ix,el) ->
                                        if accIx = ix then (accIx,fullSeq, Seq.append buildSeq (Seq.singleton el))
                                        else (accIx+1,Seq.append fullSeq (seq {yield buildSeq}),Seq.singleton el))
                                  (0, Seq.empty,Seq.empty)
                  >> fun (_,full,rem) -> Seq.append full (seq { yield rem})

but that is even a lot worse (throws a stackoverflow on an test example where the first attempt worked, though my version has probably lost of room for improvement).

For attempt #3 I did:

let block size x = Seq.initInfinite (fun i -> x |> Seq.skip (i*size)
                                                |> Seq.take size)

This actually works on an infinite sequence, but a. it doesn't work on a finite sequence and b. probably suffers from performance problems as you get further out in the tail of the sequence. The recursive attempt #4 I already pretty much knew was doomed to fail, but I figured I'd give it a try anyway:

let rec block size x = 
    Seq.append (seq { yield Seq.take size x})
               (block size (Seq.skip size x))

Attempt #5 almost works:

let block size = Seq.windowed size
                 >> Seq.mapi (fun i x -> (i,x))
                 >> Seq.filter (fun (i,_) -> i % size = 0)
                 >> Seq.map (fun (_,x) -> Array.toSeq x)

but has the (slight) drawback that for a finite sequence, when the sequence length is not divisible by 'size', the short last element is not in the list. Next to this, I'm slightly worried over performance because of all the arrays being generated.

Mulling over this a little longer, I came up with Attempt #6:

let block size = Seq.mapi (fun i x -> (i % size,x))
                 >> Seq.unfold (fun state -> if Seq.isEmpty state then None
                                             else
                                                 match Seq.tryFind (fun (i,_) -> i = size-1) state with
                                                        | None  -> state |> Seq.map snd,
                                                                   Seq.empty
                                                        | _     -> state |> Seq.take size
                                                                         |> Seq.map snd,
                                                                   Seq.skip size state
                                                    |> Some)

Happy to hear if there are any better alternatives.

Bram
  • 618
  • 4
  • 14
  • cited `chunks` answer seem right - didnt follow the related: links at top of question. I searched for 'LINQ batch inputs' then 'f# batch`. Can you close please (even though your question and answer catalog is impressive, it is still a dup :() – Ruben Bartelink Dec 01 '13 at 07:38
  • see also http://stackoverflow.com/a/6737059/11635 – Ruben Bartelink Dec 01 '13 at 07:42
  • 1
    and http://stackoverflow.com/a/9855620/11635 – Ruben Bartelink Dec 01 '13 at 07:46
  • also see the "chunking" operation in Deedle: http://bluemountaincapital.github.io/Deedle/timeseries.html#windowing – Tomas Petricek Dec 01 '13 at 18:43
  • Thanks for the links, I agree it's a dupe. For reference: most of the links given give solutions that will not work for infinite sequences (though as mentioned, the "chunks" function in the link on top does). – Bram Dec 01 '13 at 20:57

0 Answers0