I parse data from a csv file that looks like this:
X,..,..,Dx,..,..
Y,..,..,Dy,..,..
X,..,..,Dx,..,..
Y,..,..,Dy,..,..
X,..,..,Dx,..,..
Y,..,..,Dy,..,..
Each row becomes an element of an array of a type I defined for use with FileHelpers. This probably isn't relevant, but I'm including it in case someone knows a trick I could apply at this stage of the process using FileHelpers.
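For context, the row type is shaped roughly like this (a sketch only: the field names are invented and the column types are guesses; FileHelpers' DelimitedRecord maps delimited columns to public fields in declaration order):

    open FileHelpers

    // Rough shape only; field names and float types are placeholders.
    [<DelimitedRecord(",")>]
    type DataRow() =
        [<DefaultValue>] val mutable Letter : string   // "X", "Y", ...
        [<DefaultValue>] val mutable Col1   : float
        [<DefaultValue>] val mutable Col2   : float
        [<DefaultValue>] val mutable D      : float    // Dx for "X" rows, Dy for "Y" rows
        [<DefaultValue>] val mutable Col4   : float
        [<DefaultValue>] val mutable Col5   : float

    let engine = FileHelperEngine<DataRow>()
    let rows = engine.ReadFile "data.csv"              // DataRow []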
I'm only interested in the pairs (X, Dx) and (Y, Dy). The data could have more letters than just X and Y, e.g. (X, Dx); (Y, Dy); (Z, Dz); ...
I'll call the number of letters nL.
The goal is to get the averages of Dx, Dy, ... for each group, by processing an array Ds of all the D values, which has SUM(nIterations) * nL elements.
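Concretely, with the rows alternating X, Y as above (nL = 2), Ds is laid out like this:

    // index:  0    1    2    3    4    5   ...
    // value: Dx0  Dy0  Dx1  Dy1  Dx2  Dy2  ...
    // Ds.Length = SUM(nIterations) * nL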
I have a list of numbers of iterations:
let nIterations = [2000; 2000; 2000; 1000; 500; 400; 400; 400; 300; 300]
And for each of these numbers, I will have that many "letter groups". So the rows of interest for nIterations.[0] are rows 0 up to (but not including) nIterations.[0] * nL.
To get the rows of interest for nIterations.[i], I build a list nis of running totals by scanning nIterations:
let nis = List.scan (fun x e -> x + e) 0 nIterations
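For the nIterations above, that gives the cumulative starting offsets (in units of letter groups):

    // nis = [0; 2000; 4000; 6000; 7000; 7500; 7900; 8300; 8700; 9000; 9300]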
Then, to isolate the nIterations.[i] group:
let group = Array.sub Ds (nis.[i]*nL) (nIterations.[i]*nL)
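For example, with nL = 2 and i = 3, that is:

    let group3 = Array.sub Ds (nis.[3] * 2) (nIterations.[3] * 2)
    // = Array.sub Ds 12000 2000: 2000 elements starting at index 12000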
Here's the whole thing:
    // For each group: slice its rows out of Ds, regroup them by letter,
    // sum each letter's column, and divide by the iteration count.
    nIterations |> List.mapi (fun i ni ->
        let igroup = Array.sub Ds (nis.[i] * nL) (ni * nL)
        let groupedbyLetter = chunk nL igroup
        let sums = seq { for idx in 0 .. (nL - 1) do
                            let d = seq { for g in groupedbyLetter do
                                              yield Seq.head (Seq.skip idx g) }
                            yield d |> Seq.sum }
        sums |> Seq.map (fun x -> x / float ni)) |> List.ofSeq
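To make the inner "transposition" concrete, here's what happens for one group with nL = 2 (made-up values):

    // igroup         = [| dx0; dy0; dx1; dy1; dx2; dy2 |]
    // chunk 2 igroup  = seq of [dx0; dy0], [dx1; dy1], [dx2; dy2]
    // idx = 0 picks out the Dx column (dx0, dx1, dx2); idx = 1 the Dy column.
    // Each column is summed and divided by ni: a transpose, then a row average.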
That "chunk" function is one I found on SO:
    // splitAt is a helper that came with it; roughly:
    //     let splitAt n xs = Seq.truncate n xs, Seq.skip n xs
    let rec chunk n xs =
        if Seq.isEmpty xs then Seq.empty
        else
            // Peel off the first n elements, then recurse on the rest (lazily).
            let ys, zs = splitAt n xs
            Seq.append (Seq.singleton ys) (chunk n zs)
I have verified this works and gets me what I want: an nIterations.Length-sized collection of nL-sized collections (one average per letter, per group).
The problem is speed: this only works on small data sets. At the sizes in the example I've given, it gets "hung" at the chunk function.
So my question is: how do I go about improving the speed of this whole process? (and/or) What is the best (or at least a better) way to do that "transposition"?
I figure I could:
- try to rearrange the data as I'm reading it in
- try to index the elements directly (see the sketch after this list)
- try breaking the process into smaller stages or "passes"
- ???
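For the "index directly" idea, here's the kind of thing I have in mind (a sketch only; averagesDirect is just a name I'm using here). It skips chunk entirely by stepping through Ds in strides of nL:

    let averagesDirect (Ds : float []) nL (nIterations : int list) =
        let nis = List.scan (+) 0 nIterations
        nIterations
        |> List.mapi (fun i ni ->
            let start = nis.[i] * nL                 // first element of group i in Ds
            List.init nL (fun idx ->
                // Sum every nL-th element, starting from the idx-th letter slot.
                let mutable sum = 0.0
                for j in 0 .. ni - 1 do
                    sum <- sum + Ds.[start + j * nL + idx]
                sum / float ni))

That makes a single pass per group with no intermediate sequences, so it should avoid whatever chunk is choking on, but maybe there's something more idiomatic?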