I have the following data:

let data = [(41609.00 , 10000., 3.822); (41609.00, 60000., 3.857); (41974.00 , 20000., 4.723 ); (41974.00, 30000., 3.22 ); (41974.00 , 4000., 4.655 ); (42339.00, 7000., 4.22 ); (42339.00 , 5000., 3.33)]

First column = OADate, second = volume, third = price.

I now want to group by date, sum the volume and compute the weighted average price. This is what I have so far:

let aggr data = 
    data
    //Multiply second and third column element by element
    |> Seq.map (fun (a, b, c) -> (a, b, b * c))
    //Group by first column
    |> Seq.groupBy fst
    //Sum column 2 & 3 based on group of column 1
    |> Seq.map (fun (d, e, f) -> (d, e |> Seq.sum, f |> Seq.sum)) 
    //take the sum and grouped column 1 & 2 and compute weighted average of the third
    |> Seq.map (fun (g, h, i) -> (g, h, i/h)) 

I'm getting a type mismatch error saying that the tuples have differing lengths. I have used similar syntax before without issues. Could anyone please point me in the right direction?

UPDATE:

In case anybody is interested, here is the solution, thanks to Tomas and Leaf:

let aggr data =
    data
    |> Seq.map (fun (a, b, c) -> (a, b, b * c))
    |> Seq.groupBy (fun (a, b, c) -> a)
    |> Seq.map (fun (key, group) -> group |> Seq.reduce (fun (a, b, c) (x, y, z) -> a, b + y, c + z))
    |> Seq.map (fun (g, h, i) -> (g, h, i / h))

nik

1 Answer

The first problem in your code is that you are calling Seq.groupBy with fst as the argument. This does not work because fst returns the first element of a two-element tuple, but your input consists of three-element tuples. Unfortunately, fst is not defined for tuples of any other size, so you need to write a lambda that selects the first of the three values:

(...)
|> Seq.groupBy (fun (a, b, c) -> a)
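
To make the difference concrete, here is a small sketch of fst on a pair versus a pattern on a triple. The name first3 is a hypothetical helper introduced only for illustration; the values are taken from the question's data.

```fsharp
// fst has type 'a * 'b -> 'a, so it accepts pairs only
let pairFirst = fst (41609.0, 10000.)

// For a triple, destructure with a pattern instead.
// first3 is a hypothetical helper, not part of FSharp.Core.
let first3 (a, _, _) = a
let tripleFirst = first3 (41609.0, 10000., 3.822)

// fst (41609.0, 10000., 3.822) would not compile:
// the compiler reports that the tuples have differing lengths.
```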

The next problem is the mapping in the following step. Seq.groupBy produces a sequence of two-element tuples: the key (the date) as the first element and, as the second, a sequence of the elements from the original input that share that key (three-element tuples in your case). Destructuring each item as a triple, as your lambda does, is therefore another tuple-length mismatch. To return the key together with the sum of the second component over a group, you can write:

(...)
|> Seq.map (fun (key, group) -> key, group |> Seq.sumBy (fun (_, v, _) -> v))

I'm not entirely sure what you want to do with the second and third columns, but this should give you an idea how to continue.
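
One possible way to continue, as a sketch only: it assumes the goal is the total volume and the volume-weighted average price per date, and it calls Seq.sumBy twice per group rather than summing tuples. The type annotation on data and the names totalVolume and totalValue are my additions.

```fsharp
// Sample rows from the question: (OADate, volume, price)
let data =
    [ (41609.0, 10000., 3.822); (41609.0, 60000., 3.857)
      (41974.0, 20000., 4.723); (41974.0, 30000., 3.22); (41974.0, 4000., 4.655)
      (42339.0, 7000., 4.22); (42339.0, 5000., 3.33) ]

let aggr (data: seq<float * float * float>) =
    data
    // Group by the first column; each item becomes (key, rows)
    |> Seq.groupBy (fun (date, _, _) -> date)
    |> Seq.map (fun (date, rows) ->
        // Sum of the volumes in the group
        let totalVolume = rows |> Seq.sumBy (fun (_, v, _) -> v)
        // Sum of volume * price in the group
        let totalValue = rows |> Seq.sumBy (fun (_, v, p) -> v * p)
        // A group whose volumes sum to zero would yield NaN here
        date, totalVolume, totalValue / totalVolume)

// One (date, total volume, weighted price) triple per distinct date
let result = aggr data |> List.ofSeq
```

For the first date this gives a total volume of 70000 and a weighted price of about 3.852.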

Tomas Petricek
  • Aside, I've been working on a library for time-series and data-frame manipulations which would likely make this easier. Check out https://github.com/BlueMountainCapital/FSharp.DataFrame and http://bluemountaincapital.github.io/FSharp.DataFrame/ if you are interested. – Tomas Petricek Oct 07 '13 at 14:59
  • Thanks Tomas. Does that mean that if I want to sum the second and third columns by the key, I need to do this in two lines? Also, there seems to be a problem with the last Seq.map as well. Any idea? – nik Oct 07 '13 at 15:03
  • You can always return a tuple with multiple values. In the second snippet I returned just the key and the sum of the group, but you can extend that to return other things (the sum of the third column?). I think it should work once you do that. – Tomas Petricek Oct 07 '13 at 15:06
  • Sorry, but I don't get it. If I want to sum the 2nd and 3rd columns, this does not work: |> Seq.map (fun (key, group) -> key, group |> Seq.sumBy (fun (_, v, w) -> (v, w))) – nik Oct 07 '13 at 15:09
  • You can't use `Seq.sumBy` like that because tuples cannot be summed. You could use `Seq.reduce` instead: `|> Seq.map (fun (key, group) -> group |> Seq.reduce (fun (a,b,c) (x,y,z) -> a,b+y,c+z))` – Leaf Garland Oct 07 '13 at 15:23
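
As an illustration of the point in the last comment, the two sums for one group can also be accumulated in a single pass with Seq.fold and an explicit (0., 0.) starting value. This is a sketch of an alternative to the Seq.reduce version, not the method used in the thread. The group shown is the first date from the question's data, and the accumulator names are my own.

```fsharp
// Rows of one group: (OADate, volume, price), taken from the question's data
let group = [ (41609.0, 10000., 3.822); (41609.0, 60000., 3.857) ]

// Accumulate both sums in one pass; the (0., 0.) seed plays the role
// of the "zero" that Seq.sumBy cannot supply for tuples
let totalVolume, totalValue =
    group
    |> Seq.fold
        (fun (vol: float, value: float) (_, v, p) -> vol + v, value + v * p)
        (0., 0.)

// Volume-weighted average price for the group (about 3.852)
let weightedPrice = totalValue / totalVolume
```

Unlike Seq.reduce, Seq.fold does not throw on an empty group; it simply returns the seed.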