
My problem is somewhat related to this question.

I have data like the following:

V1   V2
..   1
..   2
..   1
..   3

I need to calculate the variance of V1 cumulatively for each value of V2. That is, for a particular value n of V2, all rows of V1 whose corresponding V2 is less than or equal to n need to be included.
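Written out, the quantity being asked for is roughly the following. This assumes the usual sample variance with an m - 1 denominator, which is what R's var() computes; m_n and \bar{x}_n are names introduced here for illustration.

```latex
s^2(n) = \frac{1}{m_n - 1} \sum_{i \,:\, \mathrm{V2}_i \le n} \left( \mathrm{V1}_i - \bar{x}_n \right)^2,
\qquad
\bar{x}_n = \frac{1}{m_n} \sum_{i \,:\, \mathrm{V2}_i \le n} \mathrm{V1}_i,
\qquad
m_n = \#\{\, i : \mathrm{V2}_i \le n \,\}
```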

Will ddply help in such a case?

hardikudeshi

1 Answer


I don't think ddply will help since it is built on the concept of taking non-overlapping subsets of a data frame.

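## example data: V1 is continuous, V2 takes integer values 1-10
d <- data.frame(V1=runif(1000),V2=sample(1:10,size=1000,replace=TRUE))
## sorted distinct values of V2
u <- sort(unique(d$V2))
## for each value x, the variance of all V1 whose V2 is <= x
ans <- sapply(u,function(x) {
    with(d,var(V1[V2<=x]))
})
names(ans) <- u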

I don't know if there's a more efficient way to do this ...

Ben Bolker
  • Thank you, this has solved the problem for me. I'll wait for some time for an alternative answer; otherwise I will accept your solution! – hardikudeshi Sep 16 '12 at 14:22
  • Ben's answer is simple and to the point. Probably isn't gonna get much better. – Tyler Rinker Sep 16 '12 at 15:54
  • I think you could do something where you computed the sum of `V1` and the sum of `V1^2` for each piece, computed cumulative sums, and computed the cumulative variance from that, but it would be a little bit tricky ... – Ben Bolker Sep 16 '12 at 16:25
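A minimal base-R sketch of the sum / sum-of-squares idea from the last comment, offered as an illustration rather than a tested replacement. The function name cumvar_by is hypothetical. It uses the identity s^2 = (S2 - S1^2/N) / (N - 1), where N, S1 and S2 are the running count, sum of V1 and sum of V1^2 taken over the sorted values of V2. This scans the data once instead of once per value of V2. The trade-off is that the formula can lose precision to cancellation when the mean of V1 is large relative to its spread. Also, if the smallest V2 value has only one row, it returns NaN where var() would return NA.

```r
## Hypothetical helper: cumulative variance of x over all rows whose g is
## <= each distinct value of g, via running sums of x and x^2 (base R only).
cumvar_by <- function(x, g) {
    grp <- sort(unique(g))
    ## per-group count, sum, and sum of squares, in sorted order of g
    n  <- as.vector(tapply(x, g, length))
    s1 <- as.vector(tapply(x, g, sum))
    s2 <- as.vector(tapply(x^2, g, sum))
    ## running totals across the sorted groups
    N  <- cumsum(n)
    S1 <- cumsum(s1)
    S2 <- cumsum(s2)
    ## sample variance with denominator N - 1, matching var()
    v <- (S2 - S1^2 / N) / (N - 1)
    names(v) <- grp
    v
}

## usage: compare against the sapply() approach from the answer
set.seed(1)
d <- data.frame(V1 = runif(1000), V2 = sample(1:10, size = 1000, replace = TRUE))
v_fast <- cumvar_by(d$V1, d$V2)
u <- sort(unique(d$V2))
v_slow <- sapply(u, function(x) with(d, var(V1[V2 <= x])))
names(v_slow) <- u
all.equal(v_fast, v_slow)
```

The last element covers every row, so it should match var(d$V1) on the whole column.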