Calculating average Based on Condition in R

Question

Referring to the question "Calculating average of based on condition", I need to calculate the average of column E based on column F.

Below is part of my data frame df; my actual data has 65K values.

        E            F        
     3.130658445    -1
     4.175605237    -1
     4.949554963    0
     4.653496112    0
     4.382672845    0
     3.870951272    0
     3.905365677    0
     3.795199341    0
     3.374740696    0
     3.104690415    0
     2.801178871    0
     2.487881321    0
     2.449349554    0
     2.405409636    0
     2.090901539    0
     1.632416356    0
     1.700583696    0
     1.846504012    0
     1.949797831    0
     1.963114449    0
     2.033100326    0
     2.014312751    0
     1.997178247    0
     2.143775497    0

Based on the solution provided in the mentioned post, below is my script.

setDT(df)[, Avg := c(rep(mean(head(df$E, 5)), 5), rep(0, .N-5)), 
      cumsum(c(TRUE,  diff(abs(F)!=1)==1))]

But when executed I am getting the below error.

Error in rep(0, .N - 5) : invalid 'times' argument

That error comes from trying to rep a value -n times. Try `rep(0, -1)` – Sotos, Aug 11 '17 at 12:50
I know, you should be. I am not offering a solution, I am just explaining the error :) – Sotos, Aug 11 '17 at 12:52
Instead of telling answerers that their code doesn't meet vague conditions, explicitly show desired output for the given example. For more guidance: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 Btw, your q should be self-contained instead of requiring people to read linked material. – Frank, Aug 11 '17 at 14:38
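To make the comment above concrete, here is a minimal base R sketch of why the call fails. With the sample data, the first group produced by the `cumsum(...)` grouping has only 2 rows, so `.N - 5` is negative. The helper `pad_zeros` is a hypothetical name introduced for illustration only; it is not part of the thread or of any package, and clamping the count at zero is just one possible guard.

```r
# Reproduce the error from the question: 'times' must not be negative.
msg <- tryCatch(rep(0, -1), error = function(e) conditionMessage(e))
print(msg)

# Hypothetical helper (an assumption, not from the thread):
# number of trailing zeros for a group, never a negative count.
pad_zeros <- function(n_rows, n_avg = 5L) {
  rep(0, max(n_rows - n_avg, 0L))
}

length(pad_zeros(2L))   # a group with fewer than 5 rows gets no padding
length(pad_zeros(22L))  # a group with 22 rows gets 22 - 5 zeros
```

A guard like this only stops the error. Whether a short group should be averaged at all depends on the conditions in the referenced post.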

Sam · Answer 1 · 2017-08-11T12:48:10.993

1

use aggregate:

agg <- aggregate(df$E,by=list(df$F), FUN=mean)

You used a data.table example but said data frame in your question, so here is the data.table version:

# this will retain all rows and return mean as a new column (per group)
df[, Mean:=mean(E), by=list(F)]
# this will return means per group only
df[, mean(E),by=.(F)]

edited Aug 11 '17 at 12:48

answered Aug 11 '17 at 12:42

Thanks!!! But the code doesn't seem to satisfy the conditions mentioned in the reference post. – ANmike Aug 11 '17 at 13:00

Orhan Yazar · Answer 2 · 2017-08-11T13:03:16.737

0

Try this:

dt <- data.table(df)
dt[, Avg := mean(E), by = "F"]
dt <- unique(dt, by = "F")

This is the result:

          E  F      Avg
1: 3.130658 -1 3.653132
2: 4.949555  0 2.797826

Doing only this:

dt <- data.table(df)
dt[, Avg := mean(E), by = "F"]

You get:

           E  F      Avg
 1: 3.130658 -1 3.653132
 2: 4.175605 -1 3.653132
 3: 4.949555  0 2.797826
 4: 4.653496  0 2.797826
 5: 4.382673  0 2.797826
 6: 3.870951  0 2.797826
 7: 3.905366  0 2.797826
 8: 3.795199  0 2.797826
 9: 3.374741  0 2.797826
10: 3.104690  0 2.797826
11: 2.801179  0 2.797826
12: 2.487881  0 2.797826
13: 2.449350  0 2.797826
14: 2.405410  0 2.797826
15: 2.090902  0 2.797826
16: 1.632416  0 2.797826
17: 1.700584  0 2.797826
18: 1.846504  0 2.797826
19: 1.949798  0 2.797826
20: 1.963114  0 2.797826
21: 2.033100  0 2.797826
22: 2.014313  0 2.797826
23: 1.997178  0 2.797826
24: 2.143775  0 2.797826

edited Aug 11 '17 at 13:03

answered Aug 11 '17 at 12:43

Thanks for the solution, but the code doesn't seem to satisfy the conditions that are mentioned in the reference post. – ANmike Aug 11 '17 at 13:00
Try the second solution @ANmike – Orhan Yazar Aug 11 '17 at 13:08
No, `row 2` is the last point where there is `-1` in column `F`, so the average should be calculated from `row 3` to `row 7` of column `E`, and the `Avg` column should be `0` from `row 8` onward – ANmike Aug 11 '17 at 13:10
@ANmike Then I didn't understand what you want. Can you explain explicitly? – Orhan Yazar Aug 11 '17 at 13:11
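Going only by the asker's last comment (average the five rows that follow a run of `-1` in `F`, and set `Avg` to `0` everywhere else), the requirement might be sketched as below. This is an interpretation, not a confirmed answer from the thread. It uses a plain base R loop rather than data.table so that it runs without extra packages, and it reuses the grouping expression from the question. It assumes that a run shorter than five rows is averaged over the rows it has, which the thread does not specify.

```r
# First ten rows of the sample data from the question.
df <- data.frame(
  E = c(3.130658445, 4.175605237, 4.949554963, 4.653496112, 4.382672845,
        3.870951272, 3.905365677, 3.795199341, 3.374740696, 3.104690415),
  F = c(-1, -1, 0, 0, 0, 0, 0, 0, 0, 0)
)

# Same grouping as in the question: a new group starts where F leaves +/-1.
grp <- cumsum(c(TRUE, diff(abs(df$F) != 1) == 1))

avg <- numeric(nrow(df))  # defaults to 0 for every row
for (g in unique(grp)) {
  # Rows of this group where F is not +/-1, in their original order.
  idx <- which(grp == g & abs(df$F) != 1)
  first <- head(idx, 5)   # at most five rows, fewer if the run is short
  if (length(first) > 0) {
    avg[first] <- mean(df$E[first])
  }
}
df$Avg <- avg

print(df)
```

Because the row count is capped with `head()` and no `rep()` call is involved, the leading group that contains only `-1` rows no longer triggers the `invalid 'times' argument` error. Those rows simply keep `Avg` equal to `0`.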
