Variable calculations using rows that satisfy a condition

Question

I'm trying to compute the mean of a variable using only the rows where another column equals a given value:

pp$mmean[pp[,1] == '1'] <- mean(pp$mm)[1:nrow(pp[,1] == '1')]

That is, I want the mean of mm over the rows where the first column equals 1 (ignoring every other row), with the result stored in pp$mmean only for those rows. The above code gives me:

Error in 1:nrow(pp[, 1] == "1") : argument of length 0

I want to do this multiple times for every unique value in pp[,1]... and will set up a for loop for this.

Not sure what I'm doing wrong here...

Example of data, pp:

Plan X mm
1 95 0.323    
1 275 0.341818    
1 2 0.618   
1 75 0.32     
1 13 0.399    
1 20 0.40     
2 219 0.393    
2 50 0.060 
2 213 0.39    
2 204 0.4961     
2 19 0.393    
2 201 0.388

etc...

About your error: when you subset a single column from a data.frame like this, `pp[,1]`, the result is coerced to a vector. A vector has no dimensions, so `nrow` returns `NULL` and `1:nrow(...)` throws an error. `length` would work correctly, or, if you want to keep the data.frame structure while subsetting one column, use the `drop` argument, like this: `pp[, 1, drop = FALSE]`. – BartekCh, Apr 07 '14 at 07:47
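To make the comment above concrete, here is a minimal sketch of what the original line was probably aiming for, assuming the goal is to store the mean of mm only in the rows where the first column equals 1. The helper name `sel` is made up for illustration, and the data is the twelve sample rows from the question (Plan and mm only).

```r
# Sample rows from the question (Plan and mm only)
pp <- data.frame(
  Plan = rep(c(1, 2), each = 6),
  mm   = c(0.323, 0.341818, 0.618, 0.32, 0.399, 0.40,
           0.393, 0.060, 0.39, 0.4961, 0.393, 0.388)
)

# 'sel' is a hypothetical name: a logical vector with one entry per row
sel <- pp[, 1] == 1

is.null(nrow(sel))   # a plain vector has no dimensions, so nrow() gives NULL
sum(sel)             # number of matching rows

pp$mmean <- NA_real_                  # create the column first
pp$mmean[sel] <- mean(pp$mm[sel])     # mean of mm over the matching rows only
```

The single mean is recycled across the selected rows, so no `1:nrow(...)` indexing is needed; the other rows stay NA.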

Henrik · Accepted Answer · 2014-04-06T11:21:46.263

You may try ave. With the default arguments, ave calculates the mean for each level of the grouping variable(s), but the resulting vector has the same length as the original data.

pp$mean_mm <- with(pp, ave(mm, Plan))

#    Plan   X       mm  mean_mm
# 1     1  95 0.323000 0.400303
# 2     1 275 0.341818 0.400303
# 3     1   2 0.618000 0.400303
# 4     1  75 0.320000 0.400303
# 5     1  13 0.399000 0.400303
# 6     1  20 0.400000 0.400303
# 7     2 219 0.393000 0.353350
# 8     2  50 0.060000 0.353350
# 9     2 213 0.390000 0.353350
# 10    2 204 0.496100 0.353350
# 11    2  19 0.393000 0.353350
# 12    2 201 0.388000 0.353350
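As a small variation on the call above, ave also takes a different summary function through its `FUN` argument, which has to be passed by name because the unnamed arguments are treated as grouping variables. This is only a sketch on the sample rows from the question; the column names `median_mm` and `n_in_plan` are made up for illustration.

```r
# Sample rows from the question (Plan and mm only)
pp <- data.frame(
  Plan = rep(c(1, 2), each = 6),
  mm   = c(0.323, 0.341818, 0.618, 0.32, 0.399, 0.40,
           0.393, 0.060, 0.39, 0.4961, 0.393, 0.388)
)

# Per-group median, repeated on every row of the group
pp$median_mm <- ave(pp$mm, pp$Plan, FUN = median)

# Per-group row count, repeated on every row of the group
pp$n_in_plan <- ave(pp$mm, pp$Plan, FUN = length)
```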

Edit following comment; ave over multiple columns. One possibility is to loop over columns on which mean should be calculated using sapply.

# sample data
pp <- data.frame(Plan = rep(letters[1:3], each = 3), mm = 1:9, mm1 = 2:10, mm2 = 3:11)

# name of variables for which mean should be calculated 
vars <- c("mm", "mm1", "mm2")

# 'loop' over variables using sapply
m <- sapply(vars, function(x){
  pp2 <- pp[ , c("Plan", x)]
  ave(pp2[ , x], pp2[ , "Plan"])
  })

# rename columns of result matrix
colnames(m) <- paste0("mean_", vars)

# add means to original data
cbind(pp, m)
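A possibly shorter variant of the same idea, assuming it is acceptable to add the new columns to pp directly: lapply over the selected columns and assign the resulting list in one step. This is a sketch rather than a replacement for the sapply version above.

```r
# Same sample data as above
pp <- data.frame(Plan = rep(letters[1:3], each = 3), mm = 1:9, mm1 = 2:10, mm2 = 3:11)

# Names of the variables for which the group mean should be calculated
vars <- c("mm", "mm1", "mm2")

# One ave() call per column; the list elements become new columns of pp
pp[paste0("mean_", vars)] <- lapply(pp[vars], function(v) ave(v, pp$Plan))
```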

Just to add: say that I had multiple mm columns, say mm, mm1 and mm2 (in columns V3 to V5). Can one use `pp$mean_mm <- with(pp, ave(pp[,3:5], Plan))` to get the mean of all these columns in each Plan? It didn't seem to work for me (I get 40 warnings). – user2726449, Apr 06 '14 at 03:25

score 1 · Answer 2 · edited May 23 '17 at 11:49

Many built-in options:

by(pp$mm, pp$X, mean, na.rm=TRUE)
tapply(pp$mm, pp$X, mean, na.rm=TRUE)

using plyr:

library(plyr)
ddply(pp, .(X), summarise, mean_mm = mean(mm, na.rm = TRUE))

using data.table:

library(data.table)
pp = data.table(pp)
pp[,mean(mm,na.rm=T),by="X"]

if you want to set it directly in your data.table:

pp[,AVERAGEbyX:=mean(mm,na.rm=T),by="X"]

not to mention mapply and aggregate
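For completeness, a minimal sketch of the aggregate route, assuming the grouping column is Plan as in the question (swap in X if that is what you want to group by). The result name `means` is made up for illustration. Like by and tapply, it returns one row per group rather than one value per original row.

```r
# Sample rows from the question (Plan and mm only)
pp <- data.frame(
  Plan = rep(c(1, 2), each = 6),
  mm   = c(0.323, 0.341818, 0.618, 0.32, 0.399, 0.40,
           0.393, 0.060, 0.39, 0.4961, 0.393, 0.388)
)

# Formula interface: summarise mm within each level of Plan
means <- aggregate(mm ~ Plan, data = pp, FUN = mean)
means
```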

Here is an overview of the R built-in options: Using tapply for the subset group of data

answered Apr 05 '14 at 18:20

crogg01


They are helpful suggestions. However, I get this error when running them: `Error in $<-.data.frame(*tmp*, "mmean", value = c(0.400303, : replacement has 6 rows, data has 87` – user2726449 Apr 05 '14 at 18:34
These don't put them straight in your data.frame, they just give the average by X. To put it into your frame you would use `data.table`'s `:=` notation (`pp[,AVERAGEbyX:=mean(mm,na.rm=T),by="X"]`) or Henrik's `with` call. – crogg01 Apr 05 '14 at 18:36
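If you would rather stay in base R, one way to get from the per-group result to one value per row is to index the tapply output by the group label. This is a sketch on the sample rows from the question, grouping by Plan; the name `group_means` is made up for illustration.

```r
# Sample rows from the question (Plan and mm only)
pp <- data.frame(
  Plan = rep(c(1, 2), each = 6),
  mm   = c(0.323, 0.341818, 0.618, 0.32, 0.399, 0.40,
           0.393, 0.060, 0.39, 0.4961, 0.393, 0.388)
)

# One mean per Plan, named by the Plan value
group_means <- tapply(pp$mm, pp$Plan, mean, na.rm = TRUE)

# Look up each row's group mean by name, so the result has nrow(pp) entries
pp$mean_mm <- as.vector(group_means[as.character(pp$Plan)])
```

The lookup expands the short per-group vector to the full length of the data, which avoids the "replacement has 6 rows, data has 87" kind of mismatch.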
