
A previous question, Calculating a group mean while excluding each cases individual value, has a clever answer from matt_k for computing group means that exclude the individual's own value. He proposes the following:

set.seed(123)
df <- data.frame(group = rep(letters[1:3], each = 3), 
             value = rnorm(9), stringsAsFactors = F)
df$loo_mean <- unlist(tapply(df$value, df$group, 
                 function(x) (sum(x) - x) / (length(x) - 1)))
df

But the code does not handle NAs properly: if a group contains even one NA, it yields NA for every individual in that group. Can anyone solve the problem?

Community

2 Answers


Omit the NAs before calling the function.

na.omit(df)
Lespied
  • Thanks, but I shouldn't do that. I want to compute the mean of some variables by class, and doing that would mean omitting lots of observations that could be worth keeping in the estimates. – Jose Julian Mar 12 '17 at 10:30
  • what should an NA be counted as in your estimates? – Lespied Mar 13 '17 at 11:20
  • Suppose I want to compute the mean of a dichotomous variable indicating whether the individual smokes or not. I want to test whether the proportion of smokers among the rest of the class is a significant predictor of individual smoking. At the same time, I would like to use the class mean (excluding the individual) of other variables as instrumental variables for the proportion of smokers. As in the example I have posted below, this implies that a single NA in the class makes the class mean of the variable NA for the whole class. – Jose Julian Mar 13 '17 at 15:19

Solution is simple. Add na.rm = TRUE to sum.

Compare

> sum(c(1:3, NA), na.rm = FALSE)
[1] NA

and

> sum(c(1:3, NA), na.rm = TRUE)
[1] 6

For more info, see ?sum.
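
Applied to the leave-one-out mean from the question, na.rm = TRUE in sum is probably not enough on its own, because the subtracted x and the length(x) - 1 denominator still include the missing values. The following is only a minimal sketch, not code from the original answers. The helper name loo_mean_na is hypothetical, the values are the ones printed in the asker's comment, and it assumes that a row whose own value is NA should keep an NA result. It uses ave() rather than unlist(tapply(...)) so that results come back in the original row order.

```r
# Hypothetical helper (not part of the original answers):
# leave-one-out mean that ignores NAs within one group.
loo_mean_na <- function(x) {
  ok    <- !is.na(x)
  total <- sum(x, na.rm = TRUE)  # sum of the non-missing values
  n     <- sum(ok)               # count of the non-missing values
  # Non-missing rows: remove their own value from the sum and the count.
  # Missing rows stay NA (an assumption about the desired behaviour).
  out <- ifelse(ok, (total - x) / (n - 1), NA_real_)
  # A group with a single non-missing value gives 0/0 = NaN; report NA instead.
  out[is.nan(out)] <- NA_real_
  out
}

# Values taken from the example the asker posted in the comments.
df <- data.frame(group = rep(letters[1:3], each = 3),
                 value = c(4, 7, 4, NA, 9, 2, 5, NA, 5),
                 stringsAsFactors = FALSE)

# ave() applies the function within each group and keeps the row order.
df$loo_mean <- ave(df$value, df$group, FUN = loo_mean_na)
df
```

With this data, rows 5 and 6 get 2 and 9, and rows 7 and 9 each get 5, while rows 4 and 8 remain NA. Because ave() accepts several grouping vectors, the same call should extend to nested groups such as school and class without looping over them.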

Roman Luštrik
  • Thanks, but I want more elaborate code. I have a survey with more than 30,000 students. There are lots of schools and classes, so I can't compute the mean class by class. I have tried to include na.rm=TRUE in the function and in the tapply command, but I get error messages. – Jose Julian Mar 12 '17 at 10:26
  • @JoseJulian please provide an example with NAs and what the expected result would look like. – Roman Luštrik Mar 12 '17 at 12:18
  • set.seed(123)
    df <- data.frame(group = rep(letters[1:3], each = 3), value = rpois(9,5), stringsAsFactors = F)
    df$value[df$value==8]<-NA
    df$loo_mean <- unlist(tapply(df$value, df$group, function(x) (sum(x) - x) / (length(x) - 1)))
    df
    1 a  4 5.5
    2 a  7 4.0
    3 a  4 5.5
    4 b NA  NA
    5 b  9  NA
    6 b  2  NA
    7 c  5  NA
    8 c NA  NA
    9 c  5  NA
    # It continues – Jose Julian Mar 13 '17 at 15:03
  • I would like observations 5, 6, 7 and 9 to have the group mean of the remaining non-missing observations in the group (this case is too simple: for observation 9, for example, only the seventh observation is available to compute the mean). – Jose Julian Mar 13 '17 at 15:11
  • @JoseJulian please edit your original question; comments are not designed for passing along data. – Roman Luštrik Mar 14 '17 at 08:48