Calculating a group mean while excluding each case's individual value

Question

I have a dataset with 70 cases (participants in a study). Is there a function that can calculate the mean of these 70 cases such that each individual case is excluded from its own mean? This would look like:

"mean for case x = (value(1) + ... + value(n) - value(x)) / (n - 1)"

Any information will help.

Do you mean a leave-one-out type analysis? I think this is more of a stats question. – CCurtis, Apr 07 '14 at 00:48
It is. But I don't think SPSS has the tools to get the job done, so I thought I would try building a function in R that can do it. – JoshExpPsych, Apr 08 '14 at 00:46

matt_k · Answer 1 · 2014-04-09T02:47:29.447

8

You could just do what you've suggested and remove each case from the total:

x <- 1:10
(sum(x) - x) / (length(x) - 1)

#[1] 6.000000 5.888889 5.777778 5.666667 5.555556 5.444444 5.333333 5.222222 5.111111 5.000000

mean(2:10)
#[1] 6
mean(1:9)
#[1] 5
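
The expression above works because the mean of the other cases is the total minus the case's own value, divided by n - 1. As a minimal sketch, it could be wrapped in a small helper; the name `loo_mean` and the choice to return `NA` when there are fewer than two values are assumptions for illustration, not part of the original answer.

```r
# Hypothetical helper: leave-one-out mean of a numeric vector.
loo_mean <- function(x) {
  n <- length(x)
  # With fewer than two values there is nothing left to average,
  # so return NA rather than the NaN that 0 / 0 would produce.
  if (n < 2) return(rep(NA_real_, n))
  (sum(x) - x) / (n - 1)
}

x <- 1:10
loo_mean(x)[1]   # same value as mean(x[-1])
loo_mean(5)      # NA: no other cases to average
```

Without the length check, a vector (or group) with a single case would yield `NaN`, which may or may not be what is wanted downstream.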

EDIT: Updated to address the follow-up question in the comments:

set.seed(123)
df <- data.frame(group = rep(letters[1:3], each = 3),
                 value = rnorm(9), stringsAsFactors = FALSE)
df

#  group       value
#1     a -0.56047565
#2     a -0.23017749
#3     a  1.55870831
#4     b  0.07050839
#5     b  0.12928774
#6     b  1.71506499
#7     c  0.46091621
#8     c -1.26506123
#9     c -0.68685285

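# Note: tapply() returns results in group order, so this assignment
# assumes the rows of df are already sorted by group.
df$loo_mean <- unlist(tapply(df$value, df$group,
                      function(x) (sum(x) - x) / (length(x) - 1)))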
df

#  group       value    loo_mean
#1     a -0.56047565  0.66426541
#2     a -0.23017749  0.49911633
#3     a  1.55870831 -0.39532657
#4     b  0.07050839  0.92217636
#5     b  0.12928774  0.89278669
#6     b  1.71506499  0.09989806
#7     c  0.46091621 -0.97595704
#8     c -1.26506123 -0.11296832
#9     c -0.68685285 -0.40207251

mean(df$value[2:3])
#[1] 0.6642654
mean(df$value[c(7,9)])
#[1] -0.1129683
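
If the rows are not sorted by group, one possible alternative is base R's `ave()`, which applies a function within each group and returns the results in the original row order. The sketch below shuffles the example data to mimic an unsorted data frame; the helper name `loo` is an assumption for illustration.

```r
# Leave-one-out mean of a numeric vector (hypothetical helper name).
loo <- function(x) (sum(x) - x) / (length(x) - 1)

set.seed(123)
df <- data.frame(group = rep(letters[1:3], each = 3),
                 value = rnorm(9), stringsAsFactors = FALSE)
df <- df[sample(nrow(df)), ]   # shuffle rows to mimic unsorted data

# ave() keeps each result aligned with the row it was computed for.
df$loo_mean <- ave(df$value, df$group, FUN = loo)

# Spot check the first row against a direct calculation.
i <- 1
others <- df$group == df$group[i] & seq_len(nrow(df)) != i
stopifnot(isTRUE(all.equal(df$loo_mean[i], mean(df$value[others]))))
```

The same call should extend to a larger data frame by swapping in the relevant grouping and value columns.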

edited Apr 09 '14 at 02:47

answered Apr 07 '14 at 00:52

matt_k

Clever. I was going to suggest `sapply(seq_along(x), function(i) mean(x[-i]))` but this is even better. – thelatemail Apr 07 '14 at 00:59
Great minds, hey @thelatemail? Identical, right down to the `i`. – jbaums Apr 07 '14 at 01:00
Hey, thanks! I'm computing this for a column of an array, where I want to compute the mean of one column, excluding each individual case. In this scenario, rather than the vector you used as an exemplar, what changes? – JoshExpPsych Apr 08 '14 at 01:01
Not much should change. When you say an array do you mean an R array or do you mean a matrix or data.frame? If you have a specific data structure in mind it helps to provide a small example of what your data looks like (e.g. `dput(head(data))` ). – matt_k Apr 08 '14 at 01:26
It's a dataframe with 30 variables, 30 groups, and 1305 cases. I'm looking to take the group mean while excluding each individual participant in that group, so I would need to subset by group and then create a variable for the group mean which would, of course, be different for each participant. I'm not sure what the etiquette is here but I can certainly email you a csv file so that you can take a look for yourself. – JoshExpPsych Apr 09 '14 at 02:12
You should try to post as much about your data and what you are actually trying to do in the initial question. I'll try to update my answer based on what you told me, but please take a look at this. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – matt_k Apr 09 '14 at 02:32
This is very helpful Matt. I'll be able to build my function with the information you've given me. – JoshExpPsych Apr 09 '14 at 05:25

score 3 · Answer 2 · answered Apr 07 '14 at 00:59

Here's a vectorised approach, to avoid averaging each subset one at a time:

x <- runif(70)
sapply(seq_along(x), function(i) mean(x[-i]))
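
As a quick sanity check, the `sapply` result can be compared with the closed-form expression `(sum(x) - x) / (length(x) - 1)` from the other answer. This is only a sketch; the seed and the variable names `by_subset` and `closed_form` are assumptions for illustration.

```r
set.seed(1)
x <- runif(70)

# One mean per left-out case, computed subset by subset.
by_subset <- sapply(seq_along(x), function(i) mean(x[-i]))

# The same quantity from a single arithmetic expression.
closed_form <- (sum(x) - x) / (length(x) - 1)

# all.equal() compares with a small numerical tolerance.
all.equal(by_subset, closed_form)
```

The two should agree up to floating-point rounding; the `sapply` version recomputes a mean for every case, so it does more work as the vector grows.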

jbaums


I had misread @matt_k's solution and see it is of course also vectorised. – jbaums Apr 07 '14 at 01:01
