3

I have a dataset with 70 cases (participants in a study). Is there a function that can calculate the mean of these 70 cases while leaving each individual case out in turn? This would look like:

"mean for case x = (value(1) + ... + value(n) - value(x)) / (n - 1)"

Any information will help.

JasonMArcher
  • Do you mean a leave one out type analysis? I think this is more of a stats question. – CCurtis Apr 07 '14 at 00:48
  • It is. But I don't think SPSS has the tools to get the job done, so I thought I would try building a function in R that can do it. – JoshExpPsych Apr 08 '14 at 00:46

2 Answers

8

You could just do what you've suggested and remove each case from the total:

x <- 1:10
(sum(x) - x) / (length(x) - 1)

#[1] 6.000000 5.888889 5.777778 5.666667 5.555556 5.444444 5.333333 5.222222 5.111111 5.000000

mean(2:10)
#[1] 6
mean(1:9)
#[1] 5
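
If this needs to be reused, the same expression could be wrapped in a small helper. This is only a sketch: the name `loo_mean` and the treatment of missing values (ignored in the sum, returned as `NA` for the case itself) are assumptions, not something the question specifies.

```r
# Hypothetical helper: leave-one-out mean for each element of a numeric vector.
# The name and the NA handling are illustrative assumptions.
loo_mean <- function(x) {
  ok <- !is.na(x)                 # positions with an observed value
  n  <- sum(ok)                   # number of observed values
  if (n < 2) {
    return(rep(NA_real_, length(x)))  # no other cases left to average
  }
  (sum(x[ok]) - x) / (n - 1)      # missing inputs stay NA in the result
}

loo_mean(1:10)
loo_mean(c(2, NA, 4, 6))
```

Subtracting each value from the total avoids recomputing a sum for every case, which is why it scales well to long vectors.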

EDIT: Updated to address the follow-up question in the comments:

set.seed(123)
df <- data.frame(group = rep(letters[1:3], each = 3),
                 value = rnorm(9), stringsAsFactors = FALSE)
df

#  group       value
#1     a -0.56047565
#2     a -0.23017749
#3     a  1.55870831
#4     b  0.07050839
#5     b  0.12928774
#6     b  1.71506499
#7     c  0.46091621
#8     c -1.26506123
#9     c -0.68685285

df$loo_mean <- unlist(tapply(df$value, df$group, 
                      function(x) (sum(x) - x) / (length(x) - 1)))
df

#  group       value    loo_mean
#1     a -0.56047565  0.66426541
#2     a -0.23017749  0.49911633
#3     a  1.55870831 -0.39532657
#4     b  0.07050839  0.92217636
#5     b  0.12928774  0.89278669
#6     b  1.71506499  0.09989806
#7     c  0.46091621 -0.97595704
#8     c -1.26506123 -0.11296832
#9     c -0.68685285 -0.40207251

mean(df$value[2:3])
#[1] 0.6642654
mean(df$value[c(7,9)])
#[1] -0.1129683
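
One caveat worth checking on real data: `unlist(tapply(...))` returns the values in group order, so the assignment above lines up only because the example rows are already sorted by group. A sketch of an alternative that does not rely on row order uses base R's `ave()`, which returns results in the original row order. The shuffled copy `shuffled` is an illustrative name, not part of the original data.

```r
# Rebuild the example data, then shuffle rows so groups are no longer contiguous.
set.seed(123)
df <- data.frame(group = rep(letters[1:3], each = 3),
                 value = rnorm(9), stringsAsFactors = FALSE)
shuffled <- df[sample(nrow(df)), ]

# ave() applies FUN within each group and keeps the original row order.
shuffled$loo_mean <- ave(shuffled$value, shuffled$group,
                         FUN = function(x) (sum(x) - x) / (length(x) - 1))
shuffled
```

Each row's `loo_mean` should equal the mean of the other values in the same group, regardless of how the rows are ordered.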
matt_k
  • Clever. I was going to suggest `sapply(seq_along(x), function(i) mean(x[-i]))` but this is even better. – thelatemail Apr 07 '14 at 00:59
  • Great minds, hey @thelatemail? Identical, right down to the `i`. – jbaums Apr 07 '14 at 01:00
  • Hey, thanks! I'm computing this for a column of an array, where I want to compute the mean of one column, excluding each individual case. In this scenario, rather than the vector you used as an exemplar, what changes? – JoshExpPsych Apr 08 '14 at 01:01
  • Not much should change. When you say an array, do you mean an R array, a matrix, or a data.frame? If you have a specific data structure in mind, it helps to provide a small example of what your data looks like (e.g. `dput(head(data))`). – matt_k Apr 08 '14 at 01:26
  • It's a dataframe with 30 variables, 30 groups, and 1305 cases. I'm looking to take the group mean while excluding each individual participant in that group, so I would need to subset by group and then create a variable for the group mean which would, of course, be different for each participant. I'm not sure what the etiquette is here but I can certainly email you a csv file so that you can take a look for yourself. – JoshExpPsych Apr 09 '14 at 02:12
  • You should try to post as much about your data and what you are actually trying to do in the initial question. I'll try to update my answer based on what you told me, but please take a look at this. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – matt_k Apr 09 '14 at 02:32
  • This is very helpful Matt. I'll be able to build my function with the information you've given me. – JoshExpPsych Apr 09 '14 at 05:25
3

Here's an `sapply` approach that drops each case in turn and averages the rest:

x <- runif(70)
sapply(seq_along(x), function(i) mean(x[-i]))
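
As a quick sanity check, the drop-one-and-average loop can be compared against the closed-form expression `(sum(x) - x) / (length(x) - 1)`. This is a sketch; the seed and the names `loop_version` and `closed_form` are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed, only so the check is repeatable
x <- runif(70)

# Drop each element in turn and average what remains.
loop_version <- sapply(seq_along(x), function(i) mean(x[-i]))

# Subtract each element from the total and divide by the remaining count.
closed_form <- (sum(x) - x) / (length(x) - 1)

# Compare within floating-point tolerance rather than with ==.
all.equal(loop_version, closed_form)
```

The two should agree up to floating-point tolerance. The closed form does a single pass over the data, while the `sapply` version recomputes a mean for every case.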
jbaums