
It is easy to calculate the mean of each group in an R dataframe. If you want to exclude the current observation, it is almost as easy.

Is there any easy way to exclude the current observation when calculating the standard deviation?

For example, given this table:

data.frame(country = c(rep("A",3), rep("B",3)), weight = c(10,11,12,20,25,30))

I need the following table:

data.frame(country = c(rep("A",3), rep("B",3)), weight = c(10,11,12,20,25,30), standarddeviation = c(sd(c(11,12)), sd(c(10,12)), sd(c(10,11)), sd(c(25,30)), sd(c(20,30)), sd(c(20,25))))
wwl

2 Answers


One option is to combine dplyr with mapply. Within each group, mapply iterates over the row indices 1:n(), and for each index x the sd is computed on the group's weights with that row dropped (weight[-x]).

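library(dplyr)

# The data frame from the question
df <- data.frame(country = c(rep("A",3), rep("B",3)), weight = c(10,11,12,20,25,30))

# For each row index x within a group, take the sd of the group's other weights
df %>% group_by(country) %>%
  mutate(Sp_SD = mapply(function(x) sd(weight[-x]), 1:n()))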


# # A tibble: 6 x 3
# # Groups: country [2]
# country weight Sp_SD
# <fctr>   <dbl> <dbl>
# 1 A         10.0 0.707
# 2 A         11.0 1.41 
# 3 A         12.0 0.707
# 4 B         20.0 3.54 
# 5 B         25.0 7.07 
# 6 B         30.0 3.54 
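
The same leave-one-out idea can be sketched in base R without dplyr. This is only an illustration, not part of the answer above: the helper name loo_sd is made up for this example, and ave() is used to apply it per country and return a vector aligned with the original rows.

```r
# The data frame from the question
df <- data.frame(country = c(rep("A", 3), rep("B", 3)),
                 weight = c(10, 11, 12, 20, 25, 30))

# Hypothetical helper: for each position i, drop x[i] and take the sd of the rest.
# Groups with fewer than 3 rows yield NA, because sd() needs at least 2 values.
loo_sd <- function(x) {
  vapply(seq_along(x), function(i) sd(x[-i]), numeric(1))
}

# ave() applies loo_sd within each country and keeps the original row order
df$Sp_SD <- ave(df$weight, df$country, FUN = loo_sd)
df
```

Because ave() returns a vector of the same length as its input, the result can be assigned straight to a new column.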
MKR

Not a very beautiful solution, but it should work:

library(dplyr)

data = data.frame(country = c(rep("A",3), rep("B",3)), weight = c(10,11,12,20,25,30))

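# Split the data into one data frame per country
countries = unique(data$country)
cdata = list()

for(k in seq_along(countries)){
  cdata[[k]] = filter(data, country == countries[k])
}

# For each row j of each group i, take the sd of the group's other rows
for(i in seq_along(cdata)){
  for(j in seq_len(nrow(cdata[[i]]))){
    aux = cdata[[i]][-j,]
    cdata[[i]][j,"StandardDeviation"] = sd(aux$weight)
  }
}

# Combine all groups, however many there are
do.call(rbind, cdata)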
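
If the nested loops are a concern, the same numbers can probably be obtained without looping over rows, using the identity that the variance of the remaining n - 1 values is (Q_rest - S_rest^2 / (n - 1)) / (n - 2), where S_rest and Q_rest are the sum and the sum of squares after removing the current value. This is a rough sketch rather than the approach used above: the function name loo_sd_closed is invented for illustration, and the sum-of-squares form can lose precision when the values are large relative to their spread.

```r
data <- data.frame(country = c(rep("A", 3), rep("B", 3)),
                   weight = c(10, 11, 12, 20, 25, 30))

# Hypothetical helper: leave-one-out sd for every element of x at once
loo_sd_closed <- function(x) {
  n <- length(x)
  # With fewer than 3 values, fewer than 2 remain, so the sd is undefined
  if (n < 3) return(rep(NA_real_, n))
  s_rest <- sum(x) - x        # sum of the other values
  q_rest <- sum(x^2) - x^2    # sum of squares of the other values
  v <- (q_rest - s_rest^2 / (n - 1)) / (n - 2)
  sqrt(pmax(v, 0))            # guard against tiny negative rounding errors
}

data$StandardDeviation <- ave(data$weight, data$country, FUN = loo_sd_closed)
data
```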
Fino