I am trying to get the standard deviation for one column in a data frame, grouped by several other columns.
library(tibble)  # provides tibble()

x <- c("Paul", "Paul", "Paul", "Jennifer", "Jennifer", "Jennifer")
y <- c("a", "a", "b", "c", "c", "d")
g <- c("eins", "eins", "zwei", "drei", "drei", "vier")
z <- c(1,2,3,4,5,6)
df <- tibble(Fall = x, DRG = y, DRG2 = g, Anzahl = z)
df$Fall <- as.factor(df$Fall)
df$DRG <- as.factor(df$DRG)
df$DRG2 <- as.factor(df$DRG2)
This is the tibble:
df
# A tibble: 6 x 4
  Fall     DRG   DRG2  Anzahl
  <fct>    <fct> <fct>  <dbl>
1 Paul     a     eins       1
2 Paul     a     eins       2
3 Paul     b     zwei       3
4 Jennifer c     drei       4
5 Jennifer c     drei       5
6 Jennifer d     vier       6
Calculating the mean works:
aggregate(x = df,
          by = list(df$Fall, df$DRG, df$DRG2),
          FUN = mean, na.rm = TRUE)
   Group.1 Group.2 Group.3 Fall DRG DRG2 Anzahl
1 Jennifer       c    drei   NA  NA   NA    4.5
2     Paul       a    eins   NA  NA   NA    1.5
3 Jennifer       d    vier   NA  NA   NA    6.0
4     Paul       b    zwei   NA  NA   NA    3.0
Standard deviation gives me an error:
aggregate(x = df,
          by = list(df$Fall, df$DRG, df$DRG2),
          FUN = sd, na.rm = TRUE)
Error in var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) :
  Calling var(x) on a factor x is defunct.
  Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
I tried to understand the error message, but I don't see why the same call works with mean and fails with sd. If I convert all the factor columns to character, sd works and gives me the correct result. Why is that?
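For comparison, here is a minimal sketch of the same aggregation restricted to the numeric column. Going by the error message, sd() passes each column on to var(), which refuses factors outright, whereas mean() on a factor only returns NA with a warning (hence the NA columns in the mean output above). The sketch assumes that only Anzahl needs to be summarised, and it rebuilds the data as a plain data.frame instead of a tibble so that it runs without any packages; the names given inside `by` are just labels chosen for illustration.

```r
# Base-R stand-in for the tibble above (assumption: a data.frame is used
# instead of a tibble so that no package is needed).
df <- data.frame(
  Fall   = c("Paul", "Paul", "Paul", "Jennifer", "Jennifer", "Jennifer"),
  DRG    = c("a", "a", "b", "c", "c", "d"),
  DRG2   = c("eins", "eins", "zwei", "drei", "drei", "vier"),
  Anzahl = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = TRUE
)

# Pass only the numeric column as x; the factors are used for grouping only,
# so sd() is never called on a factor.
res <- aggregate(
  x     = df["Anzahl"],
  by    = list(Fall = df$Fall, DRG = df$DRG, DRG2 = df$DRG2),
  FUN   = sd,
  na.rm = TRUE
)
print(res)
```

Groups that contain a single row (here Paul/b/zwei and Jennifer/d/vier) come back as NA, because the standard deviation is undefined for one observation.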
Regards