I have the following toy data:
data <- structure(list(value = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L), class = structure(c(1L, 1L, 1L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("A",
"B"), class = "factor")), .Names = c("value", "class"), class = "data.frame", row.names = c(NA,
-16L))
Using the commands:
data <- table(data$class, data$value)
data <- as.data.frame(data)
data$rel_freq <- data$Freq / aggregate(Freq ~ Var1, FUN = sum, data = data)$Freq
I get the relative frequency of each value within each class, which is the result I want:
> data
  Var1 Var2 Freq  rel_freq
1    A    1    3 0.2727273
2    B    1    3 0.6000000
3    A    2    4 0.3636364
4    B    2    2 0.4000000
5    A    3    4 0.3636364
6    B    3    0 0.0000000
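As a cross-check on these numbers, the same row-wise proportions can be obtained with `prop.table()`. This is only a minimal sketch: it assumes the original data frame is rebuilt under the hypothetical name `toy`, because `data` was overwritten by the commands above.

```r
# Hypothetical name `toy`: a rebuild of the original toy data
toy <- data.frame(
  value = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
  class = factor(c("A", "A", "A", "B", "B", "B", "A", "A", "A", "A",
                   "B", "B", "A", "A", "A", "A"))
)

# Contingency table: classes in rows, values in columns
tab <- table(toy$class, toy$value)

# margin = 1 divides each cell by its row (class) total
rel <- prop.table(tab, margin = 1)
print(rel)
```

Each row of `rel` sums to 1, so class A is divided by its 11 observations and class B by its 5.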
How can I construct an equivalent dplyr pipeline? My attempt is below:
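library(dplyr)
library(tidyr)  # complete() comes from tidyr, not dplyr
# `data` here is the original data frame, not the table built above
data %>%
  group_by(value, class) %>%
  summarise(n = n()) %>%
  complete(class, fill = list(n = 0)) %>%
  mutate(freq = n / sum(n))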
This does compute relative frequencies, but unfortunately within each value (across the pair of classes) instead of relative to each class total:
Source: local data frame [6 x 4]
Groups: value [3]

  value  class     n      freq
  <int> <fctr> <dbl>     <dbl>
1     1      A     3 0.5000000
2     1      B     3 0.5000000
3     2      A     4 0.6666667
4     2      B     2 0.3333333
5     3      A     4 1.0000000
6     3      B     0 0.0000000
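The denominator here is `sum(n)` within each `value` group, which is what `summarise()` leaves behind after dropping the last grouping variable. One possible direction is to group by `class` alone before dividing. The following is a minimal sketch, not a definitive solution: it assumes the original data frame is rebuilt under the hypothetical name `toy`, and that tidyr is installed for `complete()`.

```r
library(dplyr)
library(tidyr)

# Hypothetical name `toy`: a rebuild of the original toy data
toy <- data.frame(
  value = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
  class = factor(c("A", "A", "A", "B", "B", "B", "A", "A", "A", "A",
                   "B", "B", "A", "A", "A", "A"))
)

res <- toy %>%
  count(class, value) %>%                        # one row per observed pair
  complete(class, value, fill = list(n = 0L)) %>% # add missing pairs with n = 0
  group_by(class) %>%                            # denominator is the class total
  mutate(rel_freq = n / sum(n)) %>%
  ungroup()

print(res)
```

With this grouping, class A is divided by 11 and class B by 5, so the `rel_freq` column should agree with the base R result shown earlier.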