Proportions by group in data.table

Question

After some time away from R, I feel I am writing very clumsy code to get basic summary statistics in data.table.

What I am doing is finding the proportion of individuals in good/bad health, conditional on species (`type`).

library(data.table)

# Some data
n = 300
set.seed(2)
dt <- data.table(type = sample(x = c("Dog", "Cat", "Horse"), size = n, replace = TRUE),
                 health = sample(x = c("Good", "Bad"), size = n, replace = TRUE))

# Making the table. In a clumsy manner?
dt.fr <- dt[, .N, .(type, health)][, perc.type := N/sum(N)*100, 
                                   by = type][order(type, health)]
dt.fr

    type health  N perc.type
1:   Cat    Bad 38  44.70588
2:   Cat   Good 47  55.29412
3:   Dog    Bad 56  50.90909
4:   Dog   Good 54  49.09091
5: Horse    Bad 61  58.09524
6: Horse   Good 44  41.90476

How would I produce the table above with more elegant code?

I guess it's a subjective question. I think your way is fine; it kind of has to be done in two steps since you're aggregating on two levels. You could nest the steps instead of chaining them, but I think that's harder to read: `dt[, {NN = .N; .SD[, .(N = .N, perc.type = 100*.N/NN), keyby=health]}, keyby=type]` — Frank, Aug 26 '16 at 00:37
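Frank's nested one-liner can be checked against the question's chained version. This is a minimal sketch assuming the question's data; the object names `chained` and `nested` are only illustrative. R 3.6.0 changed the default algorithm behind `sample()`, so on current R the counts will differ from the table printed in the question, but the two versions should still agree with each other.

```r
library(data.table)

# Same toy data as in the question
n <- 300
set.seed(2)
dt <- data.table(type = sample(x = c("Dog", "Cat", "Horse"), size = n, replace = TRUE),
                 health = sample(x = c("Good", "Bad"), size = n, replace = TRUE))

# Chained version from the question: count per (type, health), then scale within type
chained <- dt[, .N, .(type, health)][, perc.type := N/sum(N)*100,
                                     by = type][order(type, health)]

# Frank's nested version: NN is the size of the outer `type` group, and the
# inner .SD[...] call counts each health level inside that group
nested <- dt[, {NN = .N; .SD[, .(N = .N, perc.type = 100 * .N / NN), keyby = health]},
             keyby = type]
nested
```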
`dt[, perc.type := prop.table(health), by = type]`. Then use `setorder` to order the column values by reference. Note, I did not try this as I am away from my desk. — Sathish, Aug 26 '16 at 00:45
Interesting, @Sathish, but it gives the error `Error in sum(x) : invalid 'type' (character) of argument` — s_baldur, Aug 26 '16 at 00:49
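The error comes from `prop.table(x)` computing `x / sum(x)`: `health` is a character column, and `sum()` rejects character input. A sketch of one way to keep the `prop.table()` idea, assuming the question's `dt`, is to tabulate `health` inside each `type` group first so that `prop.table()` receives counts. The name `props` is illustrative.

```r
library(data.table)

n <- 300
set.seed(2)
dt <- data.table(type = sample(x = c("Dog", "Cat", "Horse"), size = n, replace = TRUE),
                 health = sample(x = c("Good", "Bad"), size = n, replace = TRUE))

props <- dt[, {
  tab <- table(health)                       # counts of Bad/Good within this type
  list(health    = names(tab),               # level names, sorted by table()
       N         = as.vector(tab),
       perc.type = 100 * as.vector(prop.table(tab)))
}, keyby = type]                             # keyby also sorts the result by type
props
```

Like the `.N` approach, this only returns health levels that actually occur within a type.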
`within(data.frame(table(dt)), P <- ave(Freq, type, FUN = prop.table) * 100)` — rawr, Aug 26 '16 at 00:53
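rawr's base-R version works because `table(dt)` cross-tabulates both columns and `ave()` applies `prop.table()` within each level of `type`. A sketch that spells out those steps and reorders the rows to match the question's layout (`fr` is an illustrative name):

```r
library(data.table)

n <- 300
set.seed(2)
dt <- data.table(type = sample(x = c("Dog", "Cat", "Horse"), size = n, replace = TRUE),
                 health = sample(x = c("Good", "Bad"), size = n, replace = TRUE))

# table() cross-tabulates the columns of dt; data.frame() turns the 2-D table
# into long format with factor columns `type`, `health` and a count column `Freq`
fr <- data.frame(table(dt))

# ave() applies prop.table() to Freq separately within each level of `type`
fr <- within(fr, P <- ave(Freq, type, FUN = prop.table) * 100)

# Reorder to match the question's layout: type first, then health
fr <- fr[order(fr$type, fr$health), ]
fr
```

One difference from the data.table versions: `table()` keeps every type/health combination, including any with a zero count, whereas `.N` by group only returns combinations that occur.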
I hope this answer of mine helps: http://stackoverflow.com/questions/38778447/proportional-tables-by-group/38779415#38779415 — Sathish, Aug 26 '16 at 00:54
I'd just use `prop.table` so it's obvious what you're doing: `dt[, .N, by = .(type, health)][, perc.type := prop.table(N), by = type][]` — alistaire, Aug 26 '16 at 00:59
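Note that `prop.table()` returns fractions between 0 and 1, while `perc.type` in the question is a percentage. A sketch of alistaire's version that scales by 100 and uses `keyby` instead of `by`, so the result comes out sorted by `type` and `health` without the separate `order()` step (`res` is an illustrative name):

```r
library(data.table)

n <- 300
set.seed(2)
dt <- data.table(type = sample(x = c("Dog", "Cat", "Horse"), size = n, replace = TRUE),
                 health = sample(x = c("Good", "Bad"), size = n, replace = TRUE))

# keyby groups and sorts by (type, health) in one step;
# 100 * prop.table(N) turns the within-type fractions into percentages
res <- dt[, .N, keyby = .(type, health)][, perc.type := 100 * prop.table(N), by = type][]
res
```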
Thanks @alistaire. But I guess which one you find more obvious depends on your exposure to R and math. — s_baldur, Aug 26 '16 at 01:04
R, yes; math...maybe. Really, I just start to zone out when reading somebody's hard-coded stats (especially when it's longer), so I stick to existing functions when there is one. TBH, I'd use the dplyr `dt %>% count(type, health) %>% mutate(perc.type = prop.table(n))` anyway, but that's a different war. — alistaire, Aug 26 '16 at 01:10
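The dplyr one-liner relies on the grouping that `count()` leaves on its result. In dplyr releases from around 2016, `count(type, health)` returned data still grouped by `type`, so `prop.table(n)` worked within each type; later releases keep the grouping of the input instead, which for ungrouped input would make the proportions run over all rows. A hedged sketch, assuming dplyr is installed, that states the grouping explicitly so the result does not depend on the version (`res` is an illustrative name):

```r
library(data.table)
library(dplyr)

set.seed(2)
dt <- data.table(type = sample(x = c("Dog", "Cat", "Horse"), size = 300, replace = TRUE),
                 health = sample(x = c("Good", "Bad"), size = 300, replace = TRUE))

res <- dt %>%
  count(type, health) %>%                       # one row per (type, health), count in `n`
  group_by(type) %>%                            # make "within type" explicit
  mutate(perc.type = 100 * prop.table(n)) %>%   # percentages within each type
  ungroup()
res
```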
