The suggestion by Mr. Bugle is rather generic and doesn't show any code. Picking this up, the code of the OP needs to be modified as follows:
library(data.table)
dtp2[, sum(kcal) / uniqueN(dtp2$category), by = category]
which returns
category V1
1: chips 55.00
2: vegetables 1.25
3: pizza 50.00
4: beer 37.50
Note that uniqueN(dtp2$category)
is used not just uniqueN(category)
as this is always 1
when grouped by category
.
However, there are situations where missing values are assumed to be zero, without nested for loops putting in the zeroes as the OP has asked.
One situation could arise when data is reshaped from long to wide format for presentation of the data:
reshape2::dcast(dtp2, age ~ category, fun = mean, value.var = "kcal", margins = TRUE)
age beer chips pizza vegetables (all)
1 4 NaN 100 100 5 68.33333
2 5 NaN 120 NaN NaN 120.00000
3 6 NaN NaN 100 NaN 100.00000
4 18 150 NaN NaN NaN 150.00000
5 (all) 150 110 100 5 95.83333
Here, the margin means are computed only from the available data which is not what the OP askd for. (Note that the parameter fill = 0
has no effect on the computation of the margins.)
So, the missing values need to be filled up before reshaping. In base R, expand.grid()
can be used for this purpose, in data.table
it's the cross join function CJ()
:
expanded <- dtp2[CJ(age, category, unique = TRUE), on = .(age = V1, category = V2)
][is.na(kcal), kcal := 0][]
expanded
age category kcal
1: 4 beer 0
2: 4 chips 100
3: 4 pizza 100
4: 4 vegetables 5
5: 5 beer 0
6: 5 chips 120
7: 5 pizza 0
8: 5 vegetables 0
9: 6 beer 0
10: 6 chips 0
11: 6 pizza 100
12: 6 vegetables 0
13: 18 beer 150
14: 18 chips 0
15: 18 pizza 0
16: 18 vegetables 0
Now, reshaping from long to wide returns the expected results:
reshape2::dcast(expanded, age ~ category, fun = mean, value.var = "kcal", margins = TRUE)
age beer chips pizza vegetables (all)
1 4 0.0 100 100 5.00 51.2500
2 5 0.0 120 0 0.00 30.0000
3 6 0.0 0 100 0.00 25.0000
4 18 150.0 0 0 0.00 37.5000
5 (all) 37.5 55 50 1.25 35.9375