I have encountered a problem, that i am sure has a easy solution, but i cannot find it. I basically summarise my table to get the sum of an value per level of a factor variable:
NOdependants <- unique(claimsMonthly[policyID == policy, .(exposure = sum(exposure)),
by = c("productID", "Year", "product", "QualityCheck", "dependant")][order(Year)])
productID Year product QualityCheck dependant exposure
1: 1 2016 ELI18 0 EMPLOYEE 17.041096
2: 1 2016 ELI18 0 SPOUSE 40.484932
3: 1 2016 ELI18 0 CHILD 5.164384
and then i do the following:
NOdependants <- dcast(NOdependants, productID + Year ~ dependant, value.var = "exposure", fill = 0, drop = FALSE, fun.aggregate = sum)
setnames(NOdependants, c("CHILD", "EMPLOYEE", "SPOUSE"), c("childno", "employeeno", "spouseno"), skip_absent=TRUE)
> NOdependants
productRank startYear childno employeeno spouseno
1: 1 2016 5.164384 17.041096 41.484932
This is all good so far. The problem is when a product doesnt have any data on one of the factor in dependant. Lets say there are no children:
NOdependants <- unique(claimsMonthly[policyID == policy, .(exposure = sum(exposure)),
by = c("productID", "Year", "product", "QualityCheck", "dependant")][order(Year)])
productID Year product QualityCheck dependant exposure
1: 1 2016 ELI18 0 EMPLOYEE 17.041096
2: 1 2016 ELI18 0 SPOUSE 40.484932
Then my dcast does the following:
> NOdependants
productRank startYear employeeno spouseno
1: 1 2016 17.041096 41.484932
and this is a problem for me, I need to have all three columns. SO what i need is to artifically create an additional columns in case a factor level is without data (like child in here), so i would get something this:
> NOdependants
productRank startYear childno employeeno spouseno
1: 1 2016 0 17.041096 41.484932
for now i have create a workround, where i first create an empty data.table and then use rbindlist
with fill=0
to merge theses, but there must be some easier solution.
Any ideas?
Note: i am working with a lot of data and this operation is part of a loop which will be repeated around 80 times or so, so ideally something efficient is possible.
SIMPLIFIED EXAMPLE WITH DATA:
#
> claimsMonthly <- data.table(productID = c(rep(1,6), rep(2,3), rep(3,2)),
+ Year = c(rep(2015,9), 2016, 2016),
+ product = c(rep("ELI18",6), rep("JCI22",3), rep("ZDP01",2)),
+ dependant = c(rep(c("EMPLOYEE", "SPOUSE", "CHILD"), 3),"EMPLOYEE", "SPOUSE"),
+ QualityCheck = c(rep(0,11)),
+ exposure = c(abs(rnorm(11))))
>
> productIDs <- unique(claimsMonthly$productID)
> for(prod in productIDs){
+
+ NOdependants <- unique(claimsMonthly[ productID == prod, .(exposure = sum(exposure)),
+ by = c("productID", "Year", "product", "QualityCheck", "dependant")][order(Year)])
+
+ NOdependants <- dcast(NOdependants, productID + Year ~ dependant, value.var = "exposure", fill = 0, drop = FALSE, fun.aggregate = sum)
+ setnames(NOdependants, c("CHILD", "EMPLOYEE", "SPOUSE"), c("childno", "employeeno", "spouseno"), skip_absent=TRUE)
+
+ NOdependants[order(childno)]
+
+ }
Error in .checkTypos(e, names_x) :
Object 'childno' not found amongst productID, Year, employeeno, spouseno