This is an offshoot of an earlier post that built a discussion around simplifying my function and eliminating the need for merging data frames that result from an lapply. Although tools such as dplyr and data.table eliminate the need for the merging, I'd still like to know how to merge in this situation. I have simplified the function that produces the list based on this answer to my previous question.
# Reproducible data
Data <- data.frame("custID" = c(1:10, 1:20),
                   "v1" = rep(c("A", "B"), c(10, 20)),
                   "v2" = c(30:21, 20:19, 1:3, 20:6), stringsAsFactors = TRUE)
# Split-Apply function
res <- lapply(split(Data, Data$v1), function(df) {
  cutoff <- quantile(df$v2, c(0.8, 0.9))
  top_pct <- ifelse(df$v2 > cutoff[2], 10, ifelse(df$v2 > cutoff[1], 20, NA))
  na.omit(data.frame(custID = df$custID, top_pct))
})
This gives me the following results:
$A
  custID top_pct
1      1      10
2      2      20

$B
  custID top_pct
1      1      10
2      2      20
6      6      10
7      7      20
I would like the results to look like this:
  custID A_top_pct B_top_pct
1      1        10        10
2      2        20        20
3      6        NA        10
4      7        NA        20
What's the best way to get there? Should I be doing some sort of reshaping? If I do that, do I have to merge the data frames first?
Here's my solution, which may not be the best. (In the real application, there would be more than two data frames in the list.)
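# Change the new variable name so each column carries its group label
names1 <- names(res)
for (i in seq_along(res)) {
  names(res[[i]])[2] <- paste0(names1[i], "_top_pct")
}
# Merge the results; all = TRUE keeps custIDs that are missing from some groups
res_m <- res[[1]]
for (i in seq_along(res)[-1]) {  # skips the loop safely if res has only one element
  res_m <- merge(res_m, res[[i]], by = "custID", all = TRUE)
}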
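
As a sketch of a loop-free variant of the same idea, base R's Map() and Reduce() could handle the renaming and the repeated merge. The name res_named is introduced here only for illustration, and the first lines simply rebuild res from the code above so the example runs on its own. I have not compared it for speed against the loops.

```r
# Rebuild the example list from the question so the sketch is self-contained
Data <- data.frame(custID = c(1:10, 1:20),
                   v1 = rep(c("A", "B"), c(10, 20)),
                   v2 = c(30:21, 20:19, 1:3, 20:6),
                   stringsAsFactors = TRUE)
res <- lapply(split(Data, Data$v1), function(df) {
  cutoff <- quantile(df$v2, c(0.8, 0.9))
  top_pct <- ifelse(df$v2 > cutoff[2], 10, ifelse(df$v2 > cutoff[1], 20, NA))
  na.omit(data.frame(custID = df$custID, top_pct))
})

# Rename the second column of each data frame after its list name
# (res_named is a hypothetical name, not from the original post)
res_named <- Map(function(df, nm) {
  names(df)[2] <- paste0(nm, "_top_pct")
  df
}, res, names(res))

# Fold merge() over the list; all = TRUE keeps custIDs missing from some groups
res_m <- Reduce(function(x, y) merge(x, y, by = "custID", all = TRUE), res_named)
res_m
```

Because Reduce() returns its single element unchanged when the list has length one, this should also cope with a list of any size, which matters since the real application has more than two data frames.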