I have a large data frame and want to build a second data frame from it that gives the correlation of one variable ("rate") with the "out" variable for each possible combination of the unique values in the other columns. For each combination, the data would be subset to the matching rows before the correlation is computed. For example:
> data = data.frame(a = c(1, 1, 1, 2, 2, 3),
    b = c("apples", "oranges", "apples", "apples", "apples", "grapefruit"),
    c = c(12, 22, 22, 45, 67, 28),
    d = c("Monday", "Monday", "Monday", "Tuesday", "Wednesday", "Tuesday"),
    out = c(12, 14, 16, 18, 20, 22),
    rate = c(0.01, 0.02, 0.03, 0.04, 0.07, 0.06))
I want the correlation of rate with out for each combination of values in the data frame, i.e. the output should look like:
> datacorr
comb                       correlation
1, apples                  xxx
1, apples, 12              xxx
1, apples, 12, Monday      xxx
1,2,3, apples              xxx
Monday, Tuesday, apples    xxx
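To make one row of that output concrete, here is a minimal sketch for the "1, apples" combination. It assumes a combination means keeping only the rows where each listed column takes one of the listed values; the name `sub` is only for illustration.

```r
data <- data.frame(
  a = c(1, 1, 1, 2, 2, 3),
  b = c("apples", "oranges", "apples", "apples", "apples", "grapefruit"),
  c = c(12, 22, 22, 45, 67, 28),
  d = c("Monday", "Monday", "Monday", "Tuesday", "Wednesday", "Tuesday"),
  out = c(12, 14, 16, 18, 20, 22),
  rate = c(0.01, 0.02, 0.03, 0.04, 0.07, 0.06)
)

# Assumption: "1, apples" means a is 1 and b is "apples" (rows 1 and 3 here).
sub <- data[data$a %in% 1 & data$b %in% "apples", ]

# Pearson correlation of rate with out on that subset.
cor(sub$rate, sub$out)
```

With only two rows left the correlation is trivially perfect, and a one-row subset such as "1, apples, 12" has no meaningful correlation at all, so small subsets need care on the real data.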
I am trying to create a data frame of all combinations of the unique values with:
dim.data <- do.call(expand.grid,lapply(data,unique))
and am trying to go on from there.
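Note that applied to the whole data frame, this call also expands over the unique values of `out` and `rate`. A minimal sketch restricted to the grouping columns, assuming those are `a`, `b`, `c` and `d` (the name `group_cols` is hypothetical):

```r
data <- data.frame(
  a = c(1, 1, 1, 2, 2, 3),
  b = c("apples", "oranges", "apples", "apples", "apples", "grapefruit"),
  c = c(12, 22, 22, 45, 67, 28),
  d = c("Monday", "Monday", "Monday", "Tuesday", "Wednesday", "Tuesday"),
  out = c(12, 14, 16, 18, 20, 22),
  rate = c(0.01, 0.02, 0.03, 0.04, 0.07, 0.06)
)

# Assumption: only these columns define a combination.
group_cols <- c("a", "b", "c", "d")

# One row per combination of a single unique value from each grouping column.
dim.data <- expand.grid(lapply(data[group_cols], unique),
                        stringsAsFactors = FALSE)
nrow(dim.data)
```

This only enumerates combinations with one value per column. Sets of values such as "1,2,3" or "Monday, Tuesday" are not covered by `expand.grid` on its own.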
A friend did this for one column:
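library(dplyr)   # %>%, select, distinct, filter
library(tibble)  # add_row
# Unique values of a, the column that kp() below filters on
z <- (data %>% select(a) %>% distinct())$a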
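kp <- function(gg, r)
{
  # Empty template; each iteration adds one row to a copy of it
  corr1 <- data.frame(x = character(), corr = numeric())
  # Labels for every combination of 1..r values of gg, e.g. "1, 2"
  p <- unlist(lapply(1:r, function(y) {combn(gg, y, FUN = paste, collapse = ", ")}))
  dat <- lapply(seq_along(p), function(y) {
    # Turn the label back into the integer values it stands for
    k <- as.integer(strsplit(p[y], ",")[[1]])
    # Correlation of out with rate on the rows of data whose a is in k
    corr <- (data %>% filter(a %in% k) %>% select(out, rate) %>% cor %>% as.data.frame())$rate[1]
    add_row(corr1, x = p[y], corr = corr)
  })
  # Stack the one-row results into a single data frame
  final <- do.call(rbind, dat)
  return(final)
}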
However, this works perfectly on Mac but not on Windows. Can someone also help me edit it so that it runs on Windows? I have been trying without success.
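For comparison, here is a sketch of the same single-column idea in base R only. It keeps each set of values as a vector instead of pasting it into a string and parsing it back. The helper name `subset_cor` and its arguments are hypothetical, and it is offered as a variant with fewer dependencies, not as a diagnosis of the Windows failure.

```r
data <- data.frame(
  a = c(1, 1, 1, 2, 2, 3),
  b = c("apples", "oranges", "apples", "apples", "apples", "grapefruit"),
  c = c(12, 22, 22, 45, 67, 28),
  d = c("Monday", "Monday", "Monday", "Tuesday", "Wednesday", "Tuesday"),
  out = c(12, 14, 16, 18, 20, 22),
  rate = c(0.01, 0.02, 0.03, 0.04, 0.07, 0.06)
)

# Hypothetical helper: correlation of out with rate for every set of up to
# r unique values of one column (assumes df has columns out and rate).
subset_cor <- function(df, col, r) {
  vals <- unique(df[[col]])
  sizes <- seq_len(min(r, length(vals)))
  # All value sets of size 1..r, built from index combinations.
  sets <- unlist(lapply(sizes, function(n) {
    combn(length(vals), n, FUN = function(i) vals[i], simplify = FALSE)
  }), recursive = FALSE)
  data.frame(
    x = vapply(sets, paste, character(1), collapse = ", "),
    corr = vapply(sets, function(s) {
      keep <- df[[col]] %in% s
      # Fewer than two rows cannot give a correlation.
      if (sum(keep) < 2) NA_real_ else cor(df$out[keep], df$rate[keep])
    }, numeric(1))
  )
}

res <- subset_cor(data, "a", 3)
res
```

Because `col` is passed as a string, the same call should also work for character columns, e.g. `subset_cor(data, "b", 2)`.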