
I have a large data frame, and I want to create another data frame from it that lets me check the correlation of one variable ("rate") with the "out" variable for every possible combination of the unique values in the other columns, with the data subset to each combination. For example:

data <- data.frame(a = c(1, 1, 1, 2, 2, 3),
                   b = c("apples", "oranges", "apples", "apples", "apples", "grapefruit"),
                   c = c(12, 22, 22, 45, 67, 28),
                   d = c("Monday", "Monday", "Monday", "Tuesday", "Wednesday", "Tuesday"),
                   out = c(12, 14, 16, 18, 20, 22),
                   rate = c(0.01, 0.02, 0.03, 0.04, 0.07, 0.06))

I want the correlation of rate with out for each such combination, i.e. the output should look like:

> datacorr
  comb                    correlation
  1, apples               xxx
  1, apples, 12           xxx
  1, apples, 12, Monday   xxx
  1,2,3, apples           xxx
  Monday, Tuesday, apples xxx
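
A possible building block, sketched in base R under my own assumptions: the hypothetical helper subset_cor() takes a named list of allowed values per column (columns not listed are unrestricted), subsets the data, and returns one row shaped like the desired output. The names subset_cor and filters, and the extra n column (rows in the subset), are illustrative only. Returning NA when a subset has fewer than two rows or a constant column is also an assumption; cor() has no meaningful value there.

```r
data <- data.frame(a = c(1, 1, 1, 2, 2, 3),
                   b = c("apples", "oranges", "apples", "apples", "apples", "grapefruit"),
                   c = c(12, 22, 22, 45, 67, 28),
                   d = c("Monday", "Monday", "Monday", "Tuesday", "Wednesday", "Tuesday"),
                   out = c(12, 14, 16, 18, 20, 22),
                   rate = c(0.01, 0.02, 0.03, 0.04, 0.07, 0.06),
                   stringsAsFactors = FALSE)

# Hypothetical helper. `filters` is a named list such as list(a = 1, b = "apples"):
# each element holds the values allowed in that column; unlisted columns are unrestricted.
subset_cor <- function(df, filters) {
  keep <- rep(TRUE, nrow(df))
  for (col in names(filters)) {
    keep <- keep & df[[col]] %in% filters[[col]]
  }
  sub <- df[keep, , drop = FALSE]
  # Correlation is undefined for fewer than 2 rows or a constant column
  corr <- if (nrow(sub) >= 2 && sd(sub$out) > 0 && sd(sub$rate) > 0) {
    cor(sub$out, sub$rate)
  } else {
    NA_real_
  }
  # Label like the "comb" column, e.g. "1, 2, 3, apples"
  label <- paste(unlist(lapply(filters, as.character)), collapse = ", ")
  data.frame(comb = label, n = nrow(sub), correlation = corr,
             stringsAsFactors = FALSE)
}

subset_cor(data, list(a = 1, b = "apples"))
subset_cor(data, list(a = c(1, 2, 3), b = "apples"))
```

Generating every filters list is the combinatorial part: a column with k unique values already has 2^k - 1 non-empty value sets on its own, so the full enumeration grows very quickly on a large data frame.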

I tried building a data frame of every combination of the unique values with:

dim.data <- do.call(expand.grid, lapply(data, unique))

and am trying to work from there.
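
One way to extend the expand.grid() idea, as a sketch: add NA to each column's unique values and treat NA as "this column is not restricted", so the grid also contains partial combinations such as "1, apples". This covers only one value per column (not sets like "1, 2, 3"), and it assumes the grouping columns contain no real NA values. The column list group_cols is my assumption; out and rate are left out of the grid because they are the variables being correlated.

```r
data <- data.frame(a = c(1, 1, 1, 2, 2, 3),
                   b = c("apples", "oranges", "apples", "apples", "apples", "grapefruit"),
                   c = c(12, 22, 22, 45, 67, 28),
                   d = c("Monday", "Monday", "Monday", "Tuesday", "Wednesday", "Tuesday"),
                   out = c(12, 14, 16, 18, 20, 22),
                   rate = c(0.01, 0.02, 0.03, 0.04, 0.07, 0.06),
                   stringsAsFactors = FALSE)

# Assumed grouping columns; out and rate are the variables being correlated
group_cols <- c("a", "b", "c", "d")

# NA means "no restriction on this column" (assumes no real NAs in these columns)
choices <- lapply(data[group_cols], function(v) c(NA, unique(v)))
grid <- do.call(expand.grid, c(choices, stringsAsFactors = FALSE))
# Drop the all-NA row, which would just be the whole data set
grid <- grid[rowSums(!is.na(grid)) > 0, , drop = FALSE]

results <- do.call(rbind, lapply(seq_len(nrow(grid)), function(i) {
  keep <- rep(TRUE, nrow(data))
  used <- character(0)
  for (col in group_cols) {
    val <- grid[[col]][i]
    if (!is.na(val)) {
      keep <- keep & data[[col]] == val
      used <- c(used, as.character(val))
    }
  }
  sub <- data[keep, , drop = FALSE]
  # Correlation is undefined for fewer than 2 rows or a constant column
  corr <- if (nrow(sub) >= 2 && sd(sub$out) > 0 && sd(sub$rate) > 0) {
    cor(sub$out, sub$rate)
  } else {
    NA_real_
  }
  data.frame(comb = paste(used, collapse = ", "), n = nrow(sub),
             correlation = corr, stringsAsFactors = FALSE)
}))

# Keep only combinations that actually occur in the data
results <- results[results$n > 0, ]
head(results)
```

Including out and rate in the grid, as lapply(data, unique) does, would multiply the grid size without adding any useful subsets, which is why this sketch restricts it to the grouping columns.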

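A friend wrote the following for a single column:

library(dplyr)
library(tibble)  # add_row() comes from tibble

z <- (data %>% select(c) %>% distinct())$c

# gg: candidate values; r: largest number of values to combine at once.
# NOTE: the filter below is on column `a`, so gg should hold values of `a`.
kp <- function(gg, r) {
  corr1 <- data.frame(x = character(), corr = numeric(), stringsAsFactors = FALSE)
  # All combinations of 1..r values, each collapsed into a string such as "1, 2"
  p <- unlist(lapply(1:r, function(y) combn(gg, y, FUN = paste, collapse = ", ")))

  dat <- lapply(seq_along(p), function(y) {
    # Split the string back into numbers (trimws drops the space after each comma)
    k <- as.numeric(trimws(strsplit(p[y], ",")[[1]]))
    corr <- (data %>% filter(a %in% k) %>% select(out, rate) %>% cor() %>% as.data.frame())$rate[1]
    add_row(corr1, x = p[y], corr = corr)
  })
  do.call(rbind, dat)
}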
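
A base-R sketch of the same one-column idea that keeps the values themselves instead of pasting them into strings and parsing them back with strsplit(), so each result is labelled with element values rather than vector positions. kp_values() and its col argument are hypothetical names. It chooses indices with combn() and maps them to values, because combn() given a single positive number n enumerates seq_len(n) instead of treating n as a value.

```r
data <- data.frame(a = c(1, 1, 1, 2, 2, 3),
                   b = c("apples", "oranges", "apples", "apples", "apples", "grapefruit"),
                   c = c(12, 22, 22, 45, 67, 28),
                   d = c("Monday", "Monday", "Monday", "Tuesday", "Wednesday", "Tuesday"),
                   out = c(12, 14, 16, 18, 20, 22),
                   rate = c(0.01, 0.02, 0.03, 0.04, 0.07, 0.06),
                   stringsAsFactors = FALSE)

# Hypothetical variant of kp(): `col` names the column to restrict and `r` is the
# largest number of that column's unique values to combine at once.
kp_values <- function(df, col, r) {
  vals <- unique(df[[col]])
  r <- min(r, length(vals))
  # Choose indices, then map them to the actual values
  combos <- unlist(lapply(seq_len(r), function(m) {
    combn(seq_along(vals), m, FUN = function(i) vals[i], simplify = FALSE)
  }), recursive = FALSE)
  do.call(rbind, lapply(combos, function(k) {
    sub <- df[df[[col]] %in% k, , drop = FALSE]
    # Correlation is undefined for fewer than 2 rows or a constant column
    corr <- if (nrow(sub) >= 2 && sd(sub$out) > 0 && sd(sub$rate) > 0) {
      cor(sub$out, sub$rate)
    } else {
      NA_real_
    }
    data.frame(x = paste(k, collapse = ", "), corr = corr, stringsAsFactors = FALSE)
  }))
}

kp_values(data, "a", 2)
kp_values(data, "b", 3)
```

Because it uses only base R, this version does not depend on dplyr or tibble; the column to restrict is passed explicitly rather than hard-coded.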

However, this doesn't work on Windows, although it works perfectly on Mac. Can someone also help me adapt it so that it runs on Windows? I have been trying, without success.

Bruce Wayne
  • I think you can find a solution here: https://stackoverflow.com/questions/55163984/generate-all-unique-combinations-from-a-vector-with-repeating-elements/55164457#55164457 – LocoGris Apr 03 '19 at 17:39
  • Thank you, I tried it, but it doesn't seem to be an efficient approach for me, and it messes things up quite a bit. I will keep trying in this direction, though. Also, it somehow gives me vector positions rather than the element names. – Bruce Wayne Apr 03 '19 at 17:48
