0

Can somebody help me with data manipulation using R? i have data (data.train) like this

datex   <- rep(c(rep("01/01/17",6),rep("02/01/17",6),rep("03/01/17",6)),1)
datex <- as.Date(datex, "%d/%m/%y")
Ax  <- rep("A1",18)
Bx <- rep(c(rep("B1",3),rep("B2",3)),3)
Cx <- rep(c("C1","C2","C3"),6)
valx <- 100
for(i in 1:17){valx[i+1] <- valx[i]+1}
data.train <- data.frame(datex, Ax, Bx, Cx, valx)

i need all combination from variable and the final form is like this enter image description here

I have tried this code:

### Library
library(dplyr)
## datex
datex   <- rep(c(rep("01/01/17",6),rep("02/01/17",6),rep("03/01/17",6)),1)
datex <- as.Date(datex, "%d/%m/%y")
Ax  <- rep("A1",18)
Bx <- rep(c(rep("B1",3),rep("B2",3)),3)
Cx <- rep(c("C1","C2","C3"),6)
valx <- 100
for(i in 1:17){valx[i+1] <- valx[i]+1}
data.train <- data.frame(datex, Ax, Bx, Cx, valx)
names.group <- names(data.train)[1:length(data.train)-1]
data.group <- Map(combn, list(names.group), seq_along(names.group), simplify = F) %>% unlist(recursive = F)
find.index <- sapply(data.group, function(x, find.y){
  any(find.y %in% x)
}, find.y = c("datex"))
index.group <- NULL
for(i in 2:length(find.index)){
  if(find.index[i] == "TRUE"){
    index.group[i] <- i
  }
}
index.group[is.na(index.group)] <- 0
for(i in 1:length(data.group)){
  if(index.group[i] == 0){
    data.group[[i]] <- 0
  } else {
    data.group[[i]] <- data.group[[i]]
  }
}
data.group2 <- data.group[sapply(data.group, function(x) any(x != 0))] 
combination.result <- lapply(data.group2, FUN = function(x) {
  do.call(what = group_by_, args = c(list(data.train), x)) %>% summarise(sumVar = sum(valx))
})
combination.result

but i don't produce what i want. Thanks

chinsoon12
  • 25,005
  • 4
  • 25
  • 35
Faryan
  • 409
  • 3
  • 11
  • 2
    Welcome to SO! Please read https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example From images noone can reproduce your data. Please edit your question! – jogo Feb 20 '18 at 07:52

1 Answers1

1

You can generate for combinations of length 1 then for combinations of length 2. Use paste to create your Variable column. Then rbindlist all your results to get the final output.

library(data.table)
setDT(data.train)
sumCombi <- function(x, mySep="_") {
    data.train[ , sum(Val), by=c("Date", x)][,
        list(Date, 
            Variable=do.call(paste, c(.SD[,x,with=FALSE], list(sep=mySep))), 
            SumVal=V1)]
}

rbindlist(c(
    #combinations with 1 element in each combi
    lapply(c("A", "B", "C"), sumCombi)
    ,
    #combinations with 2 elements in each combi
    lapply(combn(c("A","B","C"), 2, simplify=FALSE), sumCombi)
), use.names=FALSE)

or more generically/programmatically:

#assuming that your columns are in the middle of the columns while excl. first and last columns
myCols <- names(data.train)[-c(1, ncol(data.train))]

rbindlist(unlist(
    lapply(seq_along(myCols), function(n)
        combn(myCols, n, sumCombi, simplify=FALSE)
    ), recursive=FALSE), 
use.names=FALSE)
chinsoon12
  • 25,005
  • 4
  • 25
  • 35
  • Wow.. it's amazing, but your solution is creating all combination variable into 1 column, how do i create separated table? and how do make this solution works with more 3 variable? Thanks before – Faryan Feb 20 '18 at 08:28
  • you can just remove the `rbindlist` – chinsoon12 Feb 20 '18 at 08:29
  • thanks for sharing.. I have some question: 1. if i wanna do with combine A1B1C1, A1B1C2, A1B1C3, ... , A1B2C3. what should i do? 2. What should i do if variable is more than 3, example i have A, B, C, D & E? can we make it more generically Thanks chinsoon12 – Faryan Feb 20 '18 at 08:40
  • 1. just change `c(1,2)` to `c(1,2,3)` – chinsoon12 Feb 20 '18 at 08:51
  • 2. assuming that your columns are in the middle of the columns excl first and last columns, you can use `myCols <- names(data.train)[-c(1, ncol(data.train))]` to get a list of columns, the change `c(1,2)` to `seq_along(myCols)` – chinsoon12 Feb 20 '18 at 08:53
  • i am only integrating someone's work for your problem – chinsoon12 Feb 20 '18 at 09:02
  • hello.. i wanna ask again, how do i can add "." or "_" into column name?. So the result will be "A1.B1" or "A1_B1". Thank You before – Faryan Feb 22 '18 at 04:04
  • hi @CarolusMardisonPurba, i updated the function for a customized separator and also simplify the `combn` function. HTH – chinsoon12 Feb 22 '18 at 05:55
  • hi chinsoon12.. thanks for the help, i really appreciate it sir – Faryan Feb 22 '18 at 06:56
  • hi chinsoon12.. thanks for your help before, sorry for bothering you, but i need your help again. How can i get Ax.A1_Bx.B1, Ax.A1_Bx.B1_Cx.C1 etc in combination? Thank You – Faryan Mar 08 '18 at 10:55
  • hi @CarolusMardisonPurba, i am not quite sure what you mean. do you want to post a new qn? – chinsoon12 Mar 09 '18 at 03:37