
I have a vector with 2 elements:

v1 <- c('X1','X2')

I want to create possible combinations of these elements.

The resultant data frame would look like:

    structure(list(ID = c(1, 2, 3, 4), c1 = c("X1", "X2", "X1", "X2"
), c2 = c("X1", "X1", "X2", "X2")), class = "data.frame", row.names = c(NA, 
-4L))

Here, the rows with ID=2 and ID=3 contain the same elements, just arranged in a different order. I would like to treat these 2 rows as duplicates, so the final output should have only 3 rows, i.e. 3 combinations:

  1. X1, X1
  2. X1, X2
  3. X2, X2

In my actual dataset, the vector v1 has 16 such elements.

I have tried the expand.grid approach to obtain all possible combinations, but it exceeds my machine's memory limit: with 16 elements the number of combinations is too large. This is probably made worse by the duplicates described above, which expand.grid generates as well.
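To put numbers on this: the count of unordered selections of k items from n elements, with repetition allowed, is choose(n + k - 1, k). The sketch below is only a count check, not a solution. Whether k should be 2 (pairs, as in the example above) or 16 (one column per element) is an assumption, so both cases are shown.

```r
# Counting sketch: ordered tuples vs. unordered selections with repetition.
n <- 16  # number of distinct elements in v1

# Pairs (k = 2)
ordered_pairs   <- n^2                   # rows expand.grid(v1, v1) would build
unordered_pairs <- choose(n + 2 - 1, 2)  # rows left once order is ignored

# Full-length tuples (k = n), one column per element
ordered_full   <- n^n
unordered_full <- choose(2 * n - 1, n)

print(c(ordered_pairs, unordered_pairs))
print(format(c(ordered_full, unordered_full), big.mark = ","))
```

In both cases the ordered count is far larger than the unordered one, which is why generating everything first and filtering afterwards is wasteful.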

Can someone help me get all possible combinations without any duplicates?

Ideally I am looking for a solution that uses data.table functionality, since I expect it to be faster.

Thanks in advance.

Prateek
    I think [this post](https://stackoverflow.com/a/47983855/10802499) would be helpful to you. – ekoam Nov 25 '20 at 16:24

5 Answers

2

Here is a data.table solution using your sample data:

First, create your combinations. Setting unique = TRUE makes CJ use only the unique values of each input, which cuts down the number of rows it generates.

library(data.table)

# df is the sample data frame from the question (columns c1 and c2)
data <- setDT(CJ(df$c1, df$c2, unique = TRUE))

Then, filter out duplicates:

data[!duplicated(t(apply(data, 1, sort))),]

This gives us:

   V1 V2
1  X1 X1
2  X2 X1
10 X2 X2
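
If the goal is to avoid the sort-and-deduplicate pass, one alternative is to cross join positions instead of values and keep only the rows where the first position does not exceed the second. This is a minimal sketch, assuming pairs are what is needed and that v1 is the vector from the question; the column names i, j, c1 and c2 are illustrative. It still builds the full n x n grid before filtering, which is small for pairs.

```r
library(data.table)

v1 <- c("X1", "X2")

# Cross join of positions rather than values, so the comparison below
# does not depend on how the strings themselves are ordered.
idx <- CJ(i = seq_along(v1), j = seq_along(v1))

# Keep only i <= j: each unordered pair survives exactly once.
idx <- idx[i <= j]

# Map the positions back to the original elements.
res <- data.table(c1 = v1[idx$i], c2 = v1[idx$j])
print(res)
```

For the two-element example this leaves the three pairs (X1, X1), (X1, X2) and (X2, X2).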
Matt
  • Here, Row 2 and Row 3 are duplicate rows as described in problem statement. – Prateek Nov 25 '20 at 15:35
  • Hi Matt, expand.grid is taking a lot of time to generate all combinations. Moreover, with so many combinations it cannot allocate a vector that large. Is there a way to avoid generating the duplicate combinations in the first place? – Prateek Nov 25 '20 at 15:58
  • You could try `CJ` from `data.table`, and set `unique = TRUE` to cut down on combinations: `data <- setDT(CJ(df$c1, df$c2, unique = TRUE))` – Matt Nov 25 '20 at 16:06
  • No Matt, this is not working. It does not remove duplicate combinations. – Prateek Nov 25 '20 at 16:10
  • First, you need to run the `CJ` from `data.table`, and then you need to run the code above to filter out duplicates: `data <- data[!duplicated(t(apply(data, 1, sort))),]` – Matt Nov 25 '20 at 16:11
1

I would look into the ?expand.grid function for this type of task.

expand.grid(v1, v1)
    Var1 Var2
1   X1   X1
2   X2   X1
3   X1   X2
4   X2   X2
GWD
1

dat4 is the final output.

v1 <- c('X1','X2')

library(data.table)

dat <- expand.grid(v1, v1, stringsAsFactors = FALSE)

setDT(dat)

# A function to combine and sort string from two columns
f <- function(x, y){
  z <- paste(sort(c(x, y)), collapse = "-")
  return(z)
}

# Apply the f function to each row
dat2 <- dat[, comb := f(Var1, Var2), by = 1:nrow(dat)]
# Remove the duplicated elements in the comb column
dat3 <- unique(dat2, by = "comb")
# Select the columns
dat4 <- dat3[, c("Var1", "Var2")]

print(dat4)

#    Var1 Var2
# 1:   X1   X1
# 2:   X2   X1
# 3:   X2   X2
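
The row-by-row call to f can be slow on large inputs. A vectorised variant of the same idea is sketched below, assuming two character columns: pmin and pmax put each pair into a canonical order in one pass, and unique then drops the repeats. Note that the surviving rows are written in sorted order (X1, X2) rather than (X2, X1).

```r
library(data.table)

v1 <- c("X1", "X2")

dat <- as.data.table(expand.grid(Var1 = v1, Var2 = v1, stringsAsFactors = FALSE))

# pmin/pmax work element-wise on character vectors, so every row ends up
# with its smaller string first and its larger string second.
canon <- dat[, .(Var1 = pmin(Var1, Var2), Var2 = pmax(Var1, Var2))]

# Rows that differed only in order are now identical and can be dropped.
res <- unique(canon)
print(res)
```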
www
1

You may want to check RcppAlgos::comboGeneral, which does exactly what you want and is known to be fast and memory efficient. Just do something like this:

vars <- paste0("X", 1:2)
RcppAlgos::comboGeneral(vars, length(vars), repetition = TRUE)

Output

     [,1] [,2]
[1,] "X1" "X1"
[2,] "X1" "X2"
[3,] "X2" "X2"

On my laptop with 16 GB of RAM, I can run this function with up to 14 variables, and it takes less than 5 s to finish, so speed is less of a concern. However, note that you need at least 17.9 GB of RAM to hold all 16-variable combinations.
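
The 17.9 figure can be reproduced with a back-of-the-envelope estimate. This is a sketch under the assumption that each cell of the result costs 4 bytes (integer storage). A character matrix on 64-bit R stores an 8-byte pointer per cell, so it would need roughly double.

```r
n <- 16

# Number of ways to pick n items from n elements with repetition, order ignored.
n_rows  <- choose(2 * n - 1, n)
n_cells <- n_rows * n  # one column per picked item

# Size in GiB under two per-cell cost assumptions.
gib_int <- n_cells * 4 / 2^30  # 4 bytes per cell (integer)
gib_chr <- n_cells * 8 / 2^30  # 8 bytes per cell (character pointer)

print(round(c(gib_int, gib_chr), 1))
```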

ekoam
1

We can use crossing from tidyr

library(tidyr)
crossing(v1, v1)
akrun