I have a vector with five items.

my_vec <- c("a","b","a","c","d")

If I want to re-arrange those values into a new vector (shuffle), I could use sample():

shuffled_vec <- sample(my_vec)

Easy - but the sample() function only gives me one possible shuffle. What if I want to know all possible shuffling combinations? The various "combn" functions don't seem to help, and expand.grid() gives me every possible combination with replacement, when I need it without replacement. What's the most efficient way to do this?

Note that my vector contains the value "a" twice, so every shuffled vector returned should also contain "a" twice.

ozjimbob

3 Answers

I think permn() from the combinat package does what you want:

library(combinat)
permn(my_vec)

A smaller example:

> x
[1] "a" "a" "b"
> permn(x)
[[1]]
[1] "a" "a" "b"

[[2]]
[1] "a" "b" "a"

[[3]]
[1] "b" "a" "a"

[[4]]
[1] "b" "a" "a"

[[5]]
[1] "a" "b" "a"

[[6]]
[1] "a" "a" "b"

If the duplicates are a problem, you could do something like this to remove them:

strsplit(unique(sapply(permn(my_vec), paste, collapse = ",")), ",")

Or, probably a better approach to removing duplicates:

dat <- do.call(rbind, permn(my_vec))  # one permutation per row
dat[!duplicated(dat), ]               # keep only the first occurrence of each row
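
If generating and then discarding duplicates becomes wasteful, one option is to build only the distinct orderings in the first place. The following is a minimal base-R sketch, not part of combinat; distinct_perms is a hypothetical helper name. At each step it picks every distinct remaining value once and recurses on what is left.

```r
# Hypothetical helper (not from combinat): all distinct orderings of x.
distinct_perms <- function(x) {
  if (length(x) <= 1) return(list(x))
  out <- list()
  for (v in unique(x)) {
    i <- match(v, x)                       # first position holding value v
    for (rest in distinct_perms(x[-i])) {  # orderings of the remaining items
      out[[length(out) + 1]] <- c(v, rest)
    }
  }
  out
}

my_vec <- c("a", "b", "a", "c", "d")
res <- distinct_perms(my_vec)
length(res)  # 5!/2! = 60 distinct orderings
```

Growing a list inside a loop is not fast in R, so this is meant to illustrate the idea rather than to be the most efficient implementation.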
Dason

Noting that your data is effectively five positions, numbered 1-5, holding the values "a", "b", "a", "c", and "d", I went looking for ways to get the permutations of the numbers 1-5 and then remap those to the values you use.

Let's start with the input data:

my_vec <- c("a","b","a","c","d") # the characters
my_vec_ind <- seq_along(my_vec)  # their identifiers (1 to 5)

To get the permutations, I applied the function given at Generating all distinct permutations of a list in R:

permutations <- function(n){
  if(n==1){
    return(matrix(1))
  } else {
    sp <- permutations(n-1)
    p <- nrow(sp)
    A <- matrix(nrow=n*p,ncol=n)
    for(i in 1:n){
      A[(i-1)*p+1:p,] <- cbind(i,sp+(sp>=i))
    }
    return(A)
  }
}

First, create a data.frame with the permutations:

tmp <- data.frame(permutations(length(my_vec)))

You now have a data frame tmp of 120 rows, where each row is a unique permutation of the numbers, 1-5:

> tmp
    X1 X2 X3 X4 X5
1    1  2  3  4  5
2    1  2  3  5  4
3    1  2  4  3  5
...
119  5  4  3  1  2
120  5  4  3  2  1

Now you need to remap them to the strings you had. You can remap them using a variation on the theme of gsub(), proposed here: R: replace characters using gsub, how to create a function?

gsub2 <- function(pattern, replacement, x, ...) {
  # apply each pattern/replacement pair in turn
  for (i in seq_along(pattern)) {
    x <- gsub(pattern[i], replacement[i], x, ...)
  }
  x
}

Plain gsub() won't work here because it only uses the first element when pattern or replacement has more than one value.

You also need a function you can call using lapply() to use the gsub2() function on every element of your tmp data.frame.

remap <- function(x, 
              old,
              new){
  return(gsub2(pattern = old, 
              replacement = new, 
              fixed = TRUE,
              x = as.character(x)))
}

Almost there. We do the mapping like this:

shuffled_vec <- as.data.frame(lapply(tmp, 
                          remap,
                          old = as.character(my_vec_ind), 
                          new = my_vec))

which can be simplified to...

shuffled_vec <- as.data.frame(lapply(data.frame(permutations(length(my_vec))), 
                          remap,
                          old = as.character(my_vec_ind), 
                          new = my_vec))

.. should you feel the need.

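That gives you all 120 orderings (each distinct one appears twice because "a" occurs twice; unique(shuffled_vec) would drop the repeats):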

> shuffled_vec
    X1 X2 X3 X4 X5
1    a  b  a  c  d
2    a  b  a  d  c
3    a  b  c  a  d
...
119  d  c  a  a  b
120  d  c  a  b  a
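
As a possible shortcut for the remapping step, the index values can be used to subset my_vec directly, which avoids string replacement altogether. (Sequential gsub() calls would also misfire once there are ten or more elements, because the pattern "1" matches inside "10".) A minimal sketch, assuming a small hand-written index matrix idx as a hypothetical stand-in for the full permutation table:

```r
my_vec <- c("a", "b", "a", "c", "d")

# Hypothetical stand-in for the full table: three of the 120 index rows.
idx <- rbind(c(1, 2, 3, 4, 5),
             c(1, 2, 3, 5, 4),
             c(5, 4, 3, 2, 1))

# Subsetting with a matrix of positions returns the values in column-major
# order, so rebuilding with the same number of rows keeps the layout.
shuffled <- matrix(my_vec[idx], nrow = nrow(idx))
shuffled
# row 1: "a" "b" "a" "c" "d"
# row 2: "a" "b" "a" "d" "c"
# row 3: "d" "c" "a" "b" "a"
```

The same line should work unchanged on the full index matrix, for example as.matrix(tmp) from above.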
Andy Clifton
  • Even though the OP didn't reply, I have a very similar problem to this and found this to be very useful. However, I have one additional follow-up question... This question had only 5 elements, but for situations where there are many more elements, speed is obviously an issue. Also, for most applications of this kind of problem we only need say 10,000 returned permutations. Is it possible to amend this code to only return up to 10,000 unique perms? – jalapic Jul 06 '14 at 19:36
  • If you want 10,000 randomly-sampled permutations, use something like `tmp <- tmp[ sample(1:NROW(tmp), 10000, replace=F),]` – Andy Clifton Jul 07 '14 at 19:16
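
For longer vectors, where the full table is too large to build, a rough alternative is to draw random shuffles with sample() and keep the distinct ones. This is only a sketch under stated assumptions: n_draws is a hypothetical number of draws, and the result is a random subset of the orderings, not a complete enumeration.

```r
set.seed(1)  # only to make the sketch reproducible

my_vec <- c("a", "b", "a", "c", "d")
n_draws <- 1000  # hypothetical number of random shuffles to draw

# replicate() returns one shuffle per column; transpose to one per row.
draws <- t(replicate(n_draws, sample(my_vec)))

# unique() on a matrix drops repeated rows.
distinct_draws <- unique(draws)
nrow(distinct_draws)  # at most 60 for this vector
```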

Looking at a previous question (R: generate all permutations of vector without duplicated elements), I can see that the gtools package has a function for this. However, I couldn't get it to work directly on your vector:

library(gtools)
permutations(n = 5, r = 5, v = my_vec)
#Error in permutations(n = 5, r = 5, v = my_vec) : 
#  too few different elements

You can, however, adapt it by permuting the positions and indexing into my_vec, which gives one permutation per column:

apply(permutations(n = 5, r = 5), 1, function(x) my_vec[x])

#     [,1] [,2] [,3] [,4] 
#[1,] "a"  "a"  "a"  "a" ...
#[2,] "b"  "b"  "b"  "b" ...
#[3,] "a"  "a"  "c"  "c" ... 
#[4,] "c"  "d"  "a"  "d" ...
#[5,] "d"  "c"  "d"  "a" ... 
thelatemail