2

I would like to generate different possible permutations with the same frequency as in the input vector. For example, I would like to generate the permutations using the vector x in the below example.

library(gtools)
x <- c('A','A','B')
permutations(2, 3, x, repeats.allowed = T)

It gives the below output.

#     [,1] [,2] [,3]
# [1,] "A"  "A"  "A" 
# [2,] "A"  "A"  "B" 
# [3,] "A"  "B"  "A" 
# [4,] "A"  "B"  "B" 
# [5,] "B"  "A"  "A" 
# [6,] "B"  "A"  "B" 
# [7,] "B"  "B"  "A" 
# [8,] "B"  "B"  "B" 

But, I want only permutations having A, B with frequencies 2, 1 respectively. The expected output is:

#     [,1] [,2] [,3]
# [1,] "A"  "A"  "B" 
# [2,] "A"  "B"  "A" 
# [3,] "B"  "A"  "A" 

Is there any function available in R?

Note: I do not want to do post-processing of the output to get the expected output as my original input contains 300 elements. It is not recommended to generate factorial(300) number of permutations.

Update: The suggested link provides a nice faster solution but fails when the input vector is doubled (eg: length=20) with the error message:

Error in matrix(NA, nrow = N, ncol = prod(sapply(foo, ncol))) :
invalid 'ncol' value (too large or NA)

Prradep
  • 5,506
  • 5
  • 43
  • 84

1 Answers1

2

Your problem can be reformulated as finding all possible permutations of the frequency vector. Take a look at combinat::permn:

x <- c( 'A', 'A', 'B' )
unique(combinat::permn( x ))

# [[1]]
# [1] "A" "A" "B"

# [[2]]
# [1] "A" "B" "A"

# [[3]]
# [1] "B" "A" "A"

unique is necessary to remove duplicate entries, which is automatically done by gtools::permutations you've been using (through the default set=TRUE argument).

If you need the result in matrix format, as in your original question, pass the output as arguments to rbind using do.call:

do.call( rbind, unique(combinat::permn( x )) )
#      [,1] [,2] [,3]
# [1,] "A"  "A"  "B" 
# [2,] "A"  "B"  "A" 
# [3,] "B"  "A"  "A" 
Artem Sokolov
  • 13,196
  • 4
  • 43
  • 74
  • Thanks for another approach, @Artem. But still, the issue with the `Note` remains. When I try for `combinat::permn(sample(LETTERS[10], 30, replace = T))`, I encounter an error `Error in vector("list", gamma(n + 1)) : vector size specified is too large`. For lesser numbers(eg: 20), it will consume entire memory and crashes before producing memory & identify unique values. Do you still see any opportunity to generate permutations when the vector size is more than 200? – Prradep Oct 27 '17 at 20:19
  • 353 non-unique elements, 60 unique elements with frequency ranging from 1 to 69. – Prradep Oct 27 '17 at 20:30
  • Consider the corner case where all 60 of your elements occur exactly once in `x`. There would be 60! permutations, which is the lower bound on your actual problem (60 unique elements with multiple occurrences). This means that no matter the algorithm, you will never have fewer than 60! rows in your results matrix. I'm afraid that you might need to think of alternative solutions to your high-level task. – Artem Sokolov Oct 27 '17 at 20:36
  • 1
    With that said, take a look at this question for more efficient ways to compute unique permutations: https://stackoverflow.com/questions/5671149/permute-all-unique-enumerations-of-a-vector-in-r – Artem Sokolov Oct 27 '17 at 20:36
  • Thanks Artem, The accepted solution fails when the vector size is around 20 or more. – Prradep Oct 29 '17 at 13:52