1

I'm novice in r. I need to populate a data set with permutation. I had created one for small data set, where there are 4 (slots) columns, where it can be filled by any number between 0 to 8. Their sum should be equal to 6.

I need to do for larger set, where columns(slots =6) and sequence is 1 to 200 and sum required is 100. According to above script it taking too much time. Kindly suggest some another way of doing it.

Thanks in advance.

library(gtools)
library(dplyr)

df <- as.data.frame( permutations(5,4,seq(0,8,1))) %>% 
  mutate(sum = `V1`+`V2`+`V3`+`V4`) %>% 
  filter(sum == 6) %>% 
  select(-sum)
Xenus
  • 43
  • 4

2 Answers2

2

Using RcppAlgos (I am the author), this is trivial.

RcppAlgos::permuteGeneral(seq(0, 8, 1), 4,
                          constraintFun = "sum",
                          comparisonFun = "==",
                          limitConstraints = 6)

The algorithm underneath is optimized to quickly prune away solutions that are impossible. We also only considered checking combinations as addition/multiplication are commutative and order doesn't matter. Once we find a suitable combination, we generate all permutations of that specific combination. It also helps that we are using Rcpp for major efficiency gains.

For the OP's real world problem with 200 numbers and 6 columns, feasibility will depend heavily on the sum needed. If we considered the average sum (which will occur the most), we may need to consider alternative approaches as the shear number of possible solutions exceed 2^31 - 1. It will also take a considerable amount of time. Just with 5 columns and a desired sum of 500, I cannot even produce the permutations. I can, however produce combinations:

res <- RcppAlgos::comboGeneral(1:200, 5,
                               constraintFun = "sum",
                               comparisonFun = "==",
                               limitConstraints = 500, 
                               upper = 1e8)  ## upper argument constrains the output to a maximum number of results
nrow(res)
[1] 7669861

And given that there are no repeats, we can calculate the number of permutations to be:

7669861 * factorial(5) = 920,383,320

Here is the error I get:

res <- RcppAlgos::permuteGeneral(1:200, 5,
                                constraintFun = "sum",
                                comparisonFun = "==",
                                limitConstraints = 500, 
                                upper = 921000000)
Show Traceback

Rerun with Debug
Error: vector memory exhausted (limit reached?) 

If the desired sum is relatively small or large compared to the average sum, computation is possible. For example, if the desired sum is 100, we can quickly get all of the permutations:

system.time(res <- RcppAlgos::permuteGeneral(1:200, 6,
                                             constraintFun = "sum",
                                             comparisonFun = "==",
                                             limitConstraints = 100, 
                                             upper = 1e8))
 user  system elapsed 
2.213   0.525   2.753 

nrow(res)
[1] 47395440
Joseph Wood
  • 7,077
  • 2
  • 30
  • 65
  • 1
    Thanks @Henrik... Currently your workaround is the best solution. When I initially wrote this code, I wasn't aware of the need for weak compositions (where 0 is treated like other digits). In the next release, you will have access to a suite of composition functions where `weak` will be a parameter. The interface in general is much nicer than the more general `permueGeneral` interface. Thanks for all of the feedback! – Joseph Wood Aug 05 '22 at 18:32
  • Also @Henrik, I saw your comments on the answer you linked too... I totally agree (that was my thumbs up on the comment). – Joseph Wood Aug 05 '22 at 18:33
  • [Voilà!](https://stackoverflow.com/a/73254431/1851712). – Henrik Aug 05 '22 at 19:24
1

One option is:

as.data.frame(permutations(5, 4, seq(0, 8, 1))) %>% 
  filter(reduce(., `+`) == 6)

On the other hand this could be also done outside of dplyr or other packages like:

df <- as.data.frame(permutations(5, 4, seq(0, 8, 1)))

df[reduce(df, `+`) == 6,]

You could also try data.table, e.g.:

library(data.table)

dt <- setDT(as.data.frame(permutations(5, 4, seq(0, 8, 1))))

dt[Reduce(`+`, mget(names(dt))) == 6]

Or another option - likely slower - could also be (after saving into dt as above):

dt[dt[, rowSums(.SD) == 6]]
arg0naut91
  • 14,574
  • 2
  • 17
  • 38