
I have a list with vectors of options (my actual list is much longer).

```r
Data <- list(
  A = c("01", "02", "03", "04", "05", "06", "07", "08", "09", "99"),
  B = c("0", "1", "2", "3", "4", "5", "6", "7", "99"),
  C = c("00", "10", "11", "12", "13", "14", "15", "16", "17", "99")
)
```

I need to iterate over all the combinations of these options.

I was using `expand.grid` to build a data frame of all the combinations and then iterating over that data frame one row at a time.

```r
combinations <- expand.grid(Data, stringsAsFactors = FALSE)
```

For this small sample, `combinations` already has 900 rows (10 × 9 × 10). The row count is the product of the vector lengths, so it grows exponentially as vectors are added, and for my real data the result is far too large.
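The row count can be checked before building anything, since it is simply the product of the vector lengths. A minimal sketch using the question's `Data` values (`lengths()` needs R 3.2.0 or later); the extra vector `D` is hypothetical and only illustrates the growth:

```r
# The question's option vectors
Data <- list(
  A = c("01", "02", "03", "04", "05", "06", "07", "08", "09", "99"),
  B = c("0", "1", "2", "3", "4", "5", "6", "7", "99"),
  C = c("00", "10", "11", "12", "13", "14", "15", "16", "17", "99")
)

# expand.grid() returns one row per combination: the product of the lengths
n_combos <- prod(lengths(Data))    # 10 * 9 * 10 = 900

# A hypothetical fourth vector of length 10 multiplies the total by 10
Data$D <- sprintf("%02d", 0:9)
n_combos_4 <- prod(lengths(Data))  # 9000
```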

I was thinking I could instead pad the vectors with `NA` to a common length and stack them into a data frame, one row per vector:

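```r
# Pad every vector with NA up to the longest length, then stack them as rows
max_len <- max(vapply(Data, length, 1L))
Data.df <- data.frame(do.call(rbind, lapply(Data, "length<-", max_len)),
                      stringsAsFactors = FALSE)
```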

and then add an `index` column holding the current position in each row. Every index would start at 1, and together they would pick out the current combination:

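```r
# Start every row at its first option
Data.df$index <- 1
# Matrix indexing: for row i, take the element in column index[i]
Data.df$current <- Data.df[cbind(seq_len(nrow(Data.df)), as.numeric(Data.df$index))]
```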
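The padded data frame isn't strictly needed for this lookup: keeping one integer position per vector, the current combination can be read straight from the list. A minimal sketch; `idx` is an illustrative name standing in for the proposed `index` column.

```r
Data <- list(
  A = c("01", "02", "03", "04", "05", "06", "07", "08", "09", "99"),
  B = c("0", "1", "2", "3", "4", "5", "6", "7", "99"),
  C = c("00", "10", "11", "12", "13", "14", "15", "16", "17", "99")
)

# One position per vector, all starting at 1
idx <- rep(1L, length(Data))

# mapply() walks Data and idx in parallel, taking element idx[i] of vector i;
# the result is a character vector named after the list elements
current <- mapply(function(v, i) v[i], Data, idx)
current  # A = "01", B = "0", C = "00"
```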

The problem is how to actually step through all the combinations without writing one nested loop per row. Since the shorter rows are padded with `NA`s, reaching an `NA` would mean that row is "finished": its index would go back to 1 and the next row's index would advance by one.
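The index-with-carry idea above is essentially an odometer, and it needs no nested loops. A minimal sketch, assuming a hypothetical helper `next_index`; instead of looking for `NA` padding it compares each index against its vector's length, which is the same "row is finished" test. Only one combination is held in memory at a time. Unlike `expand.grid()`, this varies the last vector fastest, but the set of combinations visited is the same.

```r
Data <- list(
  A = c("01", "02", "03", "04", "05", "06", "07", "08", "09", "99"),
  B = c("0", "1", "2", "3", "4", "5", "6", "7", "99"),
  C = c("00", "10", "11", "12", "13", "14", "15", "16", "17", "99")
)

# Advance the index vector like an odometer: bump the last position; when a
# position is already at the end of its vector, reset it to 1 and carry into
# the position before it. Returns NULL once every combination has been visited.
next_index <- function(idx, lens) {
  for (i in rev(seq_along(idx))) {
    if (idx[i] < lens[i]) {
      idx[i] <- idx[i] + 1L
      return(idx)
    }
    idx[i] <- 1L  # this position is "finished": reset it and carry
  }
  NULL            # carried past the first position, so we are done
}

lens <- lengths(Data)
idx <- rep(1L, length(Data))
count <- 0L
while (!is.null(idx)) {
  combo <- mapply(function(v, i) v[i], Data, idx)
  # ... process `combo` here ...
  count <- count + 1L
  idx <- next_index(idx, lens)
}
count  # 900
```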

If there is a different, better way of doing this, I am open to suggestions as well.
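One alternative is to number the combinations from 1 to N and decode any number k directly into its combination (mixed-radix arithmetic, the same idea as converting a number to another base). That allows splitting the work into ranges of k, for example across separate jobs, without storing the grid. A sketch with a hypothetical `decode_combo` helper; it uses the same ordering as `expand.grid()` (first vector varies fastest), and double-precision arithmetic stays exact for k up to 2^53.

```r
Data <- list(
  A = c("01", "02", "03", "04", "05", "06", "07", "08", "09", "99"),
  B = c("0", "1", "2", "3", "4", "5", "6", "7", "99"),
  C = c("00", "10", "11", "12", "13", "14", "15", "16", "17", "99")
)

# Decode a 1-based combination number k into one option per vector.
# Position i cycles every lengths(Data)[i] steps, like a digit in a
# mixed-radix number.
decode_combo <- function(k, Data) {
  lens <- lengths(Data)
  r <- k - 1                      # work with a 0-based remainder
  idx <- numeric(length(lens))
  for (i in seq_along(lens)) {
    idx[i] <- r %% lens[i] + 1    # 1-based position in vector i
    r <- r %/% lens[i]            # move on to the next "digit"
  }
  mapply(function(v, j) v[j], Data, idx)
}

decode_combo(37, Data)  # A = "07", B = "3", C = "00"
```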

Elks
  • Is `fmly.unique` the same as `Data` in your example? – talat Feb 04 '15 at 17:54
  • @docendodiscimus, yes. sorry, I edited the question. – Elks Feb 04 '15 at 17:57
  • What are your actual vector lengths? Are you sure you really need to do every option? – Gregor Thomas Feb 04 '15 at 18:16
  • One possible solution would be to compromise between looping and `expand.grid`. You could limit things to 1 level of nesting (2 loops) by picking one vector for your outer loop, `expand.grid` on the rest of the vectors and go row-by-row for the inner loop. You could write output to disk at every step of the outer loop. – Gregor Thomas Feb 04 '15 at 18:30
  • @Gregor, I currently have 12 vectors, that have a length of 5-10, but I am probably going to need more than the 12 vectors. Why do you think I might not need every option? – Elks Feb 04 '15 at 18:32
  • But, depending what you're doing, you might be just as well off sampling. Just pick 1 or 10 million random samples. I don't know what you're calculating, but my guess is your results will be just about the same even with 5e5 samples as if you did every single combination. See my answer to this [related question/possible duplicate](http://stackoverflow.com/a/13851210/903061). – Gregor Thomas Feb 04 '15 at 18:32
  • @Gregor, I see what you meant by not needing every option, but I really do need every option. – Elks Feb 04 '15 at 18:35
  • Well, what are you doing with them? Surely not iteration for iteration's sake. With 12 vectors, using an "average" length of 8, you have around 8^12 = **68.7 Billion unique combinations**. Why do you think you need to examine every single one of them? *What are you doing with each option?* With that many combinations, if you spend 1 millisecond on each one you're looking at just over 2 years computing time. – Gregor Thomas Feb 04 '15 at 18:35
  • @Gregor the project as designed did actually require iterating over all the options, but thanks to your comment we revised the specs... (I knew the data was going to be large, but didn't think of actually calculating HOW large...) Thanks! – Elks Feb 05 '15 at 07:57
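Gregor's compromise of one outer loop plus `expand.grid` for the remaining vectors can be sketched as follows, assuming the first vector drives the outer loop (any vector would do). The per-block write to disk is left as a comment.

```r
Data <- list(
  A = c("01", "02", "03", "04", "05", "06", "07", "08", "09", "99"),
  B = c("0", "1", "2", "3", "4", "5", "6", "7", "99"),
  C = c("00", "10", "11", "12", "13", "14", "15", "16", "17", "99")
)

# Grid of every combination of the *remaining* vectors, built once and reused
inner <- expand.grid(Data[-1], stringsAsFactors = FALSE)

n_done <- 0L
for (a in Data[[1]]) {
  for (r in seq_len(nrow(inner))) {
    # Full combination: the outer value plus one row of the inner grid
    combo <- c(setNames(a, names(Data)[1]), unlist(inner[r, ]))
    # ... process `combo` here ...
    n_done <- n_done + 1L
  }
  # e.g. save this outer value's results to disk before moving on
}
```

This only divides the size of the grid held in memory by the outer vector's length: with 12 vectors of about 8 options, the inner grid would still have 8^11 (about 8.6 billion) rows.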
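Gregor's sampling alternative needs no enumeration at all: drawing each position independently and uniformly gives every combination the same probability. A minimal sketch; the seed and sample size are arbitrary choices, and sampling is with replacement, so a combination can appear more than once.

```r
Data <- list(
  A = c("01", "02", "03", "04", "05", "06", "07", "08", "09", "99"),
  B = c("0", "1", "2", "3", "4", "5", "6", "7", "99"),
  C = c("00", "10", "11", "12", "13", "14", "15", "16", "17", "99")
)

set.seed(42)        # arbitrary, for reproducibility
n_samples <- 1000L  # arbitrary sample size

# For each vector, draw n_samples positions uniformly with replacement;
# sample.int() avoids sample()'s special case for length-1 numeric input
samples <- as.data.frame(
  lapply(Data, function(v) v[sample.int(length(v), n_samples, replace = TRUE)]),
  stringsAsFactors = FALSE
)
```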
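Gregor's back-of-the-envelope estimate can be reproduced directly:

```r
n_combos <- 8^12                         # 12 vectors with ~8 options each
seconds <- n_combos * 1e-3               # at 1 millisecond per combination
years <- seconds / (60 * 60 * 24 * 365)  # a little over 2 (about 2.18)
```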

0 Answers