Last week I posted the following question. The idea was to write a loop that determines the content of each dataset by randomly combining observations based on the variable "id".

For instance:

  • dataset 1: combinations of id 1, 2, 3, 4, 5, 6, 7, 8...
  • dataset 2: combinations of id 1, 2, 3
  • dataset 3: combinations of id 2, 3, 4, 5
  • dataset 4: combinations of id 5, 6, 7, 8, 9, 10...

I got a perfect answer to the question:

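result=NULL    # start empty so rbind() has something to append to
for(i in 2:max(o$id)){    # assumes the ids run from 1 to n without gaps
  combis=combn(unique(o$id),i)    # each column is one combination of i ids
  for(j in 1:ncol(combis)){
    sub=o[o$id %in% combis[,j],]
    out=sub[1,]    # use your function
    out$label=paste(combis[,j],collapse ='') # provide an id so you know which combination this result belongs to
    result=rbind(result,out) # append it to the previous output
  }
}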

However, my question now is the following: is there a way to specify that I only want combinations of at least 5 ids? The process takes a lot of computing time, and I noticed that small datasets (with fewer than 5 different ids) give biased results.

Through this link, a sample of the dataset and the full code to reproduce the example can be found. Please be aware that the entire code can take a while to run unless it is restricted to combinations of at least 5 ids.

user33125
  • Can you provide a minimal reproducible example? Please see http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – C8H10N4O2 Nov 16 '16 at 15:44
  • I recently did this for one of my analyses by creating a "count" column and filtering out any combination/group that was based on fewer than 5 ids. – heyydrien Nov 16 '16 at 15:47
  • I'm pretty sure you can optimise that with a mapply statement (which could be parallelized as well to speed it up) and, if needed, limit the combinations simply by filtering with something like combis[length(combis)>4]. However, I can't be sure as there is no reproducible example, and the one in your previous post doesn't work (what are AllData and rainfed?). – Bastien Nov 16 '16 at 16:35
  • I'll make a better example. Give me a minute;-) – user33125 Nov 16 '16 at 16:51
  • I'll add the question with this dataset and example as well. Through the following link a part of the dataset can be found with the code necessary. https://drive.google.com/open?id=0By9u5m3kxn9ybi11OEF5NkhkNDQ There are now 6 different ids in the dataset. Thank you very much for your help! I appreciate it a lot! P.S. Be aware that if you run the code as it is now, it takes some time to run. That is the reason why I would like to indicate that I only want the code to run for groups of minimum x ids. – user33125 Nov 16 '16 at 17:18

1 Answer

You can start the loop at 5:

for(i in 5:max(o$id)){
  combis=combn(unique(o$id),i)
   ...

This way, there are at least 5 elements in each combination (see ?combn).
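For illustration, here is a minimal self-contained sketch of the same idea. The toy data frame, the `min_size` and `pieces` names, and the use of `length(unique(o$id))` as the upper bound (instead of `max(o$id)`, so that ids with gaps would also work) are assumptions for this example, not part of the original code. `sub[1, ]` is kept as the placeholder for the real per-subset function.

```r
# Toy data standing in for the data frame `o` (hypothetical values, 6 ids).
o <- data.frame(id = rep(1:6, each = 2), value = seq_len(12))

ids      <- unique(o$id)
min_size <- 5                      # smallest combination size to keep

# Guard: min_size:length(ids) would count downwards if there were too few ids.
stopifnot(length(ids) >= min_size)

pieces <- list()
for (i in min_size:length(ids)) {
  combis <- combn(ids, i)          # each column is one combination of i ids
  for (j in seq_len(ncol(combis))) {
    sub <- o[o$id %in% combis[, j], ]
    out <- sub[1, ]                # placeholder for the real function
    out$label <- paste(combis[, j], collapse = "")
    pieces[[length(pieces) + 1]] <- out
  }
}
result <- do.call(rbind, pieces)   # bind once instead of growing with rbind()

nrow(result)                       # choose(6, 5) + choose(6, 6) rows
```

With 6 ids, starting at 5 leaves 7 combinations to process, compared with sum(choose(6, 2:6)) = 57 when the loop starts at 2.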

Wave