Determining All Possible Combinations of Items With a Grouping Variable, Allowing for Different Numbers of Items From Each Original Groups

Question

This question is very similar to this one and this one but it combines elements of both in a way that I can't figure out on my own.

I have the following list.

original_groups <- list(group_1 = as.character(1:6), group_2 = as.character(7:12), group_3 = as.character(13:20))

I want to create new groups based on these original groups. There is a constraint - each new group must contain an equal number of items from each original group. Furthermore, items cannot be used more than once. For example, if we take one item from each original group, we may get the following new groups.

Number_of_Items_From_Each_Original_Group_to_End_up_in_Each_New_Group <- 1
Number_of_New_Groups <- 3

# option 1

new_groups <- list(group_1 = as.character(c(1, 7, 13)), group_2 = as.character(c(2, 8, 14)), group_3 = as.character(c(3, 9, 15)))

# option 2

new_groups <- list(group_1 = as.character(c(1, 7, 13)), group_2 = as.character(c(2, 8, 14)), group_3 = as.character(c(3, 9, 16)))

# option 3

new_groups <- list(group_1 = as.character(c(1, 8, 13)), group_2 = as.character(c(2, 7, 14)), group_3 = as.character(c(3, 9, 15)))

There are three things that make what I'm hoping to do really tricky. First, I want to generate all possible combinations since this operation is part of a larger function. Second, I want to have the option to have multiple items from each original group end up in each new group. Third, I also want to have the option to choose how many new groups there will be. Here is another example.

Number_of_Items_From_Each_Original_Group_to_End_up_in_Each_New_Group <- 2
Number_of_New_Groups <- 3

# option 1

new_groups <- list(group_1 = as.character(c(1, 2, 7, 8, 13, 14)), group_2 = as.character(c(3, 4, 9, 10, 15, 16)), group_3 = as.character(c(5, 6, 11, 12, 17, 18)))

# option 2

new_groups <- list(group_1 = as.character(c(1, 2, 7, 8, 13, 14)), group_2 = as.character(c(3, 4, 9, 10, 15, 16)), group_3 = as.character(c(5, 6, 11, 12, 17, 19)))

# option 3

new_groups <- list(group_1 = as.character(c(1, 3, 7, 8, 13, 14)), group_2 = as.character(c(2, 4, 9, 10, 15, 16)), group_3 = as.character(c(5, 6, 11, 12, 17, 18)))

If each original group contained at least 9 items, I could even make new groups that each contain 3 items from each original group. Alternatively, if each original group contained at least 9 items, I could also increase the number of new groups to 4 if only 2 items from each original group end up in each new group.

Note that the original groups don't all need to contain the same amount of items for this process to work - the third original group contains more items than the other two original groups.

Also, note that item order doesn't matter within new groups. In other words, new_groups <- list(group_1 = as.character(c(1, 2, 7, 8, 13, 14)), group_2 = as.character(c(3, 4, 9, 10, 15, 16)), group_3 = as.character(c(5, 6, 11, 12, 17, 18))) is the same as new_groups <- list(group_1 = as.character(c(2, 1, 8, 7, 14, 13)), group_2 = as.character(c(3, 4, 9, 10, 15, 16)), group_3 = as.character(c(5, 6, 11, 12, 17, 18))), so in my final output, I would only want one of these options reported.

Finally, note that the number of original groups won’t always equal the number of new groups - they just happen to in this example.

All solutions are welcome but I'd be especially curious to see one using only base functions.

Thank you!

With these three groups of length 6, 6, and 8, and allowing two items from each group, I reckon there are over 20 million possible combinations (90 * 90 * 2520 ways of allocating the elements to three groups of two), so this is quickly going to become problematic in terms of time and/or memory. I wonder why you are trying to do this, and whether there might be a more efficient approach to the problem you are trying to solve. — Andrew Gustar, Jul 08 '22 at 17:03
Will it always be the case that there are three original groups, and three groups you'd like to allocate into? — Jon Spring, Jul 10 '22 at 23:09
No - I just chose three for my reproducible example. Also, the number of original groups might not always equal the number of new groups. I’ll update my question to state that. — David Moore, Jul 11 '22 at 00:03

Edoch · Accepted Answer · 2022-07-18T12:55:03.343

I suggest this solution: it does not use any other library than base.

Define a permutations function, to compute all possibile combinations of elements of a vector (or list) vec taken sublen elements at a time (likewise the combn function)

permutations <- function(vec, sublen, prev_vec=NULL){
  out_list <- list()
  if(sublen==1){
    for(v in vec){
      out_list <- append(out_list,list(append(prev_vec,list(v))))
    }
  } else {
    for (i in 1:(length(vec)-sublen+1)){
      v <- vec[1]
      prev_vec0 <- c(prev_vec,vec[1])
      vec <- vec[2:length(vec)]
      perm_list <- permutations(
        vec=vec,
        sublen=sublen-1,
        prev_vec=prev_vec0
      )
      out_list <- append(out_list,perm_list)
    }
  }
  return(out_list)
}

Define a find_matrix function that unlists matrices from deeply nested lists (source)

find_matrix <- function(x) {
  if (is.matrix(x))
    return(list(x))
  if (!is.list(x))
    return(NULL)
  unlist(lapply(x, find_matrix), FALSE)
}

Define a compatible_rows function, that extract from a dataframe a subset of rows which can be used to create the other output vectors, given an output vector

compatible_rows <- function(df,row_value){
  row_ids <- c()
  if(is.null(nrow(df))){
    return(NULL)
  } else {
    for (row_id in 1:nrow(df)){
      row_ids <- c(row_ids,!any(row_value %in% df[row_id,]))
    }
    return(df[which(row_ids),])
  }
}

Create a new_groups_list function, that computes all possible output matrices

new_groups_list <- function(df, prev_df=NULL, lvl=-1, verbose=F){
  
  lvl <- lvl+1

  results_list <- list()
  
  if(is.null(nrow(df))){
    if(verbose==T) cat(paste0("--BRANCH END (BEGIN lvl ",lvl,")--\n"))
    prev_df0 <- rbind(prev_df,df)
    rownames(prev_df0) <- NULL
    if(verbose==T)  cat("returned\n")
    if(verbose==T)  print(prev_df0)
    if(verbose==T)  cat("\n")
    return(prev_df0)
  }
  
  if(nrow(df)==0){
    if(verbose==T) cat(paste0("--BRANCH END (BEGIN lvl ",lvl,")--\n"))
    prev_df0 <- rbind(prev_df,df)
    rownames(prev_df0) <- NULL
    if(verbose==T)  cat("returned\n")
    if(verbose==T)  print(prev_df0)
    if(verbose==T)  cat("\n")
    return(prev_df0)
  }
  
  for(row_id in 1:nrow(df)){
    
    if(verbose==T)  cat(paste("-- lvl",lvl,"cycle",row_id,"--\n"))
    
    if(verbose==T)  cat("initial results list\n")
    
    if(verbose==T)  print(results_list)
    if(verbose==T)  cat("\n")
    
    if(verbose==T)  cat("df in\n")
    if(verbose==T)  assign("last_df",df,envir = .GlobalEnv)
    if(verbose==T)  print(df)
    if(verbose==T)  cat("\n")
    
    if(is.null(nrow(df))){
      prev_df0 <- rbind(prev_df,df)
      rownames(prev_df0) <- NULL
      if(verbose==T)  cat(paste0("--BRANCH END (MID lvl ",lvl,")--\n"))
      if(verbose==T)  cat("returned\n")
      results_list <- append(results_list,list(prev_df0))
      if(verbose==T)  print(results_list)
      if(verbose==T)  cat("\n")
      return(results_list)
    }
    
    considered_row <- df[1,]
    if(verbose==T)  assign("last_considered_row",considered_row,envir = .GlobalEnv)
    if(verbose==T)  cat("considered rows\n")
    if(verbose==T)  print(considered_row)
    if(verbose==T)  cat("\n")
    
    df <- df[2:nrow(df),]
    if(verbose==T)  assign("last_df",df,envir = .GlobalEnv)
    if(verbose==T)  cat("df without considered rows\n")
    if(verbose==T)  print(df)
    if(verbose==T)  cat("\n")
    
    prev_df0 <- rbind(prev_df,considered_row)
    rownames(prev_df0) <- NULL
    if(verbose==T)  assign("last_prev0",prev_df0,envir = .GlobalEnv)
    if(verbose==T)  cat("collected considered rows\n")
    if(verbose==T)  print(prev_df0)
    if(verbose==T)  cat("\n")
    
    comp_df <- compatible_rows(df,considered_row)
    if(verbose==T)  assign("last_comp_df",comp_df,envir = .GlobalEnv)
    if(verbose==T)  cat("compatible rows in df\n")
    if(verbose==T)  print(comp_df)
    if(verbose==T)  cat("\n")

    if(verbose==T)  cat(paste(">>> GOING TO LVL",lvl+1,"\n\n"))
    new_rows <- new_groups_list(
      comp_df,
      prev_df=prev_df0,
      lvl=lvl,
      verbose=verbose
    )
    if(verbose==T)  cat(paste0("--ROOT (lvl ",lvl,")--\n"))
    if(verbose==T)  cat("result received from branch\n")
    if(verbose==T)  print(new_rows)
    if(verbose==T)  cat("\n")
    
    results_list <- append(results_list,list(new_rows))
    if(verbose==T)  cat("results list\n")
    if(verbose==T)  print(results_list)
    if(verbose==T)  cat("\n")
        
  }
  
  return(results_list)
}

Create create_new_groups, which wraps-up all the other functions, and outputs the whole list of possible solutions

create_new_groups <- function(original_groups, max_output = NULL){

  min_len_original_groups = min(lengths(original_groups))

  num_original_groups = length(original_groups)
  
  max_len_subgroup <- floor(min_len_original_groups/2)
  
  
  if(min_len_original_groups<2){
    return("Not possible to populate new groups: at least one original group has less than 2 elements")
  }
  
  
  NewGroups_subLen_len_num <- list()
  
  for (len_subgroup in 1:max_len_subgroup){
    
    new_group_params <- c(
      len_subgroup,
      len_subgroup*num_original_groups,
      floor(min_len_original_groups/len_subgroup)
    )
    
    NewGroups_subLen_len_num[[len_subgroup]] <- new_group_params
  }
  
  out_list <- list()
  ind <- 1
  for (e in 1:length(NewGroups_subLen_len_num)){
    NewGroup_subLen_len_num <- NewGroups_subLen_len_num[[e]]
    
    elem_list <- list()
    ind <- 1
    # print(ind)
    for (o in 1:length(original_groups)){
      original_group <- original_groups[[o]]

      elem_list[[paste("group",ind)]] <- permutations(original_group,NewGroup_subLen_len_num[1])
      ind <- ind+1
    }
    
    out_list[[paste(c("subLen","len","numGroups"), NewGroup_subLen_len_num, collapse = " ")]] <- elem_list
  }
  
  results_list <- list()
  
  config_test <<- NewGroups_subLen_len_num
  
  for (config_id in 1:length(NewGroups_subLen_len_num)){
    config <- NewGroups_subLen_len_num[[config_id]]
    
    perm_grid <- expand.grid(out_list[[config_id]])
    
    perm_grid <- split(perm_grid,1:nrow(perm_grid)) 
    
    perm_grid <- lapply(perm_grid,unlist)
    
    perm_grid <- lapply(perm_grid,as.character)
    
    perm_grid <- do.call(rbind, perm_grid)
  
    new_groups <- new_groups_list(perm_grid,verbose = F) 
    
    new_groups <- find_matrix(new_groups)
    
    # config_test <<- config
    new_groups <- new_groups[which(sapply(new_groups, nrow) >= config[3])]
    
    # out_test <<- new_groups
    results_list[[config_id]] <- new_groups
    
  }
  
  return(results_list)
  
}

Given a simple input, like

original_groups <- list(
  group_1 = as.character(1:2), 
  group_2 = as.character(3:4), 
  group_3 = as.character(5:7)
)

The output of create_new_groups(original_groups) is

> create_new_groups_modified(original_groups)
[[1]]
[[1]][[1]]
     [,1] [,2] [,3]
[1,] "1"  "3"  "5" 
[2,] "2"  "4"  "6" 

[[1]][[2]]
     [,1] [,2] [,3]
[1,] "1"  "3"  "5" 
[2,] "2"  "4"  "7" 

[[1]][[3]]
     [,1] [,2] [,3]
[1,] "2"  "3"  "5" 
[2,] "1"  "4"  "6" 

[[1]][[4]]
     [,1] [,2] [,3]
[1,] "2"  "3"  "5" 
[2,] "1"  "4"  "7" 

[[1]][[5]]
     [,1] [,2] [,3]
[1,] "1"  "4"  "5" 
[2,] "2"  "3"  "6" 

[[1]][[6]]
     [,1] [,2] [,3]
[1,] "1"  "4"  "5" 
[2,] "2"  "3"  "7" 

[[1]][[7]]
     [,1] [,2] [,3]
[1,] "2"  "4"  "5" 
[2,] "1"  "3"  "6" 

[[1]][[8]]
     [,1] [,2] [,3]
[1,] "2"  "4"  "5" 
[2,] "1"  "3"  "7" 

[[1]][[9]]
     [,1] [,2] [,3]
[1,] "1"  "3"  "6" 
[2,] "2"  "4"  "7" 

[[1]][[10]]
     [,1] [,2] [,3]
[1,] "2"  "3"  "6" 
[2,] "1"  "4"  "7" 

[[1]][[11]]
     [,1] [,2] [,3]
[1,] "1"  "4"  "6" 
[2,] "2"  "3"  "7" 

[[1]][[12]]
     [,1] [,2] [,3]
[1,] "2"  "4"  "6" 
[2,] "1"  "3"  "7"

Moreover, the create_new_groups function also creates a global variable config_test where all possible configurations, for a given list of groups (i.e., original_groups), are stored. For example, for the previous problem, config_test is equal to

> config_test
[[1]]
[1] 1 3 2

So, for this problem only one output configuration is possible, having this structure:

the number of elements taken from each original group and used in each output group is 1
the length of the output groups is 6
the number of output groups (in each possible combination) is 2

Given a slightly more complex example

original_groups <- list(
  group_1 = as.character(1:4), 
  group_2 = as.character(5:8), 
  group_3 = as.character(9:13)
)

config_test would be equal to

> config_test
[[1]]
[1] 1 3 4

[[2]]
[1] 2 6 2

I made some tests, this method should work for any number of groups, of any length, and the output should always be composed of not-duplicated matrices.

I'm sorry if the explanation is short, if I have time in the following days I'll try to add some notes.

EDIT

A simple way to output only the configurations characterized by a specific number of elements from the original groups is to change the create_new_groups as follows

create_new_groups_modified <- function(original_groups, max_output = NULL, elements_from_original = NULL){

  min_len_original_groups = min(lengths(original_groups))

  num_original_groups = length(original_groups)
  
  max_len_subgroup <- floor(min_len_original_groups/2)
  
  
  if(min_len_original_groups<2){
    stop("Not possible to populate new groups: at least one original group has less than 2 elements")
  }
  
  
  NewGroups_subLen_len_num <- list()
  
  for (len_subgroup in 1:max_len_subgroup){
    
    new_group_params <- c(
      len_subgroup,
      len_subgroup*num_original_groups,
      floor(min_len_original_groups/len_subgroup)
    )
    
    NewGroups_subLen_len_num[[len_subgroup]] <- new_group_params
  }
  
  out_list <- list()
  ind <- 1
  for (e in 1:length(NewGroups_subLen_len_num)){
    NewGroup_subLen_len_num <- NewGroups_subLen_len_num[[e]]
    
    elem_list <- list()
    ind <- 1
    # print(ind)
    for (o in 1:length(original_groups)){
      original_group <- original_groups[[o]]

      elem_list[[paste("group",ind)]] <- permutations(original_group,NewGroup_subLen_len_num[1])
      ind <- ind+1
    }
    
    out_list[[paste(c("subLen","len","numGroups"), NewGroup_subLen_len_num, collapse = " ")]] <- elem_list
  }
  
  results_list <- list()
  
  config_test <<- NewGroups_subLen_len_num
  
  # if `elements_from_original` is not set, output all possible combinations
  
  if(is.null(elements_from_original)){
    for (config_id in 1:length(NewGroups_subLen_len_num)){
      config <- NewGroups_subLen_len_num[[config_id]]
      
      perm_grid <- expand.grid(out_list[[config_id]])
      
      perm_grid <- split(perm_grid,1:nrow(perm_grid)) 
      
      perm_grid <- lapply(perm_grid,unlist)
      
      perm_grid <- lapply(perm_grid,as.character)
      
      perm_grid <- do.call(rbind, perm_grid)
    
      new_groups <- new_groups_list(perm_grid,verbose = F) 
      
      new_groups <- find_matrix(new_groups)
      
      # config_test <<- config
      new_groups <- new_groups[which(sapply(new_groups, nrow) >= config[3])]
      
      # out_test <<- new_groups
      results_list[[config_id]] <- new_groups
    }
    
  } else {
    
    # if `elements_from_original` is set, check if this is a valid configuration, then output only the matrix having this configuration
    
    config_id <- which(sapply(NewGroups_subLen_len_num,function(x) x[1]==elements_from_original))
    
    if (length(config_id)!=0){

      config <- NewGroups_subLen_len_num[[config_id]]
      
      perm_grid <- expand.grid(out_list[[config_id]])
      
      perm_grid <- split(perm_grid,1:nrow(perm_grid)) 
      
      perm_grid <- lapply(perm_grid,unlist)
      
      perm_grid <- lapply(perm_grid,as.character)
      
      perm_grid <- do.call(rbind, perm_grid)
    
      new_groups <- new_groups_list(perm_grid,verbose = F) 
      
      new_groups <- find_matrix(new_groups)
      
      new_groups <- new_groups[which(sapply(new_groups, nrow) >= config[3])]
      
      results_list[[1]] <- new_groups

    } else {
      stop("value of elements_from_original not available: check config_test to see available configurations")
    }
    
  }
  
  return(results_list)
  
}

The elements_from_original parameter of the function allows to set the number elements from the original groups to consider, and, if applicable, the output will include only matrices following that configuration.

EDIT 2

To output the matrices composed by a specific number of groups

Write a new function select_matrices_by_number_output_groups, that outputs only the matrices with n_output_groups rows

select_matrices_by_number_output_groups <- function(l,n_output_groups){
  
  # Filter out matrices having less rows than `n_output_groups`
  out_l <- l[which(
    sapply(
      l,
      # function(x) check_matrix_by_number_output_groups(x,n_output_groups)
      function(mtr){
        if(nrow(mtr)<n_output_groups) return(F)
        else return(T)
      }
    )
  )]
  
  # Cut-off rows from matrices having more rows than `n_output_groups`
  out_l <- lapply(
    out_l,
    function(x) head(x,n_output_groups)
  )
  
  # Keep only unique elements (i.e., matrices)
  out_l <- unique(out_l)
  
  return(out_l)
  
}

The update create_new_groups so that it includes the select_matrices_by_number_output_groups function

create_new_groups_modified_2 <- function(original_groups, max_output = NULL, elements_from_original = NULL, n_output_groups = NULL){

  min_len_original_groups = min(lengths(original_groups))

  num_original_groups = length(original_groups)
  
  max_len_subgroup <- floor(min_len_original_groups/2)
  
  
  if(min_len_original_groups<2){
    stop("Not possible to populate new groups: at least one original group has less than 2 elements")
  }
  
  
  NewGroups_subLen_len_num <- list()
  
  for (len_subgroup in 1:max_len_subgroup){
    
    new_group_params <- c(
      len_subgroup,
      len_subgroup*num_original_groups,
      floor(min_len_original_groups/len_subgroup)
    )
    
    NewGroups_subLen_len_num[[len_subgroup]] <- new_group_params
  }
  
  out_list <- list()
  ind <- 1
  for (e in 1:length(NewGroups_subLen_len_num)){
    NewGroup_subLen_len_num <- NewGroups_subLen_len_num[[e]]
    
    elem_list <- list()
    ind <- 1
    # print(ind)
    for (o in 1:length(original_groups)){
      original_group <- original_groups[[o]]

      elem_list[[paste("group",ind)]] <- permutations(original_group,NewGroup_subLen_len_num[1])
      ind <- ind+1
    }
    
    out_list[[paste(c("subLen","len","numGroups"), NewGroup_subLen_len_num, collapse = " ")]] <- elem_list
  }
  
  results_list <- list()
  
  config_test <<- NewGroups_subLen_len_num
  
  # if `elements_from_original` is not set, output all possible combinations
  
  if(is.null(elements_from_original)){
    for (config_id in 1:length(NewGroups_subLen_len_num)){
      
      config <- NewGroups_subLen_len_num[[config_id]]
      
      perm_grid <- expand.grid(out_list[[config_id]])
      
      perm_grid <- split(perm_grid,1:nrow(perm_grid)) 
      
      perm_grid <- lapply(perm_grid,unlist)
      
      perm_grid <- lapply(perm_grid,as.character)
      
      perm_grid <- do.call(rbind, perm_grid)
    
      new_groups <- new_groups_list(perm_grid,verbose = F) 
      
      new_groups <- find_matrix(new_groups)
      
      new_groups <- new_groups[which(sapply(new_groups, nrow) >= config[3])]
      
      results_list[[config_id]] <- new_groups
    }
    
  } else {
    
    # if `elements_from_original` is set, check if this is a valid configuration, then output only the matrix having this configuration
    
    config_id <- which(sapply(NewGroups_subLen_len_num,function(x) x[1]==elements_from_original))
    
    if (length(config_id)!=0){

      config <- NewGroups_subLen_len_num[[config_id]]
      
      perm_grid <- expand.grid(out_list[[config_id]])
      
      perm_grid <- split(perm_grid,1:nrow(perm_grid)) 
      
      perm_grid <- lapply(perm_grid,unlist)
      
      perm_grid <- lapply(perm_grid,as.character)
      
      perm_grid <- do.call(rbind, perm_grid)
      
      new_groups <- new_groups_list(perm_grid,verbose = F) 
      
      new_groups <- find_matrix(new_groups)
      
      
      new_groups <- lapply(
        new_groups, 
        function(x) {
          dimnames(x) <- NULL
          return(x)
        }
      )
      
      if(is.null(n_output_groups)){
        
        new_groups <- new_groups[which(sapply(new_groups, nrow) == config[3])]
        
      } else if (n_output_groups > config[3]){
        
        stop("value n_output_groups higher than max number of new groups for this configuration: check config_test to see available configurations")
        
      } else {
        
        new_groups <- select_matrices_by_number_output_groups(new_groups,n_output_groups)
        
      }
      
      # results_list[[1]] <- new_groups
      results_list <- new_groups

    } else {
      
      stop("value of elements_from_original not available: check config_test to see available configurations")
      
    }
    
  }
  
  return(results_list)
  
}

This answer looks amazing, but where do you specify the number of items from each original group that end up in each final group (`Number_of_Items_From_Each_Original_Group_to_End_up_in_Each_New_Group` in my original question)? — David Moore, Jul 17 '22 at 11:49
The final list includes all possibile `Number_of_Items_From_Each_Original_Group_to_End_up_in_Each_New_Group`, just a few slight changes are required to output only one specific value. Indeed, the `create_new_groups` function also creates a global variable where the "configurations" for the output are possible given an input. I'll add an explanation on how to do that — Edoch, Jul 17 '22 at 15:26
Moreover, notice that the output of `create_new_groups` is a list of lists, where each element of the outer list corresponds to a possible configuration (which, among the other parameters, is characterized by the number of elements from the original groups and used by the new groups), and the inner lists include all possible combination matrices given a configuration — Edoch, Jul 17 '22 at 16:21
This solution is outstanding but I have one comment. If there are 4 items in each original group and I specify that only 1 item from each original group should end up in each new group, by default, your function returns 4 new groups, and each new group contains 1 item from each original group. What if I want only 2 or 3 new groups? It seems like by default your function returns the maximum possible number of new groups. I'd love to be able to specify how many new groups I want. — David Moore, Jul 17 '22 at 17:53
I updated my question to include varying the number of new groups and I'll add another bounty as well after this one expires — David Moore, Jul 17 '22 at 18:08
A simple solution is to apply a function to the output list to cut off some rows in each output matrix, something like `lapply(output_list[[config_id]],function(x) x[1:n_groups,])`. A more complex solution would require a much deeper rework of the `create_new_groups` function, that I think might take a long time — Edoch, Jul 17 '22 at 21:45
This solution should work because what you are asking is a subset of the output the function is already returning — Edoch, Jul 17 '22 at 21:57
Not quite - there will be more combinations; you would have to take combinations of the rows of each of the matrices returned by your function, I think. I'll try to spend some time digging into what you've done tomorrow afternoon and I appreciate your help. — David Moore, Jul 17 '22 at 22:40
I see what you mean, check the second edit of my solution and let me know if the output is correct — Edoch, Jul 18 '22 at 09:32

Andrew Gustar · Answer 2 · 2022-07-08T17:28:38.160

As mentioned in the comment, there are potentially huge numbers of combinations involved here. However, one approach that works (assuming you have enough time/memory) is as follows. This example is for just the first two elements of your original_groups list, allowing two elements from each group. It would be straightforward to generalise the final map to arbitrary numbers of groups, but this is just to illustrate the principle.

The first map sets up a vector of group indices, padded out with NAs to the length of the group (6) (i.e in this case c(1,2,1,2,NA,NA)), and finds all unique permutations of it. cross then combines each option for the first group with each option for the second, and the final map uses these indices to separate the elements into the two groups.

library(combinat) #for permn function
library(tidyverse) #purrr and dplyr

original_groups <- list(group_1 = as.character(1:6), group_2 = as.character(7:12))

no_items <- 2
no_groups <- length(original_groups) #i.e. 2 in this case

output <- map(original_groups, 
              ~unique(permn(`length<-`(rep(seq_len(no_groups), 
                                           no_items),
                                       length(.))))) %>% 
  cross() %>% 
  map(~list(c(original_groups$group_1[which(.$group_1 == 1)], 
              original_groups$group_2[which(.$group_2 == 1)]),
            c(original_groups$group_1[which(.$group_1 == 2)], 
              original_groups$group_2[which(.$group_2 == 2)])))

head(output) #full output has 8100 elements
[[1]]
[[1]][[1]]
[1] "1" "3" "7" "9"

[[1]][[2]]
[1] "2"  "4"  "8"  "10"


[[2]]
[[2]][[1]]
[1] "1" "3" "7" "9"

[[2]][[2]]
[1] "2"  "5"  "8"  "10"


[[3]]
[[3]][[1]]
[1] "1" "4" "7" "9"

[[3]][[2]]
[1] "2"  "5"  "8"  "10"


[[4]]
[[4]][[1]]
[1] "1" "4" "7" "9"

[[4]][[2]]
[1] "3"  "5"  "8"  "10"


[[5]]
[[5]][[1]]
[1] "2" "4" "7" "9"

[[5]][[2]]
[1] "3"  "5"  "8"  "10"


[[6]]
[[6]][[1]]
[1] "2" "4" "7" "9"

[[6]][[2]]
[1] "3"  "6"  "8"  "10"

This answer is really nice, but I'm hoping to find something that works for any number of groups and not just for two (you have a few lines where you specify what `group_1` and `group_2` are). I know I can't be too picky, but I'd also love to see a solution using only `base` functions too. — David Moore, Jul 10 '22 at 18:38

Jeffrey Harkness · Answer 3 · 2022-07-17T21:54:25.747

The prototype function below works for any number of groups from which some number ("numobs" in the function) of observations have been uniquely drawn. I have added an example with four groups instead of 3 to illustrate.

The function was written in R version 4.2.1 and data.table version 1.14.2 on a windows operating system. The code is rather sophisticated; changing the position of a single comma, or a single compatibility issue, might render the function inoperable.

In the example given by the poster, there are 15 ways of drawing two distinct members from the first two groups (length 6) and 28 ways of drawing two distinct members from the third group (length 8). There should then be 15 * 15 * 28 = 6300 possible combinations.

I have added two other examples to show that this function works for any given number of groups with k objects taken from each group. However, the code speed is affected as the number of groups and objects increases, and will likely become an issue for large numbers of groups or objects.

Suggested Citation: Harkness, Jeffrey (2022). Customized R combination function. Posted 7/14/2022 on stackoverflow.com

##original example from poster
(original_groups <- list(group_1 = as.character(1:6), 
group_2 = as.character(7:12), group_3 = as.character(13:20)))
    $group_1
[1] "1" "2" "3" "4" "5" "6"

$group_2
[1] "7"  "8"  "9"  "10" "11" "12"

$group_3
[1] "13" "14" "15" "16" "17" "18" "19" "20"

#testing 4 groups instead of 3
(original_groups2 <- list(group_1 = as.character(1:6), 
group_2 = as.character(7:12), group_3 = as.character(13:20), group_4 = as.character(21:24)))
$group_1
[1] "1" "2" "3" "4" "5" "6"

$group_2
[1] "7"  "8"  "9"  "10" "11" "12"

$group_3
[1] "13" "14" "15" "16" "17" "18" "19" "20"

$group_4
[1] "21" "22" "23" "24"

#testing 4 different groups
(original_groups3 <- list(group_1 = as.character(1:4), 
group_2 = as.character(5:9), group_3 = as.character(10:16), group_4 = as.character(17:22)))
$group_1
[1] "1" "2" "3" "4"

$group_2
[1] "5" "6" "7" "8" "9"

$group_3
[1] "10" "11" "12" "13" "14" "15" "16"

$group_4
[1] "17" "18" "19" "20" "21" "22"

require(data.table)
# Loading required package: data.table
# data.table 1.14.2 using 4 threads (see ?getDTthreads).  Latest news: r-datatable.com
gc <- function(input_ob = original_groups, numgroups = length(input_ob), numobs = 2) {
  ansdim <- NULL
  for (k in 1:numgroups) {
    rowcount <- 1
    temp <- as.vector(input_ob[[k]]) # temporary vector for group k
    newcombs <- data.frame(combn(temp, numobs)) # data frame of combinations for group k
    newcombs <- transpose(newcombs)
    newname <- paste0("ansnew", k)
    ansdim[k] <- dim(newcombs)[1]
    assign(newname, newcombs)
  }

  ### To hold final answer
  nm <- matrix(data = 0, nrow = prod(ansdim), ncol = numobs * numgroups, byrow = T)

  ## All possible combinations of the first two groups
  combine <- NULL
  nc <- ansdim[1]
  nc <- data.frame(CJ(1:nc, 1:ansdim[2])) # instructions for which row numbers to combine
  ntemp <- matrix(data = 0, nrow = ansdim[1] * ansdim[2], ncol = numobs * 2, byrow = T)
  for (m in 1:dim(ntemp)[1]) {
    newrow <- cbind(ansnew1[nc[m, 1], ], ansnew2[nc[m, 2], ])
    ntemp[m, ] <- as.matrix(newrow[1, ])
  }
  fcom <- ntemp ### all combinations of first two groups

  ## All possible combinations of all groups
  for (n in 3:(numgroups)) {
    nc <- ansdim[n]
    nc <- data.frame(CJ(1:nrow(fcom), 1:ansdim[n])) # instructions for which row numbers to combine
    ntemp <- matrix(data = 0, nrow = nrow(fcom) * ansdim[n], ncol = numobs * (n), byrow = T)
    for (p in 1:dim(ntemp)[1]) {
      frow <- fcom[nc[p, 1], ] ### First part of new row
      srow <- as.character(as.vector(get(paste0("ansnew", n))[nc[p, 2], ])) ## second part of new row
      newrow <- c(frow, srow)
      ntemp[p, ] <- newrow
    }
    fcom <- ntemp
  }
  #
  nm <- fcom
  return(nm)
}

ans1 <- gc(input_ob = original_groups)
ans2 <- gc(input_ob = original_groups2)
ans3 <- gc(input_ob = original_groups3)

 
dim(ans1);head(ans1)
[1] 6300    6
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] "1"  "2"  "7"  "8"  "13" "14"
[2,] "1"  "2"  "7"  "8"  "13" "15"
[3,] "1"  "2"  "7"  "8"  "13" "16"
[4,] "1"  "2"  "7"  "8"  "13" "17"
[5,] "1"  "2"  "7"  "8"  "13" "18"
[6,] "1"  "2"  "7"  "8"  "13" "19"
dim(ans2);head(ans2)
[1] 37800     8
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "1"  "2"  "7"  "8"  "13" "14" "21" "22"
[2,] "1"  "2"  "7"  "8"  "13" "14" "21" "23"
[3,] "1"  "2"  "7"  "8"  "13" "14" "21" "24"
[4,] "1"  "2"  "7"  "8"  "13" "14" "22" "23"
[5,] "1"  "2"  "7"  "8"  "13" "14" "22" "24"
[6,] "1"  "2"  "7"  "8"  "13" "14" "23" "24"
dim(ans3);head(ans3)
[1] 18900     8
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "1"  "2"  "5"  "6"  "10" "11" "17" "18"
[2,] "1"  "2"  "5"  "6"  "10" "11" "17" "19"
[3,] "1"  "2"  "5"  "6"  "10" "11" "17" "20"
[4,] "1"  "2"  "5"  "6"  "10" "11" "17" "21"
[5,] "1"  "2"  "5"  "6"  "10" "11" "17" "22"
[6,] "1"  "2"  "5"  "6"  "10" "11" "18" "19"
 
###Sample random rows from output to illustrate output is consistent
ans1[sample(nrow(ans1), size = 5, replace = F),]
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] "1"  "3"  "8"  "12" "13" "17"
[2,] "3"  "6"  "8"  "12" "19" "20"
[3,] "1"  "3"  "7"  "11" "15" "19"
[4,] "1"  "2"  "9"  "11" "17" "20"
[5,] "5"  "6"  "7"  "8"  "19" "20"
ans2[sample(nrow(ans2), size = 5, replace = F),]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "2"  "3"  "8"  "11" "13" "15" "22" "24"
[2,] "1"  "2"  "8"  "9"  "13" "19" "23" "24"
[3,] "1"  "3"  "11" "12" "13" "15" "21" "22"
[4,] "1"  "4"  "9"  "10" "14" "16" "21" "22"
[5,] "2"  "5"  "7"  "9"  "15" "20" "22" "24"
ans3[sample(nrow(ans3), size = 5, replace = F),]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "3"  "4"  "6"  "7"  "10" "15" "21" "22"
[2,] "1"  "3"  "5"  "8"  "12" "14" "18" "22"
[3,] "2"  "4"  "7"  "8"  "12" "16" "18" "19"
[4,] "2"  "3"  "5"  "9"  "11" "13" "17" "21"
[5,] "1"  "2"  "7"  "8"  "14" "16" "17" "18"
 
##shows that the output has no duplicated rows
ans1[duplicated(ans1),]
     [,1] [,2] [,3] [,4] [,5] [,6]

In order to show convincingly that all the cases are represented in the output, I wrote a few lines of code that randomly sample two objects from the OP's example groups and finds the corresponding row in the output. This code can easily be looped over.

#The next lines sample from original group and find corresponding row in answer
samp1 = sample(original_groups$group_1,size = 2, replace = F)
samp2 = sample(original_groups$group_2,size = 2, replace = F)
samp3 = sample(original_groups$group_3,size = 2, replace = F)
(samprow = c(samp1, samp2, samp3))
[1] "6"  "4"  "8"  "9"  "20" "18"
colnames(ans1) = c("v1","v2","v3","v4","v5","v6")
ans5 = data.frame(ans1)
ans5[ans5$v1 %in% samprow & ans5$v2 %in% samprow &ans5$v3 %in% samprow &ans5$v4 %in% samprow &ans5$v5 %in% samprow &ans5$v6 %in% samprow,]
     v1 v2 v3 v4 v5 v6
5627  4  6  8  9 18 20

7/16/2022 Update: Generating all combinations

The following approach was written for the case of 3 groups. The code would require modest modification to output the data in whatever form the user wants. It can be altered to accommodate other group sizes, periodically dump data into an output file to avoid size issues, etc., but I will not do so here for the sake of brevity. Rough calculations suggest that for the example data in "original_groups", it should work in about two hours or less for a reasonably fast laptop. It has been commented out for this reason.

Every unique sampled set (the output from gc()) is represented by a single integer in the code, so they don't have to be copied or repeated over and over. This helps to mitigate data size issues.

The code below writes a data.table where each row gives 3 integers that correspond to three unique sampled sets, like a set of instructions that can be used to construct the final result in whatever form (list, etc.) the user wants with additional code.

After running the code for about 15 minutes, the answer frame had collected 697,152 combinations after removing duplicates. The duplicate removal code is taken from here. All other code is original. Code and example output are shown below.

References:

Stack Overflow commentator Thomas (2014). Accessed 7/16/2022. URL: Deleting reversed duplicates with R

   # inans = ans1                 #object from gc() function with all combinations
# inans = data.table(inans)    
# colvec = colnames(inans)
# inans$index = 1:nrow(inans)
# 
# for(i in 1:nrow(inans)){                 #For every possible sampled sequence
#   sf = NULL;sf2 = NULL;ntemp = NULL; nin = NULL
#   ntemp = as.character(as.vector(inans[i,..colvec])) 
#   nin = inans$index[i]
#   sf = inans
#   for(k in colvec){                        #Remove rows that don't match first element
#     sf = sf[!get(k) %in% ntemp]
#   }
#   
#   ntemp2 = NULL; sin = NULL
#   for(j in 1:nrow(sf)){                    #Loop through all possibilities for element 
#     ntemp2 = as.character(as.vector(sf[j,..colvec])) 
#     sin = sf$index[j]
#     sf2 = sf
#     for(m in colvec){
#       sf2 = sf2[!get(m) %in% ntemp2]          #Remove rows that don't match second element
#     }
#     if(dim(sf2)[1] > 0){
#       newframe = data.table(nin, sin, sf2$index)  #Store row numbers for each combination
#       if(i == 1){
#         ansframe = newframe
#       }else{
#         ansframe = rbind(ansframe, newframe)       #start process over for next sampled set
#       }}}}
# 
# ##Remove duplicate groups
# ##Code Source: 
# ##https://stackoverflow.com/questions/22756392/deleting-reversed-duplicates-with-r
# ansframe = ansframe[!duplicated(apply(ansframe,1,function(x) paste(sort(x),collapse=''))),]
# 
# #input data.table of all possible sample sets
# #6300 total here - made from gc() function above
# #index is an integer to represent each sequence
# head(inans)
#    v1 v2 v3 v4 v5 v6   index
# 1:  1  2  7  8 13 14     1
# 2:  1  2  7  8 13 15     2
# 3:  1  2  7  8 13 16     3
# 4:  1  2  7  8 13 17     4
# 5:  1  2  7  8 13 18     5
# 6:  1  2  7  8 13 19     6
# 
# 
# #data.table that holds instructions to build final output
# #each row gives row numbers for a feasible combination
# head(ansframe)
# nin  sin   V3
# 1:   1 6300 4046
# 2:   1 6300 4047
# 3:   1 6300 4048
# 4:   1 6300 4051
# 5:   1 6300 4052
# 6:   1 6300 4055
# 
# #answer frame size: each row represents a combination
# dim(ansframe)
# [1] 697152      3
# 
# 
# ###Code to print a selection of valid combinations from the 
# ###output instructions given by the code above.
# forind = as.integer(seq(1,nrow(ansframe),length.out = 50))
# for(i in forind){print(inans[as.numeric(ansframe[i])])}
#    v1 v2 v3 v4 v5 v6 index
# 1:  1  2  7  8 13 14     1
# 2:  5  6 11 12 19 20  6300
# 3:  3  4  9 10 15 16  4046
#    v1 v2 v3 v4 v5 v6 index
# 1:  1  2  7  8 13 15     2
# 2:  3  4  9 11 16 17  4079
# 3:  5  6 10 12 14 20  6257
#    v1 v2 v3 v4 v5 v6 index
# 1:  1  2  7  8 13 15     2
# 2:  3  4  9 12 17 20  4113
# 3:  5  6 10 11 18 19  6242
# v1 v2 v3 v4 v5 v6 index
# 1:  1  2  7  8 13 15     2
# 2:  3  4 10 12 14 19  4156
# 3:  5  6  9 11 16 18  6180
#    v1 v2 v3 v4 v5 v6 index
# 1:  1  2  7  8 13 15     2
# 2:  3  4 11 12 17 18  4195
# 3:  5  6  9 10 16 20  6154
#    v1 v2 v3 v4 v5 v6 index
# 1:  1  2  7  8 13 15     2
# 2:  3  5  9 11 14 17  4490
# 3:  4  6 10 12 16 18  5844
# v1 v2 v3 v4 v5 v6 index
# 1:  1  2  7  8 13 15     2
# 2:  3  5  9 12 16 19  4529
# 3:  4  6 10 11 17 18  5819
#    v1 v2 v3 v4 v5 v6 index
# 1:  1  2  7  8 13 15     2
# 2:  3  5 10 11 18 20  4563
# 3:  4  6  9 12 17 19  5792

7/17/2022 Update: All combinations with variable output group size

The following approach was written for the cases of 2,3,or 4 groups. In tested cases, the code ran in the rough ballpark of 15 minutes or so for two groups but can take longer for 4 groups, depending on the size and number of the input groups.

This example shows 4 input groups with 4 output groups and 1 object from each group, but the user can also choose 2 or 3 output groups. The object "numobs" in the gc() function specifies objects to take from each group and the "groupsout" object specifies the number of output groups. The code below found 13,824 combinations in under ten minutes for this example.

(original_groups4 <- list(group_1 = as.character(1:4), group_2 = as.character(5:8), group_3 = as.character(9:12), group_4 = as.character(13:16)))
$group_1
[1] "1" "2" "3" "4"

$group_2
[1] "5" "6" "7" "8"

$group_3
[1] "9"  "10" "11" "12"

$group_4
[1] "13" "14" "15" "16"

ans1 = gc(input_ob = original_groups4, numobs = 1)
 
###Find all combinations 
###groupsout specifies the number of output groups
groupsout = 4    #Number of groups in the output: choose 2,3, or 4
inans = ans1     #dev object from gc() function with all combos
 
inans = data.table(inans)  #order data.table by first few columns
colvec = colnames(inans)
inans$index = 1:nrow(inans)
 
if(groupsout == 2){          #groupsout = 2 case
for(i in 1:nrow(inans)){     #Remove rows that don't match first element
ntemp = as.character(as.vector(inans[i,..colvec])) 
nin = inans$index[i]
sf = inans
for(k in colvec){
sf = sf[!get(k) %in% ntemp]
}
if(dim(sf)[1] > 0){
newframe = data.table(nin, sf$index)   #Store row numbers for each combination
if(i == 1){
ansframe = newframe
}else{
ansframe = rbind(ansframe, newframe)
}}}}
 
if(groupsout == 3){          #groupsout = 3 case
for(i in 1:nrow(inans)){     #Remove rows that don't match first element
sf = NULL;sf2 = NULL;ntemp = NULL; nin = NULL
ntemp = as.character(as.vector(inans[i,..colvec])) 
nin = inans$index[i]
sf = inans
for(k in colvec){
sf = sf[!get(k) %in% ntemp]
}
 
ntemp2 = NULL; sin = NULL  #Create last group and write to output frame
for(j in 1:nrow(sf)){      #Remove rows that don't match second element
ntemp2 = as.character(as.vector(sf[j,..colvec])) 
sin = sf$index[j]
sf2 = sf
for(m in colvec){
sf2 = sf2[!get(m) %in% ntemp2]
}
if(dim(sf2)[1] > 0){
newframe = data.table(nin, sin, sf2$index)   #Store row numbers for each combination
if(i == 1){
ansframe = newframe
}else{
ansframe = rbind(ansframe, newframe)
}}}}}

###groupsout = 4 case
if(groupsout == 4){
for(i in 1:nrow(inans)){     #Remove rows that don't match first element
sf = NULL;sf2 = NULL;sf3 = NULL
ntemp = NULL; nin = NULL;fin = NULL
ntemp = as.character(as.vector(inans[i,..colvec])) 
nin = inans$index[i]
sf = inans
for(k in colvec){
sf = sf[!get(k) %in% ntemp]
}
####Create next group
ntemp2 = NULL; sin = NULL
for(j in 1:nrow(sf)){      #Remove rows that don't match second element
ntemp2 = as.character(as.vector(sf[j,..colvec])) 
sin = sf$index[j]
sf2 = sf
for(m in colvec){
sf2 = sf2[!get(m) %in% ntemp2]
}
####Create last group and write to output frame
fin = NULL;ntemp3 = NULL
for(p in 1:nrow(sf2)){      #Remove rows that don't match third element
ntemp3 = as.character(as.vector(sf2[p,..colvec])) 
fin = sf2$index[p]
sf3 = sf2
for(r in colvec){
sf3 = sf3[!get(r) %in% ntemp3]
}
if(dim(sf3)[1] > 0){
newframe = data.table(nin, sin,fin, sf3$index)   #Store row numbers
if(i == 1){
ansframe = newframe
}else{
ansframe = rbind(ansframe, newframe)
}}}}}}#end if groupsout = 4
 
##Remove duplicate groups
ansframe = ansframe[!duplicated(apply(ansframe,1,function(x) paste(sort(x),collapse=''))),]
 
#for illustration
dim(ansframe)
[1] 13824     4
 
##View subset of combination output to check for valid results
forind = as.integer(seq(1,nrow(ansframe),length.out = 50))
for(i in forind){print(inans[as.numeric(ansframe[i])])}
   V1 V2 V3 V4 index
1:  1  5  9 13     1
2:  4  8 12 16   256
3:  3  7 11 15   171
4:  2  6 10 14    86
   V1 V2 V3 V4 index
1:  1  5  9 15     3
2:  2  6 12 16    96
3:  3  7 10 14   166
4:  4  8 11 13   249
   V1 V2 V3 V4 index
1:  1  5  9 16     4
2:  2  7 12 14   110
3:  3  6 11 15   155
4:  4  8 10 13   245
   V1 V2 V3 V4 index
1:  1  5 10 13     5
2:  2  8 12 14   126
3:  3  7  9 16   164
4:  4  6 11 15   219
   V1 V2 V3 V4 index
1:  1  5 10 15     7
2:  2  6 11 16    92
3:  3  8 12 14   190
4:  4  7  9 13   225
   V1 V2 V3 V4 index
1:  1  5 10 16     8
2:  2  7 11 15   107
3:  3  6  9 14   146
4:  4  8 12 13   253
   V1 V2 V3 V4 index
1:  1  5 11 13     9
2:  2  8 10 15   119
3:  3  6 12 16   160
4:  4  7  9 14   226

I really appreciate the answer, it's only partway there. Check out my `new_groups` objects in the question. In my example, three of your output groups were reported together, and each group of three groups contained no repeated items. Here, you just report all possible combinations of items when a certain number are taken from each group. — David Moore, Jul 14 '22 at 17:49
If I understand your comment correctly, then from the function I wrote, you seek all possible combinations of rows of N groups (however many groups were given) that contain no repeated elements. — Jeffrey Harkness, Jul 14 '22 at 18:38
Yes - I believe that’s correct. Check out my original question for an example. Thank you so much. — David Moore, Jul 14 '22 at 19:12
@David Moore, depending on your purposes, you might consider using the `gc()` function I wrote and then writing a function to randomly sample some given number of feasible combinations from its output instead of trying to generate all possible combinations stored in R objects. Finding all possible groups from `original_groups` given a sample size from each is a far more tractable problem than finding all possible groupings of N groups. I wrote some prototype code for the case of all N = 3 group combinations; the computation time and output are somewhat unwieldly. — Jeffrey Harkness, Jul 15 '22 at 19:29
Ultimately I do need to generate all possible combinations. Your answer looks really good but it takes a while to run - I'm just trying it out now. Thanks so much. — David Moore, Jul 17 '22 at 18:00
Is there a way to specify how many new groups to create? For example, if each original group contains 4 items and I only want one item from each original group to end up in each new group, each combination could have 2, 3, or 4 new groups. I'll update my question accordingly - I know this wasn't clear in it. — David Moore, Jul 17 '22 at 18:04
@David Moore, I have added code that handles cases for 2,3, or 4 output groups above. — Jeffrey Harkness, Jul 17 '22 at 22:06
I just updated my question to include another variable (`Number_of_New_Groups`) and I added a new bounty. Thanks! — David Moore, Jul 25 '22 at 13:13

Onyambu · Answer 4 · 2022-07-15T17:46:23.070

This is the job of expand.grid + combn: will be showing only the first 5 rows:

n <- 1
expand.grid(lapply(original_groups, combn, n, simplify = FALSE))

   group_1 group_2 group_3
1         1       7      13
2         2       7      13
3         3       7      13
4         4       7      13
5         5       7      13

when n = 2

n <- 2
expand.grid(lapply(original_groups, combn, n, simplify = FALSE))

   group_1 group_2 group_3
1      1, 2    7, 8  13, 14
2      1, 3    7, 8  13, 14
3      1, 4    7, 8  13, 14
4      1, 5    7, 8  13, 14
5      1, 6    7, 8  13, 14

You can write a simple function:

generate_all <- function(lst, n){
   expand.grid(lapply(lst, combn, n, simplify = FALSE))
 }

head(generate_all(original_groups, 3))
    group_1   group_2    group_3
1   1, 2, 3   7, 8, 9 13, 14, 15
2   1, 2, 4   7, 8, 9 13, 14, 15
3   1, 2, 5   7, 8, 9 13, 14, 15
4   1, 2, 6   7, 8, 9 13, 14, 15
5   1, 3, 4   7, 8, 9 13, 14, 15




head(generate_all(original_groups, 4))
     group_1     group_2        group_3
1 1, 2, 3, 4 7, 8, 9, 10 13, 14, 15, 16
2 1, 2, 3, 5 7, 8, 9, 10 13, 14, 15, 16
3 1, 2, 3, 6 7, 8, 9, 10 13, 14, 15, 16
4 1, 2, 4, 5 7, 8, 9, 10 13, 14, 15, 16
5 1, 2, 4, 6 7, 8, 9, 10 13, 14, 15, 16
6 1, 2, 5, 6 7, 8, 9, 10 13, 14, 15, 16

ThomasIsCoding · Answer 5 · 2022-07-18T06:38:52.427

If I understand your question correctly, it seems to be a problem regarding all possible partitions of each group by a given size and re-organizing the partitions across all groups to form new collections, and keep only one of the isomorphics. In this case, I guess the key step is to generate all exclusive partitions by size, which seems related to permutation problem.

Since a base R option is preferrable to OP, probably we can try the code below:

build a helper function permM, which generate all permutations of vector x with given group size M
create a function f to produce desired output, i.e., all possible combinations of new groups, where all combinations are stored in a nested list

# generate all permuations of x with given size M for each group
permM <- function(x, M) {
  if (length(x) == M) {
    return(list(x))
  }
  S <- combn(x, M, simplify = FALSE)
  res <- c()
  for (k in seq_along(S)) {
    z <- Recall(x[!x %in% S[[k]]], M)
    res <- c(res, lapply(z, c, S[[k]]))
  }
  res
}

# create all possible combinations of new groups
f <- function(lst, K) {
  nms <- names(lst)
  l <- lapply(lst, combn, m = length(lst) * K)
  g <- apply(
    expand.grid(lapply(choose(lengths(lst), length(lst) * K), seq)),
    1,
    function(idx) {
      Map(function(p, q) l[[p]][, q], seq_along(idx), unlist(idx))
    }
  )
  x <- do.call(
    c,
    lapply(
      g,
      function(v) {
        apply(
          expand.grid(lapply(v, permM, M = K)),
          1,
          function(...) {
            setNames(
              asplit(do.call(rbind, lapply(..., matrix, K)), 2),
              nms
            )
          }
        )
      }
    )
  )
  # remove the isomorphics but keep one of them only
  x[
    !duplicated(lapply(
      x,
      function(v) {
        unname(sort(sapply(v, function(z) toString(sort(z)))))
      }
    ))
  ]
}

Example

Given a smaller data sample lst <- list(grp1 = 1:4, grp2 = 5:9) as the original_group list, we run

r1 <- f(lst,1)
r2 <- f(lst,2)

and we will see a snapshot of result like below

> head(r1)
[[1]]
[[1]]$grp1
[1] 2 6

[[1]]$grp2
[1] 1 5


[[2]]
[[2]]$grp1
[1] 1 6

[[2]]$grp2
[1] 2 5


[[3]]
[[3]]$grp1
[1] 3 6

[[3]]$grp2
[1] 1 5


[[4]]
[[4]]$grp1
[1] 1 6

[[4]]$grp2
[1] 3 5


[[5]]
[[5]]$grp1
[1] 4 6

[[5]]$grp2
[1] 1 5


[[6]]
[[6]]$grp1
[1] 1 6

[[6]]$grp2
[1] 4 5

and

> head(r2)
[[1]]
[[1]]$grp1
[1] 3 4 7 8

[[1]]$grp2
[1] 1 2 5 6


[[2]]
[[2]]$grp1
[1] 2 4 7 8

[[2]]$grp2
[1] 1 3 5 6


[[3]]
[[3]]$grp1
[1] 2 3 7 8

[[3]]$grp2
[1] 1 4 5 6


[[4]]
[[4]]$grp1
[1] 1 4 7 8

[[4]]$grp2
[1] 2 3 5 6


[[5]]
[[5]]$grp1
[1] 1 3 7 8

[[5]]$grp2
[1] 2 4 5 6


[[6]]
[[6]]$grp1
[1] 1 2 7 8

[[6]]$grp2
[1] 3 4 5 6

This is an outstanding answer, but I have two comments. First, `r1[[1]]` and `r1[[4]]` are the same (`r1[[2]]` and `r1[[3]]` are too) - they're duplicated answers since the names or the order of the new groups doesn't matter. These duplicates shouldn't show up in the final output. — David Moore, Jul 18 '22 at 01:10
Second, is there a way to specify how many new groups there should be? For example, in my original question, the `original_groups` object had 6, 6, and 8 items in each original group, respectively. Thus, it would be possible to make 2 new groups that each contain 3 items from each original group, it'd be possible to make 2 new groups that each contain 2 items from each original group, it'd be possible to make 3 new groups that each contain 2 items from each original group, it'd be possible to make 3 new groups that each contain 1 item from each original group, and so on. — David Moore, Jul 18 '22 at 01:10
I know you have an option to specify how many items from each original group end up in each new group but I don't think there's a way to specify how many new groups there should be. — David Moore, Jul 18 '22 at 01:10
@DavidMoore It seems you want to remove the isomorphics but keep only one of them. Plz see the updae. For your second question, since we are creaing all the combos, you can pick any number of new groups as you need from the output, e.g., `head(r1, 2)`, `head(r1, 3)` or something like that (if I understood right) — ThomasIsCoding, Jul 18 '22 at 06:42
My second question was actually about the number of new groups created within each combination. In your example, `r2[[1]]` has 2 new groups. In theory, if we take 1 item from each original group in your example (using your `lst` object), we could make 2, 3, or 4 new groups within each combination accordingly. If we take 2 items from each original group, we could only make 2 new groups within each combination using your `lst` object, but we could make more if we use my more lengthy `original_groups` object as the starting point. I'm not interested in sampling the output. — David Moore, Jul 18 '22 at 21:27
@DavidMoore Could you explain how to make 2, 3, or 4 new groups within each combination accordingly? I am confused by your description and your question now. Perhaps you can use my example `lst` in your post to elaborate your goal or expected output. — ThomasIsCoding, Jul 18 '22 at 21:35
I am going to edit my question to make it very clear and I'll toss another bounty on it too - thanks again for the help — David Moore, Jul 25 '22 at 12:49

Determining All Possible Combinations of Items With a Grouping Variable, Allowing for Different Numbers of Items From Each Original Groups

5 Answers5

Example

Linked