How to write an R function to create every subgroup based on multiple columns?

Question

I'm struggling to create a function in R that takes a dataset and a set of columns, and outputs every subset of the dataset defined by a combination of values in those columns (3 columns in my case: sex, class and grade).

My dataset looks like this:

structure(list(name = c("Peter Doe", "John Gary", "Elsa Johnson", 
"Mary Poppins", "Jesse Bogart"), sex = c("Male", "Male", "Female", 
"Female", "Male"), class = c("Honors", "Core", "Core", "Honors", 
"Honors"), grade = c("A", "A", "A", "B", "C")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -5L))

I tried to visualize my goal as a map of splits (by sex, then class, then grade); the image is not reproduced here.

I was hoping to create new variables named after the path each subset follows through this map (e.g., male_honors_a <- the dataset filtered by those column values). I think I could build those names with paste(), but I'm not sure about that either. More importantly, though, I'm struggling with how to combine for loops inside the function so that they filter on the unique values of each column.

I got as far as writing a function that creates each single-column subgroup individually, but I couldn't figure out how to combine them:

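subgroups <- function(df, filters, group = "none", name = ""){
  listofdfs <- list()
  # one pass per column: each unique value of that column becomes its own subgroup
  for (i in filters) {
    values <- unique(df[[i]])
    for (j in values){
      # df[[i]] is a plain vector, so this is a simple logical row filter
      x <- df[df[[i]] == j, ]
      listofdfs[[paste(name, j, sep = "")]] <- x
    }
  }
  # optionally return a single subgroup by its name
  if (group != "none"){
    return(listofdfs[[group]])
  }
  else {
    return(listofdfs)
  }
}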

subgroups(df, c("sex", "class", "grade"))

I would like subgroups(df, c("sex", "class")) to return a list of data frames:

list(male_honors, male_core, female_honors, female_core)

in which the male_honors element is

# A tibble: 2 × 4
  name         sex   class  grade
1 Peter Doe    Male  Honors A    
2 Jesse Bogart Male  Honors C

Would really appreciate any help!

Welcome to Stack Overflow! I can tell you put a lot of thought & effort into this function. Can you please add one more feature: incorporate elements from [How to make a great R reproducible example?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Especially the aspects of using `dput()` for the input and then an explicit example of your expected dataset? — wibeasley, Jan 19 '22 at 04:38
Hi @wibeasley! Thank you for the welcome! I just tried adding your suggestions. Let me know if this helps! — Lilly Shaw, Jan 19 '22 at 04:57

score 0 · Accepted Answer · answered Jan 19 '22 at 05:23

tidyr::nest() does this directly. Notice that for each combination of the grouping/nesting variables, a tibble is neatly tucked into the data cell. I've modified your function a little by (a) removing the parts unrelated to grouping (like filtering down to a single group) and (b) making groups default to an empty character vector, so if nothing is passed then nothing is grouped.

Also, the group labels (e.g., Male and Honors) are easily retrievable as values of the grouping variables. That's typically much more useful than encoding the values in variable names and then having to parse them back out.
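To illustrate, here is a minimal sketch, assuming tidyr is installed and the question's data is stored as `ds`. A group is looked up by its sex/class values, and if a named list like `male_honors` is still wanted, the names can be generated from those same values. The variable names `nested`, `male_honors` and `named` are only illustrative.

```r
# Assumes tidyr (and its dependency tibble) are installed; `ds` mirrors the question's data
library(tidyr)

ds <- tibble::tibble(
  name  = c("Peter Doe", "John Gary", "Elsa Johnson", "Mary Poppins", "Jesse Bogart"),
  sex   = c("Male", "Male", "Female", "Female", "Male"),
  class = c("Honors", "Core", "Core", "Honors", "Honors"),
  grade = c("A", "A", "A", "B", "C")
)

groups <- c("sex", "class")

# One row per sex/class combination; the matching rows sit in the `data` list-column
nested <- nest(ds, data = -all_of(groups))

# Look up a group by its values instead of by a variable name
male_honors <- nested$data[[which(nested$sex == "Male" & nested$class == "Honors")]]

# If a named list is still wanted, build the names from the grouping values
named <- setNames(nested$data, tolower(paste(nested$sex, nested$class, sep = "_")))
names(named)
```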

Will this work for your purposes?

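subgroups <- function(df, groups = character(0)) {
  # all_of() makes it explicit that `groups` is an external character vector of column names
  df |> 
    tidyr::nest(data = -tidyr::all_of(groups))
}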

> subgroups(ds, c("class", "sex"))
# # A tibble: 4 × 3
#   sex    class  data            
#   <chr>  <chr>  <list>          
# 1 Male   Honors <tibble [2 × 2]>
# 2 Male   Core   <tibble [1 × 2]>
# 3 Female Core   <tibble [1 × 2]>
# 4 Female Honors <tibble [1 × 2]>

> subgroups(ds, c("sex"))
# # A tibble: 2 × 2
#   sex    data            
#   <chr>  <list>          
# 1 Male   <tibble [3 × 3]>
# 2 Female <tibble [2 × 3]>

> subgroups(ds)
# # A tibble: 1 × 1
#   data            
#   <list>          
# 1 <tibble [5 × 4]>

Additional resources: tidyr's Nested data vignette
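For comparison, base R's split() can produce the named list from the question without any packages. This is a sketch, not part of the answer above; `subgroups_split` is a hypothetical name, and it assumes at least one grouping column is passed.

```r
# Base-R sketch: no packages needed
df <- data.frame(
  name  = c("Peter Doe", "John Gary", "Elsa Johnson", "Mary Poppins", "Jesse Bogart"),
  sex   = c("Male", "Male", "Female", "Female", "Male"),
  class = c("Honors", "Core", "Core", "Honors", "Honors"),
  grade = c("A", "A", "A", "B", "C"),
  stringsAsFactors = FALSE
)

subgroups_split <- function(df, groups) {
  # Passing a list of columns as `f` makes split() use their interaction;
  # drop = TRUE discards empty combinations, sep sets the separator in the names
  out <- split(df, df[groups], drop = TRUE, sep = "_")
  names(out) <- tolower(names(out))
  out
}

res <- subgroups_split(df, c("sex", "class"))
names(res)
res$male_honors
```

Unlike nest(), split() orders the groups by factor level (alphabetically for character columns) rather than by first appearance, and each data frame keeps its original row names.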

SEAnalyst · Answer 2 · 2022-01-19T07:07:31.110

You can create a key column to filter on. The unique values of the key can then be used to loop through each subset of your data frame. Here is a solution with your data as df and the desired list result as l.

library(dplyr)
#make a key (constructed of 2 or more column values)
df <- df |> mutate(key = paste0(sex, "_", class))
#get the unique keys
keys <- unique(df$key)
#make an empty list
l <- list()
#loop through unique keys to filter your df, removing the key column
for(x in seq_along(keys)){
  l[[x]] <- df[df$key == keys[x], ] |> select(!key)
}
#name list elements
names(l) <- tolower(keys)
# your desired result
l

And written as a function, it would look like this:

subgroups <- function(df, groups = character(0)){
#make a key vector from the grouping columns (requires dplyr)
v <- df |> select(all_of(groups))
v <- do.call(paste, c(v, sep = "_"))
#get unique keys
keys <- unique(v)
#make an empty list
l <- list()
#loop through unique keys to filter; no key column is added to df here, so none needs removing
for(x in seq_along(keys)){
  l[[x]] <- df[v == keys[x], ]
}
#name list elements, e.g. "male_honors"
names(l) <- tolower(keys)
return(l)
}

#example call
subgroups(df, c("sex", "class"))
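The same loop idea can also be nested one level per column, which is close to the combined for loops the question asked about. Below is a minimal recursive sketch; `subgroups_nested` is a hypothetical name, and it assumes the grouping columns contain no NA values.

```r
df <- data.frame(
  name  = c("Peter Doe", "John Gary", "Elsa Johnson", "Mary Poppins", "Jesse Bogart"),
  sex   = c("Male", "Male", "Female", "Female", "Male"),
  class = c("Honors", "Core", "Core", "Honors", "Honors"),
  grade = c("A", "A", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: each recursion level plays the role of one nested for loop
subgroups_nested <- function(df, groups, name = "") {
  # Base case: no columns left to split on, so this subset is one finished group
  if (length(groups) == 0) {
    return(setNames(list(df), name))
  }
  col <- groups[1]
  out <- list()
  for (j in unique(df[[col]])) {
    sub <- df[df[[col]] == j, , drop = FALSE]
    # Extend the name along the path, e.g. "male" -> "male_honors" (assumes no NA in col)
    path <- if (name == "") tolower(j) else paste(name, tolower(j), sep = "_")
    # Recurse on the remaining columns and append the groups found there
    out <- c(out, subgroups_nested(sub, groups[-1], path))
  }
  out
}

res <- subgroups_nested(df, c("sex", "class"))
names(res)
res3 <- subgroups_nested(df, c("sex", "class", "grade"))
names(res3)
```

Groups appear in order of first appearance, and only combinations that actually occur in the data are returned, so the three-column call yields names such as male_honors_a.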
