I'm looking for built-in R functions to split a data.frame into a list of data.frames based on conditions on the column values.

To illustrate with an example, consider the data.frame below:

date         var_1       var_2        
date_1       a           b            
date_1       b           a            
date_2       c           b            
date_2       b           c            
date_2       a           b            
date_2       b           a            

The specific grouping conditions are:

var_1 %in% var_2 & var_2 %in% var_1 & date == date_x,

where date_x runs through the unique values of date. These conditions define the three groups:

date         var_1       var_2
date_1       a           b
date_1       b           a

date         var_1       var_2
date_2       c           b
date_2       b           c

date         var_1       var_2
date_2       a           b
date_2       b           a
AfBM
  • Does this answer your question? [split dataframe in R by row](https://stackoverflow.com/questions/13125846/split-dataframe-in-r-by-row) – Clemsang Oct 11 '22 at 13:34
  • `split(data, ~date)`? – moodymudskipper Oct 11 '22 at 13:37
  • I can't make sense of what you mean. If you had a row `date_1 a c` what would the groups look like? – SamR Oct 11 '22 at 13:38
  • You're right, splitting simply by date did the trick in the original example. That example was not constructed correctly, so I edited the values so that splitting by date alone no longer gives the correct groups. – AfBM Oct 11 '22 at 13:41

1 Answer

With dplyr, you can sort var_1 and var_2 within each row, then split on date together with the sorted pair, so that rows holding the same two values in either order end up in the same group.

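library(dplyr)
library(purrr)  # provides map()

df %>% 
  rowwise() %>% 
  # order-independent key: the row's two values, sorted
  mutate(sorted = list(sort(c(var_1, var_2)))) %>% 
  # regrouping replaces the rowwise grouping
  group_by(date, sorted) %>% 
  group_split() %>% 
  # drop the helper column from each piece
  map(~ select(.x, -sorted))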

Output:

[[1]]
# A tibble: 2 × 3
  date   var_1 var_2
  <chr>  <chr> <chr>
1 date_1 a     b    
2 date_1 b     a    

[[2]]
# A tibble: 2 × 3
  date   var_1 var_2
  <chr>  <chr> <chr>
1 date_2 a     b    
2 date_2 b     a    

[[3]]
# A tibble: 2 × 3
  date   var_1 var_2
  <chr>  <chr> <chr>
1 date_2 c     b    
2 date_2 b     c    
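
Since the question asks for built-in functions, the same idea can be sketched in base R with `split()`. This is only a sketch under the same reading of the condition as above (rows belong together when they share a date and the same unordered pair of values). The names `pair_key` and `groups` are made up for illustration, and pasting with `"_"` assumes the values themselves contain no underscore.

```r
# Example data from the question
df <- data.frame(
  date  = c("date_1", "date_1", "date_2", "date_2", "date_2", "date_2"),
  var_1 = c("a", "b", "c", "b", "a", "b"),
  var_2 = c("b", "a", "b", "c", "b", "a")
)

# Order-independent key per row: the smaller value, then the larger one.
# Assumption: "_" does not occur in the values, so keys cannot collide.
pair_key <- paste(pmin(df$var_1, df$var_2), pmax(df$var_1, df$var_2), sep = "_")

# Split on date and key together; drop = TRUE discards
# date/key combinations that do not occur in the data.
groups <- split(df, list(df$date, pair_key), drop = TRUE)
```

For the example data this gives a named list of three data frames with two rows each. `pmin()` and `pmax()` only cover the two-column case; with more columns a row-wise `sort()` as in the dplyr version would be needed.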
Maël