Split a dataframe into smaller dataframes in R using dplyr

Question

I have a dataframe with 118 observations and three columns. I would like to split it into two dataframes with 59 observations each. I tried two different approaches, but neither returned what I wanted.

How can I do this using dplyr in R?

Sample dataframe:

Code Count_2020 Count_2021
A    1          2
B    2          4
C    3          6
D    4          8
E    5          10
F    6          12

Desired output:

DF1

Code Count_2020 Count_2021
    A    1          2
    B    2          4
    C    3          6

DF2

Code Count_2020 Count_2021
    D    4          8
    E    5          10
    F    6          12

1st Approach

Based on this answer

library(tidyverse)
df <- df %>% group_split(Code)

This returns a list of 118 data frames, one per Code value, each with a single row and 3 columns.

2nd Approach

Based on this answer

library(tidyverse)
df <- df %>%
  sample_n(size = 59) %>%
  split(f = as.factor(.$Code))

This returns a list of 59 data frames, one per sampled Code value.

What's the logic you're trying to use? You have 118 different codes, so when you split by that, you get 118 data frames...what did you expect instead? Your code only knows the logic you give it, so if you want to lump codes together, or you want to split to have a certain number of rows or a certain ratio of the total rows in each, you have to put that in your code — camille, Aug 05 '21 at 23:06
So are you trying to find a way to do this without hard-coding the number of rows per group? — camille, Aug 06 '21 at 00:01

score 7 · Accepted Answer · answered Aug 05 '21 at 23:07

We can use gl() to create the grouping column in group_split():

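library(dplyr)
# gl(n(), 59, n()) labels rows 1-59 as group 1, rows 60-118 as group 2;
# .keep = FALSE drops the helper column from the resulting data frames
df1 %>%
  group_split(grp = as.integer(gl(n(), 59, n())), .keep = FALSE)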
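To see what the grouping column looks like, here is a small sketch on the six-row sample from the question, with a chunk size of 3 standing in for 59. The names `df`, `chunk_size`, `grp_ids` and `parts` are assumptions made for illustration.

```r
library(dplyr)

# Six-row sample from the question (object name assumed)
df <- data.frame(
  Code = c("A", "B", "C", "D", "E", "F"),
  Count_2020 = 1:6,
  Count_2021 = seq(2, 12, by = 2)
)

chunk_size <- 3  # would be 59 for the 118-row data

# gl(n, k, length) repeats each of n levels k times and truncates to `length`,
# so consecutive blocks of chunk_size rows share one group id
grp_ids <- as.integer(gl(nrow(df), chunk_size, nrow(df)))
print(grp_ids)

# Same idea inside group_split(); n() is the number of rows in the data frame
parts <- df %>%
  group_split(grp = as.integer(gl(n(), chunk_size, n())), .keep = FALSE)

print(parts[[1]])
print(parts[[2]])
```

Because the chunk size is the only hard-coded number, the same call should also cope with a row count that is not an exact multiple of it; the last data frame is then simply shorter.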

akrun


score 3 · Answer 2 · answered Aug 05 '21 at 23:20

We could use slice() to select each half by row position:

library(dplyr)

DF1 <- DF %>%
  slice(1:3)

DF2 <- DF %>%
  slice(4:6)

Output:

> DF1
  Code Count_2020 Count_2021
1    A          1          2
2    B          2          4
3    C          3          6
> DF2
  Code Count_2020 Count_2021
1    D          4          8
2    E          5         10
3    F          6         12
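
The row ranges in this answer are written out by hand for the six-row sample. A possible variation, sketched here under the assumption that the split point should be the midpoint of the data, derives the cut from the row count instead; `half` is a hypothetical helper name.

```r
library(dplyr)

# Six-row sample from the question
DF <- data.frame(
  Code = c("A", "B", "C", "D", "E", "F"),
  Count_2020 = 1:6,
  Count_2021 = seq(2, 12, by = 2)
)

# Midpoint of the data; 59 for a 118-row data frame
half <- ceiling(nrow(DF) / 2)

# Positive positions keep the first half, negative positions drop it
DF1 <- DF %>% slice(seq_len(half))
DF2 <- DF %>% slice(-seq_len(half))

print(DF1)
print(DF2)
```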

score 1 · Answer 3 · answered Aug 06 '21 at 02:54

Here is a base R option using split(), which puts every n consecutive rows into the same group:

n <- 3  # rows per chunk; use 59 for the 118-row data
# seq_len() is safe even when df has zero rows
split(df, ceiling(seq_len(nrow(df)) / n))

#$`1`
#  Code Count_2020 Count_2021
#1    A          1          2
#2    B          2          4
#3    C          3          6

#$`2`
#  Code Count_2020 Count_2021
#4    D          4          8
#5    E          5         10
#6    F          6         12
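
As a rough check that the same call scales to the size described in the question, the sketch below uses a made-up 118-row stand-in (the codes and counts are invented placeholders, not the asker's data) and pulls the two pieces out of the list by name.

```r
# Hypothetical 118-row stand-in for the real data; values are placeholders
df <- data.frame(
  Code = sprintf("C%03d", 1:118),
  Count_2020 = 1:118,
  Count_2021 = 2 * (1:118)
)

n <- 59  # rows per chunk

# Rows 1-59 map to group 1, rows 60-118 to group 2
parts <- split(df, ceiling(seq_len(nrow(df)) / n))

# split() names the list elements after the group ids
DF1 <- parts[["1"]]
DF2 <- parts[["2"]]

print(dim(DF1))
print(dim(DF2))
```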
