I have a dataframe with 118 observations and three columns. I would like to split it into two dataframes with 59 observations each. I tried two different approaches, but neither returned what I wanted.

How can I do this using dplyr in R?

Sample dataframe:

Code Count_2020 Count_2021
A    1          2
B    2          4
C    3          6
D    4          8
E    5          10
F    6          12

Desired output:

DF1

Code Count_2020 Count_2021
A    1          2
B    2          4
C    3          6

DF2

Code Count_2020 Count_2021
D    4          8
E    5          10
F    6          12

1st Approach

Based on this answer

library(tidyverse)
df= df %>% group_split(Code)

This returns a list of 118 elements: one data frame per Code value, each with a single row and 3 columns.

2nd Approach

Based on this answer

library(tidyverse)
df= df %>% sample_n(size = 59) %>% 
  split(f = as.factor(.$Code))

This returns a list of 59 data frames, one for each sampled Code, instead of two data frames of 59 rows.

Ed_Gravy
  • What's the logic you're trying to use? You have 118 different codes, so when you split by that, you get 118 data frames...what did you expect instead? Your code only knows the logic you give it, so if you want to lump codes together, or you want to split to have a certain number of rows or a certain ratio of the total rows in each, you have to put that in your code – camille Aug 05 '21 at 23:06
  • I see, that's why I switched to the second approach – Ed_Gravy Aug 05 '21 at 23:08
  • So are you trying to find a way to do this without hard-coding the number of rows per group? – camille Aug 06 '21 at 00:01

3 Answers

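We may use gl() to create the grouping column inside group_split(). With 118 rows, gl(n(), 59, n()) labels the first 59 rows as group 1 and the next 59 as group 2, and .keep = FALSE drops the helper column from the result:

library(dplyr)
df1 %>%
  group_split(grp = as.integer(gl(n(), 59, n())), .keep = FALSE)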
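
To see what the grouping vector looks like, here is a minimal sketch on the six-row sample data from the question. The names `chunk_size` and `parts` are made up for illustration, and the chunk size of 3 stands in for the 59 used on the real data.

```r
library(dplyr)

# Sample data from the question
df1 <- tibble(
  Code = c("A", "B", "C", "D", "E", "F"),
  Count_2020 = 1:6,
  Count_2021 = seq(2, 12, by = 2)
)

chunk_size <- 3  # illustrative; the question's real data would use 59

# gl(n, k, length) builds a factor with n levels, each repeated k times,
# cut off at `length` elements, so consecutive rows share a label
grp <- as.integer(gl(nrow(df1), chunk_size, nrow(df1)))
print(grp)

# Same idea inside group_split(); .keep = FALSE drops the helper column
parts <- df1 %>%
  group_split(grp = as.integer(gl(n(), chunk_size, n())), .keep = FALSE)

print(length(parts))
print(parts[[2]])
```

The result is a list, so the two halves are available as `parts[[1]]` and `parts[[2]]`.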
akrun

We could use slice() to select each half by row position:

library(dplyr)

DF1 <- DF %>% 
    slice(1:3)

DF2 <- DF %>% 
    slice(4:6)

Output:

> DF1
  Code Count_2020 Count_2021
1    A          1          2
2    B          2          4
3    C          3          6
> DF2
  Code Count_2020 Count_2021
1    D          4          8
2    E          5         10
3    F          6         12
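
If the row positions should not be hard-coded, one possible variation is to derive the split point from nrow(). This is only a sketch on the sample data; the variable `half` is introduced here for illustration and is not part of the original answer.

```r
library(dplyr)

# Sample data from the question
DF <- tibble(
  Code = c("A", "B", "C", "D", "E", "F"),
  Count_2020 = 1:6,
  Count_2021 = seq(2, 12, by = 2)
)

# Midpoint of the data; this evaluates to 59 for a 118-row data frame
half <- ceiling(nrow(DF) / 2)

DF1 <- DF %>% slice(seq_len(half))    # rows 1..half
DF2 <- DF %>% slice(-seq_len(half))   # negative positions drop rows 1..half

print(DF1)
print(DF2)
```

With an odd number of rows, ceiling() puts the extra row in DF1.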
TarJae

Here is a base R option using split(), where n is the number of rows per piece:

n <- 3
split(df, ceiling(seq(nrow(df))/n))

#$`1`
#  Code Count_2020 Count_2021
#1    A          1          2
#2    B          2          4
#3    C          3          6

#$`2`
#  Code Count_2020 Count_2021
#4    D          4          8
#5    E          5         10
#6    F          6         12
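
As a small follow-up sketch, the pieces can be pulled out of the returned list by name, and the same index formula shows what happens when the row count is not a multiple of n. The names `parts`, `DF1`, `DF2`, and `sizes` are chosen here for illustration.

```r
# Sample data from the question
df <- data.frame(
  Code = c("A", "B", "C", "D", "E", "F"),
  Count_2020 = 1:6,
  Count_2021 = seq(2, 12, by = 2)
)

n <- 3  # rows per piece; 59 for the question's real data

# ceiling(row_number / n) maps rows 1..n to 1, rows n+1..2n to 2, and so on
parts <- split(df, ceiling(seq_len(nrow(df)) / n))

# split() returns a named list, so each piece can be extracted by name
DF1 <- parts[["1"]]
DF2 <- parts[["2"]]

# With a piece size that does not divide the row count, the last piece is shorter
sizes <- sapply(split(df, ceiling(seq_len(nrow(df)) / 4)), nrow)
print(sizes)
```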
Ronak Shah