Separate one data frame into two data frames with 70% and 30% of the original content

Question

I would like to split one data frame into two using R, for example with one data frame holding 70% of the original content and the other holding 30%. How could I do that? My data frame has dimensions (22740, 2).

One column of my data frame contains genes and the other contains the pathway each gene belongs to. I want to keep that 70-30 ratio in EVERY pathway of the data frame. Therefore, I am not interested in simply taking the first 70% of the rows to build a new data frame, for example.

Hope I explained myself clearly.

score 1 · Answer 1 · answered Apr 12 '17 at 10:51

Using dplyr, df2 is the 70% part and df3 is the 30% part. The ref column stores each row's position so that the sampled rows can be excluded afterwards. The group_by ensures that each pathway is sampled individually.

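library(dplyr)
# Tag each row with its position, then sample 70% of the rows within each pathway
df2 <- df %>% mutate(ref = seq_len(nrow(df))) %>% group_by(pathway) %>% sample_frac(0.7)
# The rows that were not sampled form the remaining 30%
df3 <- df[-df2$ref, ]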
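
For comparison, the same per-pathway idea can be sketched in base R, which also makes it easy to check the proportions afterwards. This is only an illustration under assumptions: the toy data frame, its column values and the names df70, df30, idx_by_pathway and train_idx are made up here and do not come from the question.

```r
# Toy data standing in for the real gene/pathway table (hypothetical values)
set.seed(42)
df <- data.frame(
  gene    = paste0("g", 1:100),
  pathway = rep(c("A", "B", "C", "D"), times = c(40, 30, 20, 10)),
  stringsAsFactors = FALSE
)

# Row indices grouped by pathway
idx_by_pathway <- split(seq_len(nrow(df)), df$pathway)

# Within each pathway, draw 70% of its rows without replacement.
# sample.int() on the group length avoids the sample(x) pitfall
# when a pathway has a single row.
train_idx <- unlist(lapply(idx_by_pathway, function(i) {
  i[sample.int(length(i), size = round(0.7 * length(i)))]
}), use.names = FALSE)

df70 <- df[train_idx, ]
df30 <- df[-train_idx, ]

# Share of each pathway that ended up in the 70% part
table(df70$pathway) / table(df$pathway)
```

When a pathway's size is not a multiple of 10, round() means the split for that pathway is only approximately 70-30.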

R18 · Answer 2 · 2017-04-12T13:15:34.537

0

If you want a random selection of 30% of the samples, you can do:

   # Select 30% of the row indices at random, without replacement
     Sel.ID <- sample(nrow(Tab), size = round(0.3 * nrow(Tab)), replace = FALSE)
   # The new table with 30% of the samples
     New.Tab.30 <- Tab[Sel.ID, ]
   # The table with the remaining 70% of the samples
     New.Tab.70 <- Tab[-Sel.ID, ]

You can run this several times and get different tables each time. If you want to get the same split on every run, call set.seed(12345), for example, before the first line.
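
A small sketch of the seeding behaviour, assuming a toy table in place of the real one (the 20-row Tab and the names n, first and second are hypothetical): resetting the same seed before each draw gives the same indices.

```r
# Toy table standing in for the real data (hypothetical)
Tab <- data.frame(id = 1:20)
n <- nrow(Tab)

# Same seed before each draw, so both draws pick the same rows
set.seed(12345)
first <- sample(n, size = round(0.3 * n), replace = FALSE)

set.seed(12345)
second <- sample(n, size = round(0.3 * n), replace = FALSE)

identical(first, second)  # TRUE
```

Without the second set.seed() call, the two draws would generally differ.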

edited Apr 12 '17 at 13:15

answered Apr 12 '17 at 10:51

R18


I think the topic starter needs `replace = FALSE` – Gregory Demin Apr 12 '17 at 11:08

@GregoryDemin You are right; otherwise, repeated values could be drawn. I have edited the answer. – R18 Apr 12 '17 at 13:16
