
I have been trying for more than an hour to randomly split my data frame into two data frames based on a given percentage, but I can't make it work and I don't know why.

I have already looked at other posts on this topic.

What I want is basically a function that takes as input a data frame df and a real number α ∈ (0, 1), and returns a list consisting of two data frames, df1 and df2. df1 contains (α × 100)% of the rows of df, and df2 contains the rest of df, i.e. the unselected rows.

For example, if df has 100 rows, and α = 0.4, then df1 will consist of 40 randomly selected rows of df, and df2 will consist of the other 60 rows.

I could write a big function with loops and build my own algorithm, but I'm pretty sure a simpler way exists, and I would like to share the solution with the community!

Thanks for your help!

Emixam23
  • The top answer at http://stackoverflow.com/questions/17200114/how-to-split-data-into-training-testing-sets-using-sample-function-in-r-program seems to be exactly what you want to do and has a reproducible example there with the `mtcars` dataset. I don't understand how it falls short of what you need. – thelatemail Mar 13 '17 at 22:28
  • One of my friends just suggested that I clear the workspace (RStudio), and now it works. It was throwing an error... However, I'll post an answer with an explanation. Thanks – Emixam23 Mar 13 '17 at 22:57
  • If it's simply just an issue with your workspace being messy, there's no need. We can just close this as a non-reproducible issue. – thelatemail Mar 13 '17 at 23:02
  • Yeah, I'll delete the question. However, the answer isn't complete: how can I return 2 data frames at the same time? – Emixam23 Mar 13 '17 at 23:03
  • You use a `list` - `list(df1,df2)` – thelatemail Mar 13 '17 at 23:07
  • Interesting, thank you ! :) – Emixam23 Mar 13 '17 at 23:10

1 Answer


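Here is a function that splits the data into two data.frames using sample. Each row is assigned to a group independently, so the group sizes are only approximately prob and 1 - prob of the rows:

splitTable <- function(df, prob) {
  # Label each row 1 (with probability prob) or 0 (with probability 1 - prob)
  variant <- sample(c(1, 0), size = nrow(df), replace = TRUE, prob = c(prob, 1 - prob))
  # split() returns a list of data frames named "0" and "1"
  res <- split(df, variant)
  return(res)
}

res <- splitTable(iris, 0.4)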
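The question asks for exactly (α × 100)% of the rows (40 out of 100 when α = 0.4), which sampling row labels with replacement does not guarantee. A minimal sketch of an exact-size variant follows, assuming the row count may be rounded to the nearest integer. The function name `split_exact` and the list names `df1` and `df2` are hypothetical, chosen to match the question's wording, and are not part of the original answer.

```r
# Hypothetical helper (not from the original answer): exact-size random split.
split_exact <- function(df, alpha) {
  stopifnot(is.data.frame(df), alpha > 0, alpha < 1)
  n <- nrow(df)
  # Draw round(alpha * n) distinct row indices, without replacement
  idx <- sample(seq_len(n), size = round(alpha * n))
  # A logical mask avoids the df[-integer(0), ] pitfall when idx is empty
  in_first <- seq_len(n) %in% idx
  # Return both pieces in one list, as suggested in the comments
  list(df1 = df[in_first, , drop = FALSE],
       df2 = df[!in_first, , drop = FALSE])
}

set.seed(42)  # only to make the example reproducible
parts <- split_exact(iris, 0.4)
nrow(parts$df1)  # 60 of the 150 rows
nrow(parts$df2)  # the remaining 90 rows
```

The rows in df1 keep their original order here. If a shuffled order is wanted, indexing with `df[idx, ]` directly would be one option.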
Bulat