I would like to split my data proportionally. For example, I have 100 rows and I want to randomly sample one row out of every two rows. Using rsample from tidymodels, I assumed I could do the following.
library(tidymodels)
dat <- tibble(value = 1:100)
split <- initial_split(dat, prop = 0.5, breaks = 50)
testing <- testing(split)
When I check the data, the split has not done what I expected: it is close, but not exact. I thought the breaks argument creates bins that are then sampled from, so breaks = 50 would split the 100 rows into 50 bins of two rows each. I have also tried strata = value to stratify across the rows, but I cannot get that to work either.
I am using this as an example, but I am also curious how this would work when sampling one row out of every four, and so on.
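To make the target concrete, this is roughly the result I am after, sketched with dplyr rather than rsample. The names k, block and sampled are made up for this illustration, and I am not claiming this is how rsample bins rows internally.

```r
library(dplyr)
library(tibble)

set.seed(123)  # only so the illustration is reproducible

k <- 2  # block size: 2 = one row out of every two; set to 4 for one out of every four

dat <- tibble(value = 1:100)

sampled <- dat %>%
  # rows 1..k form block 0, rows (k+1)..2k form block 1, and so on
  mutate(block = (row_number() - 1) %/% k) %>%
  group_by(block) %>%
  slice_sample(n = 1) %>%  # pick one random row within each block
  ungroup()

nrow(sampled)  # 100 / k rows, exactly one per block
```

With k = 2 this always returns exactly 50 rows, one from each consecutive pair, which is the behaviour I expected from the split above.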
Have I misunderstood the breaks argument?