I have a data frame of 200k rows and 10 columns (only 3 are included here for ease of reading):
df=data.frame(A=rep(letters[1:20],10000),B=rep(letters[2:21],10000),C=rep(letters[3:22],10000))
I am trying to split the data into two subsets - training and testing.
s=sample(dim(df)[1],.6*dim(df)[1])
training=df[s,]
testing=df[-s,]
Is there a way to take a sample from df such that every factor level appears at least once in each of the resulting subsets? That is, for each of columns A to J, I want at least one instance of every level of that column in both the training and testing sets.
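To make the requirement concrete, here is a rough check that I would want both subsets to pass. This is only a sketch: `covers_all_levels` is a helper name I made up, and `small` is a cut-down stand-in for df with the same column pattern.

```r
# Hypothetical helper: TRUE if every value found in each column of `full`
# also appears at least once in the same column of `subset`.
covers_all_levels <- function(subset, full) {
  all(mapply(function(sub_col, full_col) all(unique(full_col) %in% sub_col),
             subset, full))
}

# Cut-down stand-in for df (assumed for illustration only).
small <- data.frame(A = rep(letters[1:20], 10),
                    B = rep(letters[2:21], 10),
                    C = rep(letters[3:22], 10))

s <- sample(nrow(small), 0.6 * nrow(small))
covers_all_levels(small[s, ], small)   # want TRUE for training
covers_all_levels(small[-s, ], small)  # want TRUE for testing
```

A plain random split can fail this check when some level is rare, which is the case I am trying to guard against.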
I tried the approach from "Random subset containing at least one instance of each factor", but I cannot apply it to multiple columns, as opposed to the single one used in that example.