I am trying to reproduce the behavior of the R's createDataPartition function in python. I have a dataset for machine learning with the boolean target variable. I would like to split my dataset in a training set (60%) and a testing set (40%).
If I do it totally random, my target variable won't be properly distributed between the two sets.
I achieve it in R using:
inTrain <- createDataPartition(y=data$repeater, p=0.6, list=F)
training <- data[inTrain,]
testing <- data[-inTrain,]
How can I do the same in Python?
PS : I am using scikit-learn as my machine learning lib and python pandas.