I have looked into "Stratified sample in pandas" and "stratified sampling on ranges", among others, but they don't address my issue specifically, as I'm looking to split the data into 3 sets randomly.
I have an imbalanced dataframe of 10k rows: 10% positive class, 90% negative class. I'm trying to figure out a way to split this dataframe into 3 datasets of 60%, 20%, and 20% of the rows, keeping the class imbalance in mind. The split has to be random and without replacement, which means that if I put the 3 datasets back together, the result has to equal the original dataframe.
Usually I would use train_test_split(), but it only works if you are looking to split into two datasets, not three.
Any suggestions?
Reproducible example:
import numpy as np
import pandas as pd
df = pd.DataFrame({"target": np.random.choice([0, 0, 0, 0, 0, 0, 0, 0, 0, 1], size=10000)}, index=range(0, 10000, 1))
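To make the requirement concrete, here is a rough sketch of the kind of split I'm after, using only pandas and NumPy. The helper name `stratified_three_way_split`, its parameters, and the fixed seeds are my own assumptions for illustration, not an existing library function. It shuffles the rows of each class separately and cuts each class at the cumulative 60% and 80% marks, so the three parts are disjoint and keep roughly the original class ratio.

```python
import numpy as np
import pandas as pd

# Same shape of data as the example above; the seed is only there for repeatability.
data_rng = np.random.default_rng(0)
df = pd.DataFrame({"target": data_rng.choice([0, 1], size=10000, p=[0.9, 0.1])})


def stratified_three_way_split(frame, label_col, fractions=(0.6, 0.2, 0.2), seed=0):
    """Hypothetical helper: shuffle each class, then cut it at the cumulative fractions."""
    rng = np.random.default_rng(seed)
    parts = [[] for _ in fractions]
    cuts = np.cumsum(fractions)[:-1]  # cut points as fractions, e.g. 0.6 and 0.8
    for _, group in frame.groupby(label_col):
        shuffled = rng.permutation(group.index.to_numpy())
        bounds = np.round(cuts * len(shuffled)).astype(int)
        # np.split returns one chunk of index labels per requested fraction
        for part, chunk in zip(parts, np.split(shuffled, bounds)):
            part.append(chunk)
    # Every index label lands in exactly one part, so nothing is duplicated or lost
    return [frame.loc[np.concatenate(part)] for part in parts]


train, valid, test = stratified_three_way_split(df, "target")
```

Concatenating `train`, `valid`, and `test` and sorting by index should give back the original dataframe, which is the "without replacement" property described above.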