Is there a way to split a pandas DataFrame into multiple mutually exclusive samples (of different lengths), stratified on a variable?
My current approach is to call train_test_split from scikit-learn once per sample, but this feels very inefficient:
from sklearn.model_selection import train_test_split
# assuming strat_variable holds the name of the string column in data that I'm using for random stratified sampling;
# stratify needs that column's values from the frame being split, so they match its rows
cell_to_split, cell_1 = train_test_split(data, test_size=50, stratify=data[strat_variable])
cell_to_split, cell_2 = train_test_split(cell_to_split, test_size=60, stratify=cell_to_split[strat_variable])
cell_to_split, cell_3 = train_test_split(cell_to_split, test_size=40, stratify=cell_to_split[strat_variable])
This gives me 3 samples from the dataset with a specified size (number of rows) each, balanced for representativeness on my strat_variable. However, it isn't very efficient, and I'd ideally like the number of samples (3 here) to be dynamic rather than hard-coded.
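To make the goal concrete, here is a minimal sketch of the kind of helper I have in mind. It only wraps the repeated train_test_split calls in a loop over a list of sizes, so it is no faster than the approach above. The function name stratified_cells and its parameters (sizes, strat_col) are hypothetical, and it assumes the stratification variable is a column of the DataFrame.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def stratified_cells(data, sizes, strat_col, random_state=None):
    """Draw mutually exclusive stratified samples of the given sizes.

    Hypothetical helper: returns (cells, remainder), where cells is a list
    of DataFrames, one per entry in sizes, and remainder holds the rows
    that were not assigned to any cell.
    """
    remaining = data
    cells = []
    for size in sizes:
        # Stratify on the column of the frame being split, so the labels
        # always line up with the rows that are still available.
        remaining, cell = train_test_split(
            remaining,
            test_size=size,
            stratify=remaining[strat_col],
            random_state=random_state,
        )
        cells.append(cell)
    return cells, remaining
```

With this, something like stratified_cells(data, [50, 60, 40], strat_variable) would replace the three separate calls, and the number of samples follows from the length of the list. Each call still needs rows left over for every stratum, so the sizes cannot use up the whole DataFrame.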