It seems we both ran into similar issues here. Unfortunately, imbalanced-learn isn't always what you need, and scikit-learn does not offer the functionality you want, so you will have to implement it yourself.
This is what I came up with for my application. Note that I have not had much time to debug it, but I believe it works based on the testing I have done. Hope it helps:

    import numpy as np

    def equal_sampler(classes, data, target, test_frac):
        # Find the least frequent class and its fraction of the total
        _, count = np.unique(target, return_counts=True)
        fraction_of_total = min(count) / len(target)
        # Split that fraction further into train and test
        train_frac = (1 - test_frac) * fraction_of_total
        test_frac = test_frac * fraction_of_total
        # Initialize index lists and find the per-class train and test sizes
        train = []
        train_len = int(train_frac * data.shape[0])
        test = []
        test_len = int(test_frac * data.shape[0])
        # Add values to train, drop them from the index and proceed to add to test
        for i in classes:
            indices = list(target[target == i].index.copy())
            train_temp = np.random.choice(indices, train_len, replace=False)
            for val in train_temp:
                train.append(val)
                indices.remove(val)
            test_temp = np.random.choice(indices, test_len, replace=False)
            for val in test_temp:
                test.append(val)
        # X_train, y_train, X_test, y_test (label-based lookups via .loc)
        return data.loc[train], target.loc[train], data.loc[test], target.loc[test]

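For the input, classes expects a list of the possible class labels, data expects the DataFrame columns used for prediction, target expects the target column (a pandas Series sharing data's index), and test_frac is the fraction of the balanced sample that goes to the test set.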
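To make the expected inputs concrete, here is a small sketch that builds them from a toy DataFrame. The column names (`f1`, `f2`, `label`) and the data are made up for illustration; only the types and shapes matter.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset: 100 rows of class "a", 20 rows of class "b"
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "f1": rng.normal(size=120),
    "f2": rng.normal(size=120),
    "label": ["a"] * 100 + ["b"] * 20,
})

data = df[["f1", "f2"]]          # feature columns (DataFrame)
target = df["label"]             # target column (Series, same index as data)
classes = list(target.unique())  # list of possible class labels

# The rarest class caps how many rows can be drawn from each class
min_count = int(target.value_counts().min())
```

Since the rarest class has 20 rows here, a call with `test_frac=0.25` should give about 15 training rows and 5 test rows per class.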
Be aware that the algorithm is not especially efficient: inside the loop over classes, every sampled value is removed with list.remove, which takes linear time, so it effectively behaves like a triple for-loop. Despite that, it should be reasonably fast.
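If the linear-time removals ever become a bottleneck, one possible alternative is to shuffle each class's index once and slice off disjoint train and test parts. The following is only a sketch of that idea, not a drop-in replacement I have tested extensively: the name `equal_sampler_fast` and the `seed` parameter are my own additions, it reads the class labels from `target` instead of taking a `classes` argument, and it rounds the test size to the nearest integer rather than truncating.

```python
import numpy as np
import pandas as pd


def equal_sampler_fast(data, target, test_frac, seed=None):
    """Balanced split: every class contributes the same number of rows."""
    rng = np.random.default_rng(seed)
    # The rarest class determines how many rows each class contributes
    min_count = int(target.value_counts().min())
    test_len = int(round(test_frac * min_count))  # assumption: round to nearest
    train_len = min_count - test_len

    train, test = [], []
    for label in target.unique():
        # Shuffle this class's index labels once, then take disjoint slices
        idx = rng.permutation(target[target == label].index.to_numpy())
        train.extend(idx[:train_len])
        test.extend(idx[train_len:train_len + test_len])

    # X_train, y_train, X_test, y_test
    return data.loc[train], target.loc[train], data.loc[test], target.loc[test]


# Usage on hypothetical toy data: 100 rows of "a", 20 rows of "b"
df = pd.DataFrame({"f1": np.arange(120), "label": ["a"] * 100 + ["b"] * 20})
X_train, y_train, X_test, y_test = equal_sampler_fast(
    df[["f1"]], df["label"], test_frac=0.25, seed=0
)
```

Because the train and test parts are slices of the same shuffled array, they cannot overlap, and no element ever has to be removed from a list.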