I have a pandas dataframe df containing data from 2 classes.
I would like to have randomly generated indices for a stratified K-fold cross-validation.
What I do at the moment is:
import numpy as np
import pandas as pd

# Split the dataframe into 5 consecutive chunks
df_folds = np.array_split(df, 5)
for k in range(5):
    # We use 'list' to copy, in order to 'pop' later on
    df_train = list(df_folds)
    df_test = df_train.pop(k)
    df_train = pd.concat(df_train)
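However, this is not a stratified 5-fold cross-validation: it just splits the dataframe into 5 consecutive chunks, with no shuffling and no guarantee that each fold keeps the class proportions. I also tried StratifiedKFold from scikit-learn: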
from sklearn.model_selection import StratifiedKFold
skf = StratifiedKFold(n_splits=3)
skf.get_n_splits(df)
print(skf)
for train_index, test_index in skf.split(df):
    print("TRAIN:", train_index, "TEST:", test_index)
This fails with:
TypeError: split() takes at least 3 arguments (2 given)
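The error message suggests that split() expects the class labels in addition to the data, which makes sense since it cannot stratify without them. Below is a minimal sketch of what the call might look like under that assumption. The toy dataframe and the column name 'label' are hypothetical stand-ins, since the real class column of df is not shown above; shuffle=True and random_state are included only because randomly generated indices are wanted.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Toy stand-in for df; 'label' is a hypothetical name for the class column.
df = pd.DataFrame({
    "feature": np.arange(20, dtype=float),
    "label": [0] * 14 + [1] * 6,
})

# shuffle=True randomizes the indices; random_state makes the split reproducible.
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

folds = []
# Pass the class labels as the second argument so the folds can be stratified.
for train_index, test_index in skf.split(df, df["label"]):
    # The returned indices are positional, so select rows with iloc.
    df_train = df.iloc[train_index]
    df_test = df.iloc[test_index]
    folds.append((train_index, test_index))
    print("TEST class counts:", df_test["label"].value_counts().to_dict())
```

Each test fold should then hold roughly the same share of each class as the full dataframe, which is the property the np.array_split approach does not provide.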