next() always gives the same index with a KFold generator

Question

I'm following this thread to generate k-fold indices for cross-validation using sklearn's KFold.

from sklearn.model_selection import KFold
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
y = np.array([1, 2, 3, 4, 5])

kf = KFold(n_splits=5)

When I use a for loop, everything works perfectly:

for train_index, test_index in kf.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)

gives me:

TRAIN: [1 2 3 4] TEST: [0]
TRAIN: [0 2 3 4] TEST: [1]
TRAIN: [0 1 3 4] TEST: [2]
TRAIN: [0 1 2 4] TEST: [3]
TRAIN: [0 1 2 3] TEST: [4]

However, when I use next(), I always get the same index no matter how many times I run this:

train_idx, test_idx = next(kf.split(X))
print(train_idx, test_idx)

[1 2 3 4] [0]

Is there anything I'm missing? Thanks

because you keep calling `.split` and *then* `next`. you need to keep calling `next` on what is returned by `.split`. — juanpa.arrivillaga, Aug 07 '19 at 16:24

Drees · Accepted Answer · 2019-08-07T16:37:34.873

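As stated in the comments, you need to call next() on the object returned by split(). Each call to kf.split(X) creates a new generator that starts from the first fold, so next(kf.split(X)) always returns the first split. Call split() once, keep the generator it returns, and call next() on that generator repeatedly.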

Code to try:

from sklearn.model_selection import KFold
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
y = np.array([1, 2, 3, 4, 5])

kf = KFold(n_splits=5)

# Call split() once and keep the generator it returns.
split_iter = kf.split(X)

# Each next() call on the same generator advances to the following fold.
train_idx, test_idx = next(split_iter)
print(train_idx, test_idx)
train_idx, test_idx = next(split_iter)
print(train_idx, test_idx)
train_idx, test_idx = next(split_iter)
print(train_idx, test_idx)
train_idx, test_idx = next(split_iter)
print(train_idx, test_idx)
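
The behaviour is ordinary Python generator semantics rather than anything specific to KFold. The sketch below illustrates it without sklearn, using a hypothetical stand-in generator named split_indices (not part of any library) that yields one held-out index per fold, similar to the unshuffled output shown in the question:

```python
def split_indices(n_samples):
    """Hypothetical stand-in for kf.split(X): yields (train, test) index lists."""
    for i in range(n_samples):
        train = [j for j in range(n_samples) if j != i]
        yield train, [i]

# Calling the function again creates a new generator, so next() restarts at fold 0.
restarted = [next(split_indices(5))[1] for _ in range(3)]

# Keeping one generator object lets next() advance through the folds.
gen = split_indices(5)
advanced = [next(gen)[1] for _ in range(3)]

# Draining the rest exhausts the generator; a default avoids StopIteration.
remaining = list(gen)          # the two folds not yet consumed
exhausted = next(gen, None)    # None, because nothing is left

# Materialising the folds once gives access by position.
folds = list(split_indices(5))
train_idx, test_idx = folds[2]

print(restarted)
print(advanced)
print(train_idx, test_idx)
```

The same pattern should carry over to the real splitter: if you need a particular fold rather than the next one, list(kf.split(X)) can be indexed directly.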
