
I'm following this thread to generate k-fold indices for cross-validation using sklearn's KFold.

from sklearn.model_selection import KFold
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
y = np.array([1, 2, 3, 4, 5])

kf = KFold(n_splits=5)

When I use a for loop, everything works perfectly:

for train_index, test_index in kf.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)

gives me:

TRAIN: [1 2 3 4] TEST: [0]
TRAIN: [0 2 3 4] TEST: [1]
TRAIN: [0 1 3 4] TEST: [2]
TRAIN: [0 1 2 4] TEST: [3]
TRAIN: [0 1 2 3] TEST: [4]

However, when I use next(), I always get the same indices (the first fold) no matter how many times I run this:

train_idx, test_idx = next(kf.split(X))
print(train_idx, test_idx)

[1 2 3 4] [0]

Is there anything I'm missing? Thanks

desertnaut
George Liu

1 Answer


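As stated in the comments, every call to kf.split(X) creates a new generator that starts again at the first fold, so next(kf.split(X)) always returns fold 0. Store the generator returned by split() once, then call next() repeatedly on that same object.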
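A quick way to confirm this is to call split() twice and compare the results. This is a minimal sketch reusing the toy X from the question; the names gen_a and gen_b are illustrative only.

```python
from sklearn.model_selection import KFold
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
kf = KFold(n_splits=5)

# Each call to split() creates a new, independent generator object.
gen_a = kf.split(X)
gen_b = kf.split(X)
print(gen_a is gen_b)  # False

# A fresh generator always starts at the first fold, so next() on a
# newly created generator returns fold 0 every time.
_, first_a = next(gen_a)
_, first_b = next(gen_b)
print(first_a, first_b)  # [0] [0]

# Advancing the SAME generator moves on to the following fold.
_, second_a = next(gen_a)
print(second_a)  # [1]
```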

Code to try:

from sklearn.model_selection import KFold
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
y = np.array([1, 2, 3, 4, 5])

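kf = KFold(n_splits=5)

# Create the generator once and keep a reference to it
# (the folds are not shuffled, since shuffle=False by default)
split_iter = kf.split(X)

# Each next() call on the same generator advances to the next fold
train_idx, test_idx = next(split_iter)
print(train_idx, test_idx)
train_idx, test_idx = next(split_iter)
print(train_idx, test_idx)
train_idx, test_idx = next(split_iter)
print(train_idx, test_idx)
train_idx, test_idx = next(split_iter)
print(train_idx, test_idx)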
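If you need a specific fold by position rather than stepping through with next(), one simple option (a sketch, not the only approach) is to materialize all folds into a list once. It also shows how the indices select rows of X and y, and what happens once a generator is exhausted: with n_splits=5, a sixth next() on the same generator raises StopIteration. The variable names folds and exhausted are illustrative only.

```python
from sklearn.model_selection import KFold
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
y = np.array([1, 2, 3, 4, 5])

kf = KFold(n_splits=5)

# Materialize every (train, test) pair once; the list can be indexed freely.
folds = list(kf.split(X))
print(len(folds))  # 5

# Pick any fold by position, e.g. the third one (index 2).
train_idx, test_idx = folds[2]
print(train_idx, test_idx)  # [0 1 3 4] [2]

# The indices select the matching rows of X and entries of y.
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]
print(y_train, y_test)  # [1 2 4 5] [3]

# Consume all folds from one generator, then try once more.
split_iter = kf.split(X)
for _ in range(kf.get_n_splits()):
    next(split_iter)

exhausted = False
try:
    next(split_iter)
except StopIteration:
    exhausted = True
print(exhausted)  # True
```

Materializing the list is fine for small datasets; for very many folds, iterating the generator lazily avoids holding every index array in memory at once.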
Drees