sklearn KFold: access a single fold instead of using a for loop

Question

After using cross_validation.KFold(n, n_folds=folds) I would like to access the training and testing indexes of a single fold, instead of going through all the folds.

So let's take the example code:

import numpy as np
from sklearn import cross_validation
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4])
kf = cross_validation.KFold(4, n_folds=2)

>>> print(kf)  
sklearn.cross_validation.KFold(n=4, n_folds=2, shuffle=False,
                           random_state=None)
>>> for train_index, test_index in kf:

I would like to access the first fold in kf like this (instead of for loop):

train_index, test_index in kf[0]

This should return just the first fold, but instead I get the error: "TypeError: 'KFold' object does not support indexing"

What I want as output:

>>> train_index, test_index in kf[0]
>>> print("TRAIN:", train_index, "TEST:", test_index)
TRAIN: [2 3] TEST: [0 1]

Link: http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.KFold.html

Question

How do I retrieve the indexes for train and test for only a single fold, without going through the whole for loop?

mbatchkarov · Accepted Answer · 2018-11-22T16:43:53.100

26

You are on the right track. All you need to do now is:

kf = cross_validation.KFold(4, n_folds=2)
mylist = list(kf)
train, test = mylist[0]

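kf is an iterable that produces the train-test splits lazily: a split is not computed until it is requested. This improves memory usage, as you are not storing items you don't need. Calling list() on the KFold object forces it to compute all the splits, after which they can be indexed. (Strictly speaking, the KFold object is an iterable rather than an iterator, so next() cannot be called on it directly.)

Here are two great SO questions that explain what generators are: one and two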

Edit Nov 2018

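The API has changed since this answer was written: sklearn.cross_validation was deprecated in 0.18 and removed in 0.20. KFold now lives in sklearn.model_selection, n_folds has been renamed to n_splits, and the splits come from kf.split(X). An updated example (for py3.6):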

from sklearn.model_selection import KFold
import numpy as np

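kf = KFold(n_splits=2)

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])

# split() yields (train_index, test_index) pairs lazily; next() takes only the first
train_index, test_index = next(kf.split(X))

In [12]: train_index
Out[12]: array([2, 3])

In [13]: test_index
Out[13]: array([0, 1])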
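
To reach a fold other than the first without building the full list, one option is to advance the generator returned by `kf.split(X)` with `itertools.islice`. This is only a sketch, assuming scikit-learn 0.18 or later; `nth_fold` is a hypothetical helper name, not part of scikit-learn.

```python
from itertools import islice

import numpy as np
from sklearn.model_selection import KFold


def nth_fold(splitter, X, n):
    """Return (train_index, test_index) for fold number n (0-based).

    Hypothetical helper: it advances the lazy generator returned by
    splitter.split(X) and stops as soon as fold n has been produced.
    Raises StopIteration if n is not smaller than the number of splits.
    """
    return next(islice(splitter.split(X), n, None))


X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
kf = KFold(n_splits=2)

# Second fold (index 1) of a 2-fold split over 4 samples
train_index, test_index = nth_fold(kf, X, 1)
print("TRAIN:", train_index, "TEST:", test_index)
# TRAIN: [0 1] TEST: [2 3]
```

Folds before `n` are still computed and discarded, but none of them is kept in memory. With `shuffle=True`, pass a fixed `random_state` if repeated calls should see the same folds.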

edited Nov 22 '18 at 16:43

answered Dec 09 '14 at 14:14

mbatchkarov


This indeed does the trick, thanks :) but I guess the last line of code should be: train, test = l[0]? – NumesSanguis Dec 09 '14 at 14:22

Good answer, but you don't actually need to materialize all the folds: just `train, test = next(kf)` does the trick. – Fred Foo Dec 09 '14 at 21:12

Using `next(kf)` returns "KFold object is not an iterator" – Juan Leni Aug 27 '15 at 13:30
@mbatchkarov It looks as though `sklearn` has undergone breaking changes. `KFold(4, n_folds=2)` now throws `TypeError: __init__() got an unexpected keyword argument 'n_folds'`. – Janosh Nov 22 '18 at 12:56
renamed to n_splits – Brndn Jul 26 '21 at 22:34

Jair Julio Condori Cotrina · Answer 2 · 2019-05-20T17:55:22.057

1

# Save the train/test samples of every fold in separate lists, then access a given fold by index [i]
from sklearn.model_selection import KFold
import numpy as np
import pandas as pd

kf = KFold(n_splits=4)

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])

Y = np.array([0,0,0,1])
Y=Y.reshape(4,1)

X=pd.DataFrame(X)
Y=pd.DataFrame(Y)


X_train_base=[]
X_test_base=[]
Y_train_base=[]
Y_test_base=[]

for train_index, test_index in kf.split(X):

    X_train, X_test = X.iloc[train_index,:], X.iloc[test_index,:]
    Y_train, Y_test = Y.iloc[train_index,:], Y.iloc[test_index,:]
    X_train_base.append(X_train)
    X_test_base.append(X_test)
    Y_train_base.append(Y_train)
    Y_test_base.append(Y_test)

print(X_train_base[0])
print(Y_train_base[0])
print(X_train_base[1])
print(Y_train_base[1])
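
A more compact variant of the same idea is to store only the index pairs and slice the data when a fold is needed. The sketch below assumes the same toy data as above; `folds` and `i` are illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

X = pd.DataFrame(np.array([[1, 2], [3, 4], [1, 2], [3, 4]]))
Y = pd.DataFrame(np.array([0, 0, 0, 1]))

kf = KFold(n_splits=4)

# Materialize the (train_index, test_index) pairs once so they can be indexed
folds = list(kf.split(X))

i = 0  # fold to inspect
train_index, test_index = folds[i]

# Positional selection of the rows belonging to fold i
X_train, X_test = X.iloc[train_index], X.iloc[test_index]
Y_train, Y_test = Y.iloc[train_index], Y.iloc[test_index]

print(X_train)
print(Y_train)
```

Only the small index arrays are kept for every fold; the DataFrame slices are created on demand.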

edited May 20 '19 at 17:55

answered May 20 '19 at 17:24

Jair Julio Condori Cotrina


While this code snippet may solve the problem, it doesn't explain why or how it answers the question. Please [include an explanation for your code](https://meta.stackexchange.com/q/114762/269535), as that really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. You can use the [edit] button to improve this answer to get more votes and reputation! – Brian Tompsett - 汤莱恩 May 20 '19 at 17:48
