15

After using cross_validation.KFold(n, n_folds=folds) I would like to access the indexes for training and testing of single fold, instead of going through all the folds.

So let's take the example code:

from sklearn import cross_validation
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4])
kf = cross_validation.KFold(4, n_folds=2)

>>> print(kf)  
sklearn.cross_validation.KFold(n=4, n_folds=2, shuffle=False,
                           random_state=None)
>>> for train_index, test_index in kf:

I would like to access the first fold in kf like this (instead of for loop):

train_index, test_index in kf[0]

This should return just the first fold, but instead I get the error: "TypeError: 'KFold' object does not support indexing"

What I want as output:

>>> train_index, test_index in kf[0]
>>> print("TRAIN:", train_index, "TEST:", test_index)
TRAIN: [2 3] TEST: [0 1]

Link: http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.KFold.html

Question

How do I retrieve the indexes for train and test for only a single fold, without going through the whole for loop?

Community
  • 1
  • 1
NumesSanguis
  • 5,832
  • 6
  • 41
  • 76

2 Answers2

26

You are on the right track. All you need to do now is:

kf = cross_validation.KFold(4, n_folds=2)
mylist = list(kf)
train, test = mylist[0]

kf is actually a generator, which doesn't compute the train-test split until it is needed. This improves memory usage, as you are not storing items you don't need. Making a list of the KFold object forces it to make all values available.

Here are two great SO question that explain what generators are: one and two


Edit Nov 2018

The API has changed since sklearn 0.20. An updated example (for py3.6):

from sklearn.model_selection import KFold
import numpy as np

kf = KFold(n_splits=4)

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])


X_train, X_test = next(kf.split(X))

In [12]: X_train
Out[12]: array([2, 3])

In [13]: X_test
Out[13]: array([0, 1])
mbatchkarov
  • 15,487
  • 9
  • 60
  • 79
1
# We saved all the K Fold samples in different list  then we access to this throught [i]
from sklearn.model_selection import KFold
import numpy as np
import pandas as pd

kf = KFold(n_splits=4)

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])

Y = np.array([0,0,0,1])
Y=Y.reshape(4,1)

X=pd.DataFrame(X)
Y=pd.DataFrame(Y)


X_train_base=[]
X_test_base=[]
Y_train_base=[]
Y_test_base=[]

for train_index, test_index in kf.split(X):

    X_train, X_test = X.iloc[train_index,:], X.iloc[test_index,:]
    Y_train, Y_test = Y.iloc[train_index,:], Y.iloc[test_index,:]
    X_train_base.append(X_train)
    X_test_base.append(X_test)
    Y_train_base.append(Y_train)
    Y_test_base.append(Y_test)

print(X_train_base[0])
print(Y_train_base[0])
print(X_train_base[1])
print(Y_train_base[1])
  • While this code snippet may solve the problem, it doesn't explain why or how it answers the question. Please [include an explanation for your code](https://meta.stackexchange.com/q/114762/269535), as that really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. You can use the [edit] button to improve this answer to get more votes and reputation! – Brian Tompsett - 汤莱恩 May 20 '19 at 17:48