I have a dataframe like this:

    Col1    Col2    
10   1        6         
11   3        8        
12   9        4        
13   7        2
14   4        3
15   2        9
16   6        7
17   8        1
18   5        5

I want to use KFold cross validation to fit my model and make predictions.

for train_index, test_index in kf.split(X, y):
    model.fit(X[train_index], y[train_index])
    y_pred = model.predict(X[test_index])

This code generates the following error:

'[1 2 4 7] not in index'

I noticed that the train_index and test_index arrays returned by KFold.split() do not use the real index labels of the dataframe.

So I cannot fit my model.

Does anyone have an idea?

Clement Ros
  • I already tested this and it did not solve my error. As I said, the error is in the fit step, and when I tried to use `.loc`, the indexes used were not the real indexes of my dataframe. Since those indexes do not exist in my dataframe, it fills the values with `NaN`. – Clement Ros Dec 11 '18 at 09:21
  • OK, reopened. No idea. – jezrael Dec 11 '18 at 09:23

1 Answer

From what I can see, the index of your dataframe starts at 10 rather than 0, and, as you said, the split from sklearn returns positions starting from 0, not index labels. One solution is to reset the index of your dataframe:

df = df.reset_index(drop=True)
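As a rough illustration of why resetting helps, here is a minimal sketch that rebuilds the frame from the question. The positions `[1, 2, 4, 7]` are simply the ones quoted in the error message, used here as an illustrative fold. After the reset, labels and positions coincide, so label-based and position-based selection return the same rows. Note that the original labels 10 to 18 are discarded by `drop=True`.

```python
import pandas as pd

# Frame from the question: index labels run from 10 to 18.
df = pd.DataFrame(
    {"Col1": [1, 3, 9, 7, 4, 2, 6, 8, 5],
     "Col2": [6, 8, 4, 2, 3, 9, 7, 1, 5]},
    index=range(10, 19),
)

# Drop the old labels; the new index is 0..8, matching row positions.
df = df.reset_index(drop=True)

# Illustrative positions, reused from the error message in the question.
positions = [1, 2, 4, 7]

# Label-based and position-based selection now return the same rows.
by_label = df.loc[positions]
by_position = df.iloc[positions]
```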

Another solution is to use .iloc on your dataframe (assuming y is an array; if it is a dataframe or a Series, you will have to use .iloc there too):

for train_index, test_index in kf.split(X, y):
    model.fit(X.iloc[train_index], y[train_index])
    y_pred = model.predict(X.iloc[test_index])
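For a complete picture, here is a self-contained sketch of the .iloc approach on the frame from the question. The choice of `Col1` as the feature, `Col2` as the target, `LinearRegression` as the model, and `n_splits=3` are assumptions made purely for illustration, since the question does not specify them. Here y is a pandas Series, so it is indexed with .iloc as well.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Frame from the question: labels run 10..18, positions run 0..8.
df = pd.DataFrame(
    {"Col1": [1, 3, 9, 7, 4, 2, 6, 8, 5],
     "Col2": [6, 8, 4, 2, 3, 9, 7, 1, 5]},
    index=range(10, 19),
)
X = df[["Col1"]]  # assumption: Col1 is the feature
y = df["Col2"]    # assumption: Col2 is the target

kf = KFold(n_splits=3)      # assumption: 3 folds, no shuffling
model = LinearRegression()  # assumption: any sklearn estimator would do

fold_predictions = []
for train_index, test_index in kf.split(X):
    # train_index / test_index are row positions, so select with .iloc.
    model.fit(X.iloc[train_index], y.iloc[train_index])
    y_pred = model.predict(X.iloc[test_index])
    # Re-attach the original labels so predictions line up with df.
    fold_predictions.append(pd.Series(y_pred, index=X.index[test_index]))

predictions = pd.concat(fold_predictions)
```

Keeping the original labels on `predictions` makes it possible to align the out-of-fold predictions with the rows of `df` afterwards.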

A third solution is to convert your dataframe to an array:

for train_index, test_index in kf.split(X, y):
    model.fit(X.values[train_index], y[train_index])
    y_pred = model.predict(X.values[test_index])

Edit: There is also a fourth solution, which might be the one you want. You can use df.index.values[train_index] to get the array of index labels for the rows in the train set.
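A small sketch of that mapping from positions to labels, using the frame from the question. The positions `[1, 2, 4, 7]` are the ones quoted in the error message and stand in for one fold returned by the split. Once translated to labels, they can be used with .loc.

```python
import numpy as np
import pandas as pd

# Frame from the question: index labels run from 10 to 18.
df = pd.DataFrame(
    {"Col1": [1, 3, 9, 7, 4, 2, 6, 8, 5],
     "Col2": [6, 8, 4, 2, 3, 9, 7, 1, 5]},
    index=range(10, 19),
)

# Illustrative positions, reused from the error message in the question.
test_index = np.array([1, 2, 4, 7])

# Translate row positions into the dataframe's own index labels.
test_labels = df.index.values[test_index]

# The labels work with .loc and select the same rows as .iloc on positions.
by_label = df.loc[test_labels]
by_position = df.iloc[test_index]
```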

Statistic Dean