I am trying to use a 10-fold TimeSeriesSplit(), but the documentation for cross_val_score says that cv must be a cross-validation generator or an iterable.

tss = TimeSeriesSplit(max_train_size=None, n_splits=10)
l = []  # mean accuracy for each value of k
neighb = [1, 3, 5, 7, 9, 11, 13, 12, 23, 19, 18]
for k in neighb:
    knn = KNeighborsClassifier(n_neighbors=k, algorithm='brute')
    sc = cross_val_score(knn, X1, y1, cv=tss, scoring='accuracy')
    l.append(sc.mean())

How should I pass the time-series split to cv so that the data is split into train and test sets? This is the error I get:

TypeError                                 Traceback (most recent call last)
<ipython-input-44-acf06bc7340e> in <module>()
     14 for k in neighb:
     15     knn = KNeighborsClassifier(n_neighbors=k, algorithm='brute')
---> 16     sc = cross_val_score(knn, X1, y1, cv=tss, scoring='accuracy')
     17     l.append(sc.mean())
     18 

~\Anaconda3\lib\site-packages\sklearn\cross_validation.py in cross_val_score(estimator, X, y, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
   1579                                               train, test, verbose, None,
   1580                                               fit_params)
-> 1581                       for train, test in cv)
   1582     return np.array(scores)[:, 0]
   1583 
TypeError: 'TimeSeriesSplit' object is not iterable
Mario
  • I am trying to use a 10-fold TimeSeriesSplit, but the documentation for cross_val_score says that we need to pass a cross-validation generator or an iterable. How should I pass the time-series split to cv? – Dhruv Bhardwaj May 10 '18 at 07:55
  • I have updated the answer, see Edit 2. That's why I was asking for the full code. – Vivek Kumar May 11 '18 at 10:55
  • It works after removing the deprecated class. – Dhruv Bhardwaj May 11 '18 at 11:18
  • @VivekKumar would you check this related [post](https://stackoverflow.com/q/75818230/10452700)? – Mario Mar 23 '23 at 08:48

1 Answer

Just pass tss to cv.

scores = cross_val_score(knn, X_train, y_train, cv=tss, scoring='accuracy')

No need to call tss.split().
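As a minimal sketch of this approach, the snippet below passes the splitter object straight to cross_val_score. The data here is synthetic (make_classification) and the names X_demo and y_demo are placeholders, not the question's X1 and y1; it also assumes the rows are already in time order.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the real data (assumption: rows are in time order).
X_demo, y_demo = make_classification(n_samples=220, n_features=5, random_state=0)

# The splitter object itself goes to cv; split() is not called here.
tss = TimeSeriesSplit(n_splits=10)
knn = KNeighborsClassifier(n_neighbors=5, algorithm='brute')

# cross_val_score calls tss.split(...) internally, once per evaluation.
scores = cross_val_score(knn, X_demo, y_demo, cv=tss, scoring='accuracy')
print(scores.shape)   # one score per split
print(scores.mean())
```

With n_splits=10 this should return one accuracy value per split, which can then be averaged as in the question.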

Update: The above method was tested on scikit-learn v0.19.1, so make sure you have the latest version. Also, I am importing TimeSeriesSplit from the model_selection module.

Edit 1:

You are using this now:

tss = TimeSeriesSplit(n_splits=10).split(X_1)
kn = KNeighborsClassifier(n_neighbors=5, algorithm='brute') 
sc = cross_val_score(kn, X1, y1, cv=tss, scoring='accuracy') 

But in the question you posted you did this:

tss = TimeSeriesSplit(n_splits=10)

Note the difference between them: the call to split() is not present in the version from the question. I am passing this tss to cross_val_score() without calling split(), exactly as you posted in the question.
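To make the distinction concrete, here is a small sketch of what the splitter object is versus what split() yields. The 12-sample array X_demo is purely illustrative; the point is that the splitter itself is not iterable, while split() produces (train, test) index pairs in which every training index precedes every test index.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X_demo = np.arange(12).reshape(-1, 1)  # 12 time-ordered samples (illustrative)

tss = TimeSeriesSplit(n_splits=3)

# The splitter object has no __iter__, so iterating over it directly fails.
try:
    iter(tss)
    splitter_is_iterable = True
except TypeError:
    splitter_is_iterable = False
print(splitter_is_iterable)

# split() is what generates the index pairs; the training window grows each time.
splits = [(train.tolist(), test.tolist()) for train, test in tss.split(X_demo)]
for train, test in splits:
    print(train, test)
```

This is the same iteration that cross_val_score from model_selection performs on your behalf when it receives the splitter object.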

Edit 2:

Dude, you are using the deprecated module. Currently you are doing this:

from sklearn.cross_validation import cross_val_score

This is wrong. You should get a warning like this:

DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.

Pay attention to that, and use the model_selection module like this:

from sklearn.model_selection import cross_val_score

Then you will not get the error with my code.
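Putting it together, the loop from the question might look like the sketch below once everything is imported from model_selection. The data is again a synthetic placeholder (X_demo, y_demo) standing in for X1 and y1, sized so that the smallest training fold still has more samples than the largest k.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for X1, y1 (assumption: rows are in time order).
# 550 samples with 10 splits gives a first training fold of 50 rows,
# which is enough for the largest k (23) in the list below.
X_demo, y_demo = make_classification(n_samples=550, n_features=5, random_state=0)

tss = TimeSeriesSplit(max_train_size=None, n_splits=10)
l = []  # mean accuracy for each value of k
neighb = [1, 3, 5, 7, 9, 11, 13, 12, 23, 19, 18]
for k in neighb:
    knn = KNeighborsClassifier(n_neighbors=k, algorithm='brute')
    sc = cross_val_score(knn, X_demo, y_demo, cv=tss, scoring='accuracy')
    l.append(sc.mean())

print(len(l))
```

The only change from the question's code is where cross_val_score comes from; the loop body is unchanged.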

Vivek Kumar
  • Ok, I'll try using this. – Dhruv Bhardwaj May 11 '18 at 08:34
  • This will not work as TimeSeriesSplit is not iterable – Dhruv Bhardwaj May 11 '18 at 09:03
  • @DhruvBhardwaj I checked this code before posting and it's working as intended. Please share more details of your current error. Also, please read the updated answer. – Vivek Kumar May 11 '18 at 09:14
  • tss = TimeSeriesSplit(n_splits=10).split(X_1) kn = KNeighborsClassifier(n_neighbors=5, algorithm='brute') sc = cross_val_score(kn, X1, y1, cv=tss, scoring='accuracy') – Dhruv Bhardwaj May 11 '18 at 09:22
  • I have the latest version of scikit-learn installed as you mentioned – Dhruv Bhardwaj May 11 '18 at 09:28
  • @DhruvBhardwaj You are doing it wrong, and differently from the code you gave in the question. I have updated the answer to correct your code. – Vivek Kumar May 11 '18 at 09:36
  • Oh, I was just trying to make my code look cleaner. So how do I resolve this "too many indices" error? As far as I know, the split() function returns the indices of the train and test data after time-based splitting. – Dhruv Bhardwaj May 11 '18 at 09:43
  • @DhruvBhardwaj Don't use the `split()` function. Just send the `TimeSeriesSplit()` object to `cross_val_score` without calling `split`, as I am doing. `cross_val_score` will do that automatically. – Vivek Kumar May 11 '18 at 09:54
  • TypeError: 'TimeSeriesSplit' object is not iterable @Vivek Kumar – Dhruv Bhardwaj May 11 '18 at 10:11
  • How can you iterate over the time series object just like that? As far as I know, something is wrong. – Dhruv Bhardwaj May 11 '18 at 10:12
  • @DhruvBhardwaj As I said, cross_val_score will iterate by calling the `split()` method inside automatically, no need for you to do it. Just try my code and if still not clear, please edit the question with the exact code and data. – Vivek Kumar May 11 '18 at 10:19
  • @DhruvBhardwaj Is it not working? What error are you getting? – Vivek Kumar May 11 '18 at 10:32
  • @DhruvBhardwaj Post the full stack trace of the error. I am not getting any error from your code with my data. – Vivek Kumar May 11 '18 at 10:44
  • @VivekKumar Would you consider this related [question](https://stackoverflow.com/questions/75818230/what-is-the-best-practice-to-apply-cross-validation-using-timeseriessplit-over)? Thanks in advance. – Mario Mar 29 '23 at 02:55