
I'm trying to perform model selection between KNN and logistic regression, using 10-fold cross-validation as the resampling technique, but I keep getting the error above when I run the last line. Could someone please advise me on what I'm doing wrong? Thanks.

Here is my code:

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Load data
mtcars_df = pd.read_csv('mtcars.csv')
print(mtcars_df.head())

# Features: every column except the first
X = mtcars_df.iloc[:, 1:].values
# Target: the 'model' column (the car name)
Y = mtcars_df['model'].values

knn = KNeighborsClassifier(n_neighbors=5)
# The error is raised on this line
print(cross_val_score(knn, X, Y, cv=10, scoring='accuracy').mean())
```
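For context: when the estimator is a classifier and `cv` is an integer, `cross_val_score` splits the data with `StratifiedKFold`. scikit-learn raises `ValueError: n_splits=10 cannot be greater than the number of members in each class.` when every class has fewer than `n_splits` members, and only warns when just the smallest class does. The sketch below mirrors that check with NumPy so you can inspect your labels before calling `cross_val_score`; `diagnose_stratified_cv` and the example labels are hypothetical, not part of scikit-learn.

```python
import numpy as np


def diagnose_stratified_cv(y, n_splits=10):
    """Predict how StratifiedKFold(n_splits) will treat labels y (hypothetical helper).

    Returns "error" if every class has fewer than n_splits members,
    "warning" if only the smallest class does, and "ok" otherwise.
    """
    _, counts = np.unique(y, return_counts=True)
    if np.all(counts < n_splits):
        return "error"
    if counts.min() < n_splits:
        return "warning"
    return "ok"


# Hypothetical labels: 32 distinct car names, one row each
car_names = np.array([f"car_{i}" for i in range(32)])
print(diagnose_stratified_cv(car_names))  # error

# Hypothetical binary target with 19 and 13 members
binary = np.array([0] * 19 + [1] * 13)
print(diagnose_stratified_cv(binary))  # ok
```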
Veysel Olgun
daniness
  • You can look at this answer: https://stackoverflow.com/questions/48313387/valueerror-n-splits-10-cannot-be-greater-than-the-number-of-members-in-each-cla – Veysel Olgun Oct 14 '22 at 18:54
  • @VeyselOlgun thanks, I did come across the answer you're suggesting, but it didn't help. I tried `from sklearn.model_selection import StratifiedKFold; skf = StratifiedKFold(n_splits=10); print(cross_val_score(skf, X, Y, scoring='accuracy').mean())`, but now I get "TypeError: estimator should be an estimator implementing 'fit' method, StratifiedKFold(n_splits=10, random_state=None, shuffle=False) was passed". – daniness Oct 14 '22 at 19:29
  • @daniness Do you have a class with fewer than 10 samples? – joanis Oct 14 '22 at 19:30
  • @joanis I don't believe so, because X and Y both have length 32 (I checked len(X) and len(Y)). – daniness Oct 14 '22 at 19:32
  • What I mean is, is there a value of Y for which you have fewer than 10 instances? If X and Y are just 32 in length and you have more than 3 classes, then you cannot possibly have ten examples of each class. (A class is a value of Y here.) – joanis Oct 14 '22 at 19:33
  • By the way, 32 samples is an extremely small data set. You're not likely to be able to do much with that. – joanis Oct 14 '22 at 19:35
  • @joanis yes, you're right. Here's how the data looks: array([['Mazda RX4', 21.0, 6, 160.0, 110, 3.9, 2.62, 16.46, 0, 1, 4, 4], ['Mazda RX4 Wag', 21.0, 6, 160.0, 110, 3.9, 2.875, 17.02, 0, 1, 4, 4], ['Datsun 710', 22.8, 4, 108.0, 93, 3.85, 2.32, 18.61, 1, 1, 4, 1], ['Hornet 4 Drive', 21.4, 6, 258.0, 110, 3.08, 3.215, 19.44, 1, 0, 3, 1]... So yes, there appears to be only one instance of each value of Y. – daniness Oct 14 '22 at 19:41
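
On the StratifiedKFold attempt in the comments: `cross_val_score` takes the estimator as its first argument and the splitter through the `cv=` parameter, which is why passing `skf` first raises the TypeError. A minimal sketch of the intended call follows, using scikit-learn's bundled iris dataset as a stand-in (an assumption, since mtcars.csv isn't available here). Note that this alone would not fix the original problem: with one row per car name, no stratified split is possible.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data: 3 classes with 50 samples each, so 10 stratified folds are feasible
X, y = load_iris(return_X_y=True)

knn = KNeighborsClassifier(n_neighbors=5)
# The splitter goes in cv=; the first positional argument must be the estimator
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(knn, X, y, cv=skf, scoring="accuracy")
print(scores.mean())
```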
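
joanis's counting argument can be written out explicitly. With n samples and k classes, giving each class at least 10 members requires n ≥ 10k, so for the 32 rows here:

$$
n \ge 10k \quad\Longrightarrow\quad k \le \left\lfloor \frac{n}{10} \right\rfloor = \left\lfloor \frac{32}{10} \right\rfloor = 3
$$

Using the car name as the label gives k = 32 classes of one member each, far past that bound.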
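
Since the car name is unique per row, the label has to be a column whose values repeat. A minimal sketch of the KNN vs. logistic regression comparison the question describes, under two assumptions: the CSV uses the standard mtcars column names, and `am` (transmission) is an acceptable target, which is only an illustrative choice. `compare_models` is a hypothetical helper, and iris again stands in so the block runs on its own; the `StandardScaler` pipelines are a design choice, not something from the question.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, n_splits=10, seed=0):
    """Return mean stratified k-fold CV accuracy for KNN and logistic regression."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    models = {
        # Scaling matters for distance-based KNN and helps lbfgs converge
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


# Hypothetical mtcars usage, assuming 'am' is the label of interest:
# X = mtcars_df.drop(columns=["model", "am"]).values
# y = mtcars_df["am"].values
X, y = load_iris(return_X_y=True)  # runnable stand-in
results = compare_models(X, y)
print(results, "best:", max(results, key=results.get))
```

Because both pipelines share one StratifiedKFold with a fixed `random_state`, they are scored on identical folds, which keeps the comparison fair. With only 32 rows, each fold would hold 3 or 4 cars, so expect noisy per-fold accuracies.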
