Does anyone know the cause of this error and how to fix it? I get it when trying to create train/test datasets with the Scikit-Learn function `train_test_split`:

# list of feature names
feature_cols = ['date_time', 'bow', 'steel', 'swing', 'nail', 'peg']

# a subset of the original DataFrame
X = data[feature_cols]

# select a Series from the DataFrame
y = 'marks'

# split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y,random_state=1)

Error:

ValueError: Found arrays with inconsistent numbers of samples: [      3 7126674]
user27976
  • possible [duplicate](http://stackoverflow.com/questions/30813044/sklearn-found-arrays-with-inconsistent-numbers-of-samples-when-calling-linearre) – MaxU - stand with Ukraine May 02 '16 at 20:36
  • I saw that post too and tried the solution offered, but it didn't work for me. I got errors regarding the reshape function and wasn't sure which package the reshape function is in. Any suggestions? – user27976 May 02 '16 at 21:05
  • 2
    `y='mark'` doesn't seem to select a `pd.Series`, should perhaps be `y=data.loc[:, 'marks']`? – Stefan May 02 '16 at 21:06
  • 1
    @StefanJansen you don't need to use `loc`, you can just index as `data['marks']` – maxymoo May 03 '16 at 03:34
  • Thanks to the two of you, Stefan Jansen and maxymoo. Both approaches worked for me. I really appreciate your time. – user27976 May 03 '16 at 13:53
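For anyone landing here later, the fix from the comments can be sketched as follows. In the question, `y` is the string `'marks'` rather than the column itself, so `train_test_split` does not receive one target value per row of `X`. This is only a minimal sketch: the small DataFrame is a made-up stand-in for the asker's `data` (which is not shown), it uses just two of the feature columns, and it imports from `sklearn.model_selection`, which is where current scikit-learn releases keep `train_test_split` (an assumption about the installed version).

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the asker's DataFrame; the real data is not shown.
data = pd.DataFrame({
    'bow':   [1, 2, 3, 4, 5, 6, 7, 8],
    'steel': [8, 7, 6, 5, 4, 3, 2, 1],
    'marks': [10, 20, 30, 40, 50, 60, 70, 80],
})

# list of feature names (shortened for the example)
feature_cols = ['bow', 'steel']

# a subset of the original DataFrame
X = data[feature_cols]

# select the target column as a Series: one value per row of X,
# instead of the bare string 'marks'
y = data['marks']

# X and y now have the same number of samples, so the split succeeds
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
```

`data.loc[:, 'marks']` selects the same Series, which matches the asker's report that both approaches worked.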

0 Answers