
I keep having an error running this part of my code:

scores = cross_val_score(XGB_Clf, X_resampled, y_resampled, cv=kf)

The error is:

DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)

I know there are lots of answers to this question, and that I need to use ravel(), but using it does not change anything!

Also, the array "y" I'm passing to the function is not a column-vector ...

See:

y_resampled
Out[82]: array([0, 0, 0, ..., 1, 1, 1], dtype=int64)

When I run

y_resampled.ravel()

I get

Out[81]: array([0, 0, 0, ..., 1, 1, 1], dtype=int64)

which is exactly the same as my initial variable...

Also, when I run y_resampled.values.ravel(), I get an error telling me that this is already a NumPy array:

Traceback (most recent call last): 
File "<ipython-input-80-9d28d21eeab5>", line 1, in <module>
y_resampled.values.ravel() 

AttributeError: 'numpy.ndarray' object has no attribute 'values'

Does anyone have a solution to this?

Thanks a lot!

benmaq
  • After several days, I still cannot get past this issue. I had to suppress the warning directly in my code to work around it, and I really don't like this solution! Does anybody have an answer to this? Thanks a lot! – benmaq Mar 14 '17 at 13:11
  • That is pretty weird. I have definitely seen some variance across the different model (and related) classes in scikit-learn with respect to these warnings. For example, I can pass a single-column DataFrame into a LinearRegression object as y and there will be no warning. On the other hand, if I do this with an MLPRegressor I will get a warning, unless I pass only that column's values (a NumPy array). – abe May 12 '17 at 20:45
  • As for your specific issue: yes, ravel won't change that array, because ravel is meant to flatten (unroll) a matrix into a 1d array, and yours is already 1d. Also, regarding your attempted use of the values attribute: that error is what you'd expect too, since values returns the contents of a DataFrame or Series as a NumPy array, and you already have a NumPy array, which has no such attribute. What versions of sklearn and numpy are you running? Upgrading might not be a bad idea, pending any dependencies, of course. – abe May 12 '17 at 20:45
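To make the point about ravel concrete, here is a minimal sketch. The arrays are made-up stand-ins, not the asker's data. It shows that ravel leaves an array of shape (n,) unchanged, while it flattens a column vector of shape (n, 1), which is the shape the warning complains about.

```python
import numpy as np

# Stand-in labels; the real y_resampled comes from the asker's resampling step.
y_1d = np.array([0, 0, 1, 1])      # shape (4,): already what scikit-learn expects
y_col = y_1d.reshape(-1, 1)        # shape (4, 1): a column vector

flat_from_1d = y_1d.ravel()        # nothing to flatten, shape stays (4,)
flat_from_col = y_col.ravel()      # (4, 1) becomes (4,)

print(y_1d.shape, flat_from_1d.shape)    # (4,) (4,)
print(y_col.shape, flat_from_col.shape)  # (4, 1) (4,)
```

So if y_resampled.shape already prints as (n,), calling ravel on it cannot be what makes the warning go away, and the column vector is probably being created somewhere else.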

2 Answers


Check out this answer man!

Simply:

model = forest.fit(train_fold, train_y.values.ravel())
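
Note that .values only exists on pandas objects, so the line above fails with the AttributeError from the question when y is already a NumPy array. A minimal sketch that works for both cases, assuming a hypothetical helper named to_1d (not part of scikit-learn or pandas):

```python
import numpy as np

def to_1d(y):
    """Hypothetical helper: return y as a flat 1-D NumPy array.

    np.asarray accepts NumPy arrays, lists, and pandas Series/DataFrames,
    so no .values attribute is needed.
    """
    return np.asarray(y).ravel()

already_flat = np.array([0, 0, 1, 1])   # shape (4,)
column = [[0], [0], [1], [1]]           # nested list, i.e. shape (4, 1)

flat_a = to_1d(already_flat)
flat_b = to_1d(column)

print(flat_a.shape, flat_b.shape)       # (4,) (4,)
```

With this, the fit call would read forest.fit(train_fold, to_1d(train_y)) whatever type train_y happens to be.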
Sevki Baba

If you convert y_resampled to a DataFrame, you can then use the values attribute:

import pandas as pd
y_resampled = pd.DataFrame(y_resampled)
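
One thing to watch, sketched below with made-up stand-in labels: wrapping a 1-D array in a DataFrame gives a single-column, two-dimensional object, which is the column-vector shape the warning is about. The conversion therefore only helps if it is followed by .values.ravel() before the data is handed to scikit-learn.

```python
import numpy as np
import pandas as pd

# Stand-in for the asker's resampled labels.
y_resampled = np.array([0, 0, 1, 1])

y_frame = pd.DataFrame(y_resampled)   # one column, two dimensions
y_flat = y_frame.values.ravel()       # back to a flat NumPy array

print(y_frame.shape)   # (4, 1)
print(y_flat.shape)    # (4,)
```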
ABC