I'm trying to get data about stocks: the close price and the 50, 100, and 200 moving averages. I have another array that holds the label, which is buy or sell. It was worked out on a DataFrame along with all the other arrays. The problem is that when I try to train the classifier, it gives me an error:

ValueError: Found array with dim 3. Estimator expected <= 2.

When I concatenate the arrays, I get a different error: ValueError: Unknown label type: array([[7.87401353,]]) (with more values in it). This is my code:

from sklearn import tree
import pandas as pd
import pandas_datareader.data as web
import numpy as np

df = web.DataReader('goog', 'yahoo', start='2012-5-1', end='2016-5-20')

close_price = df[['Close']]

ma_50 = (pd.rolling_mean(close_price, window=50))
ma_100 = (pd.rolling_mean(close_price, window=100))
ma_200 = (pd.rolling_mean(close_price, window=200))

#adding buys and sell based on the values
df['B/S']= (df['Close'].diff() < 0).astype(int)
close_buy = df[['Close']+['B/S']]
closing = df[['Close']].as_matrix()
buy_sell = df[['B/S']]


close_buy = pd.DataFrame.dropna(close_buy, 0, 'any')
ma_50 = pd.DataFrame.dropna(ma_50, 0, 'any')
ma_100 = pd.DataFrame.dropna(ma_100, 0, 'any')
ma_200 = pd.DataFrame.dropna(ma_200, 0, 'any')

close_buy = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
ma_50 = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
ma_100 = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
ma_200 = (df.loc['2013-02-15':'2016-05-21']).as_matrix()
buy_sell = (df.loc['2013-02-15':'2016-05-21']).as_matrix()  # Fixed

list(close_buy)

clf = tree.DecisionTreeClassifier()
X = list([close_buy,ma_50,ma_100,ma_200]) 
y = [buy_sell]  
sam202252012
  • Possible duplicate of [Sklearn Error, array with 4 dim. Estimator <=2](http://stackoverflow.com/questions/37361116/sklearn-error-array-with-4-dim-estimator-2) – piRSquared May 22 '16 at 06:13

1 Answer

The problem is that you are creating a variable X that is a list of 2-d arrays. That automatically implies a 3rd dimension.

# offending line
X = list([close_buy,ma_50,ma_100,ma_200])

This needs to be concatenated to maintain 2 dimensions.

# corrected
X = np.concatenate([close_buy,ma_50,ma_100,ma_200], axis=1)
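
To see the difference in shapes, here is a minimal sketch. The arrays are hypothetical stand-ins for the question's data (small made-up columns, each of shape (n_samples, 1)); only the shapes matter, not the values.

```python
import numpy as np

# Hypothetical stand-ins for the question's arrays: four (n_samples, 1) columns.
n_samples = 5
close_buy = np.arange(n_samples, dtype=float).reshape(-1, 1)
ma_50 = close_buy + 0.5
ma_100 = close_buy + 1.0
ma_200 = close_buy + 2.0

# A list of 2-D arrays becomes one 3-D array when converted,
# which is what the estimator complains about.
stacked = np.asarray([close_buy, ma_50, ma_100, ma_200])
print(stacked.shape)  # (4, 5, 1)

# Concatenating along axis=1 puts the arrays side by side as feature columns,
# so the result stays 2-D: one row per sample, one column per feature.
X = np.concatenate([close_buy, ma_50, ma_100, ma_200], axis=1)
print(X.shape)  # (5, 4)
```

Note that all arrays must have the same number of rows for the concatenation along axis=1 to work.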

Also, I suspect that once this problem is fixed you'll have another with:

 y = [buy_sell]

There is no reason to wrap this in []. Wrapping adds an extra dimension and will cause the same kind of 3-dimensional problem. Just put this:

y = buy_sell
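
As a rough illustration of the label shape, here is a small sketch with made-up toy data (the values and the two feature columns are assumptions, not the question's data). It also flattens the (n_samples, 1) label column with ravel(), which is an extra step beyond the line above, since a 1-D label vector is the conventional shape for a single-output classifier.

```python
import numpy as np
from sklearn import tree

# Hypothetical toy data: 4 samples, 2 features.
X = np.array([[1.0, 1.5], [2.0, 1.8], [3.0, 2.4], [2.5, 2.6]])
# Labels as a single column, shape (4, 1), similar to a one-column DataFrame's values.
buy_sell = np.array([[0], [0], [1], [1]])

# Wrapping in [] adds a leading dimension, giving a 3-D array.
wrapped = np.asarray([buy_sell])
print(wrapped.shape)  # (1, 4, 1)

# Flatten to a 1-D label vector with one entry per row of X.
y = buy_sell.ravel()
print(y.shape)  # (4,)

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X, y)
print(clf.predict(X))
```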
piRSquared
  • I get this error Traceback (most recent call last): File "C:/Users/Samuel/PycharmProjects/untitled3/777.py", line 41, in clf.fit(x,y) File "C:\Users\Samuel\Anaconda3\lib\site-packages\sklearn\tree\tree.py", line 177, in fit check_classification_targets(y) File "C:\Users\Samuel\Anaconda3\lib\site-packages\sklearn\utils\multiclass.py", line 173, in check_classification_targets raise ValueError("Unknown label type: %r" % y) ValueError: Unknown label type: array([[ 7.87401353e+02, 7.93261381e+02, 7.87071324e+02,] Process finished with exit code 1 – sam202252012 May 22 '16 at 13:28
  • Things worked out; apparently I need a regressor, not a classifier – sam202252012 May 22 '16 at 14:02