
I'm working on this type of data:

  Date Of Stop count
0 2012-01-01   180
1 2013-01-01   348
2 2014-02-01   537
3 2015-02-01   498
4 2016-03-01   719
5 2017-03-01   406

I'm trying to predict the count for dates where I don't have data.

This is my code. I split the dates into the first 11 months and the 12th month, then try to use the first 11 months to predict what I would get in the 12th month.

 dfhalf = groupbyClass[(groupbyClass['Date Of Stop'] > '01/01/2012') & 
         (groupbyClass['Date Of Stop'] < '12/01/2012')]
 dfpred = groupbyClass[(groupbyClass['Date Of Stop'] >= '12/01/2012') & 
         (groupbyClass['Date Of Stop'] < '01/01/2013')]

 from sklearn.linear_model import LinearRegression

 X = dfhalf['Date Of Stop']   # put dates in here
 y = dfhalf['count']          # put known counts in here

 model = LinearRegression()
 model.fit(X, y)

 X_predict = dfpred['Date Of Stop']  # dates for prediction
 y_predict = model.predict(X_predict)

Unfortunately, this throws an error like this:

 ValueError: Expected 2D array, got 1D array instead:
 array=['2012-01-02T00:00:00.000000000' '2012-01-03T00:00:00.000000000'
 '2012-01-04T00:00:00.000000000' '2012-01-05T00:00:00.000000000'
 '2012-01-06T00:00:00.000000000' '2012-01-07T00:00:00.000000000'
 '2012-01-08T00:00:00.000000000' '2012-01-09T00:00:00.000000000'
 '2012-01-10T00:00:00.000000000' '2012-01-11T00:00:00.000000000'
 ....
 Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

I tried different forms of .reshape(-1, 1) and .reshape(1, -1) when defining my X or y, but no luck. I don't understand what I need to do or why.
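For context on the error message: scikit-learn estimators expect the feature matrix X as a 2D array of shape (n_samples, n_features), while a single DataFrame column is a 1D Series. Below is a minimal sketch of the shape difference. It uses made-up numeric values rather than the data above, and the variable names are only illustrative.

```python
import pandas as pd

# Hypothetical numeric column standing in for a single feature.
s = pd.Series([1.0, 2.0, 3.0, 4.0], name="feature")

one_d = s.to_numpy()                  # shape (4,): 1D, rejected as X by model.fit
two_d = s.to_numpy().reshape(-1, 1)   # shape (4, 1): 4 samples, 1 feature
as_frame = s.to_frame()               # a one-column DataFrame is also 2D

print(one_d.shape, two_d.shape, as_frame.shape)
```

Reshaping only addresses the dimensionality. The values themselves still have to be numeric, which datetime values are not.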

Shreyas
dunkerkboy
  • Besides the fact that linear regression needs 2D array input, you should not pass an array of strings as input. No linear regression algo can work on strings. – Kevin Fang Oct 08 '18 at 23:16
  • Mihai Chelaru X = X.reshape(-1, 1) gives error: Cannot cast array data from dtype(' – dunkerkboy Oct 08 '18 at 23:22
  • Where did I pass an array of strings? My first column is a date and the second is a float – dunkerkboy Oct 08 '18 at 23:24
  • Sorry, made a mistake, it should be `X = X.values[:,None]`. As to the date issue @KevinFang mentioned, take a look at [this post](https://stackoverflow.com/questions/29748717/use-scikit-learn-to-do-linear-regression-on-a-time-series-pandas-data-frame). – Mihai Chelaru Oct 08 '18 at 23:39
  • Still get Cannot cast array data from dtype(' – dunkerkboy Oct 08 '18 at 23:41
  • Here's another [similar question](https://stackoverflow.com/questions/48518471/working-with-date-types-in-python-linear-regression). Note the suggestions in the comments to extract some feature from the date, for instance the month since the start date based on your data, and use that instead for the regression. – Mihai Chelaru Oct 09 '18 at 00:04
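One way to follow the suggestion in the comments, sketched under assumptions: the dates are converted to a numeric feature (days elapsed since the first date) and shaped as a 2D array before fitting. The daily data here is synthetic and only stands in for groupbyClass, and days_since_start is a hypothetical helper name, not something from the question.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic daily counts for 2012 (assumption: stands in for groupbyClass).
dates = pd.date_range("2012-01-01", "2012-12-31", freq="D")
df = pd.DataFrame({"Date Of Stop": dates,
                   "count": 100 + 0.5 * np.arange(len(dates))})

start = df["Date Of Stop"].min()

def days_since_start(series):
    """Turn datetimes into a 2D float array of days elapsed since `start`."""
    return (series - start).dt.days.to_numpy(dtype=float).reshape(-1, 1)

# First 11 months for fitting, the 12th month for prediction.
train = df[df["Date Of Stop"] < "2012-12-01"]
test = df[df["Date Of Stop"] >= "2012-12-01"]

model = LinearRegression()
model.fit(days_since_start(train["Date Of Stop"]), train["count"])

y_predict = model.predict(days_since_start(test["Date Of Stop"]))
print(y_predict[:3])
```

Days since the start is only one possible feature. Months since the start, or separate month and weekday columns, would be built the same way, and a straight-line fit will not capture any seasonal pattern in the counts.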

0 Answers