I'm working on this type of data:
  Date Of Stop  count
0   2012-01-01    180
1   2013-01-01    348
2   2014-02-01    537
3   2015-02-01    498
4   2016-03-01    719
5   2017-03-01    406
I'm trying to predict the count for dates where I have no data.
This is my code. I split the dates into the first 11 months and the 12th month, then try to use the first 11 months to predict what I would get in the 12th month:
dfhalf = groupbyClass[(groupbyClass['Date Of Stop'] > '01/01/2012') &
                      (groupbyClass['Date Of Stop'] < '12/01/2012')]
dfpred = groupbyClass[(groupbyClass['Date Of Stop'] >= '12/01/2012') &
                      (groupbyClass['Date Of Stop'] < '01/01/2013')]
from sklearn.linear_model import LinearRegression
X = dfhalf['Date Of Stop'] # put dates in here
y = dfhalf['count'] # put known counts in here
model = LinearRegression()
model.fit(X, y)
X_predict = dfpred['Date Of Stop'] # dates for prediction
y_predict = model.predict(X_predict)
Unfortunately, this raises an error like this:
ValueError: Expected 2D array, got 1D array instead:
array=['2012-01-02T00:00:00.000000000' '2012-01-03T00:00:00.000000000'
'2012-01-04T00:00:00.000000000' '2012-01-05T00:00:00.000000000'
'2012-01-06T00:00:00.000000000' '2012-01-07T00:00:00.000000000'
'2012-01-08T00:00:00.000000000' '2012-01-09T00:00:00.000000000'
'2012-01-10T00:00:00.000000000' '2012-01-11T00:00:00.000000000'
....
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
I tried both .reshape(-1, 1) and .reshape(1, -1) when defining X and y, but with no luck. I don't understand what I need to do or why.
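
For reference, here is a minimal sketch of what the error message seems to be asking for. scikit-learn expects X with shape (n_samples, n_features), so a single date column has to become an (n, 1) array, and a linear model needs numeric input, so the dates are converted to numbers first. This is only one possible approach, not a confirmed fix for my real data: the daily DataFrame below is made up to stand in for groupbyClass, and the helper to_feature_matrix and the choice of "days since 2012-01-01" as the feature are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for groupbyClass: one row per day of 2012, made-up counts.
dates = pd.date_range("2012-01-01", "2012-12-31", freq="D")
groupbyClass = pd.DataFrame({
    "Date Of Stop": dates,
    "count": np.arange(len(dates)) * 2 + 100,
})

# First 11 months for fitting, 12th month for prediction.
train = groupbyClass[groupbyClass["Date Of Stop"] < "2012-12-01"]
test = groupbyClass[groupbyClass["Date Of Stop"] >= "2012-12-01"]


def to_feature_matrix(date_series):
    """Turn a datetime Series into a numeric 2D array of shape (n_samples, 1).

    Assumption: days elapsed since 2012-01-01 is a reasonable numeric feature.
    """
    reference = pd.Timestamp("2012-01-01")
    days = (date_series - reference).dt.days
    # reshape(-1, 1): many samples, one feature each.
    return days.to_numpy().reshape(-1, 1)


X = to_feature_matrix(train["Date Of Stop"])  # 2D feature matrix
y = train["count"].to_numpy()                 # 1D target is fine

model = LinearRegression()
model.fit(X, y)

X_predict = to_feature_matrix(test["Date Of Stop"])
y_predict = model.predict(X_predict)
```

Only X is reshaped here; y stays one-dimensional. reshape(1, -1) would instead describe a single sample with many features, which does not match one date per row.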