
I have some time series data for prices that I'm trying to perform linear regression on. However, I feel that what I'm doing is incorrect and was hoping someone could point me in the right direction.

My data looks like this:

date            Close
2017-05-10       0.12512
2017-05-11       0.12353
2017-05-12      -0.35235
.
.
.
2019-01-10       0.87890

Close refers to the closing price of each day and is scaled to be within (-1, 1).

I've attempted to use the LinearRegression class from sklearn.linear_model. When I first tried to fit the data, the date column was a string type, so the program complained that it cannot work with string data. I then simply dropped the date column and worked with only the Close values in the training and test sets.

My intuition tells me that this is the wrong approach. According to this answer, NumPy has a function called polyfit. Is it impossible to use the standard Scikit-Learn LinearRegression with the data that I have?

Sean

1 Answer


LinearRegression is not the right tool for time series.

In statistical terms, the least-squares solution to linear regression is the maximum-likelihood estimate under the assumption that the errors of a model that is linear in its basis functions come from a zero-mean normal distribution. That derivation also assumes the observations are independent and identically distributed, which is clearly not a reasonable assumption for time series data, where each value usually depends on the ones before it.
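To make the independence point concrete, here is a minimal sketch that fits a straight line against time and then checks whether the residuals are correlated with their own previous value. The series is a synthetic random walk standing in for the scaled Close column (an assumption, not the asker's data), and lag1_autocorr is a helper name introduced only for this illustration. It uses numpy.polyfit, which for degree 1 is an ordinary least-squares line fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the Close column: a random walk (assumption).
dates = np.arange("2017-05-10", "2019-01-11", dtype="datetime64[D]")
close = np.cumsum(rng.normal(0.0, 0.01, size=dates.size))

# String/date values cannot be fed to a regression directly, so convert
# them to a numeric feature: days elapsed since the first observation.
t = (dates - dates[0]) / np.timedelta64(1, "D")

# Ordinary least-squares straight line: close ~ slope * t + intercept.
slope, intercept = np.polyfit(t, close, deg=1)
residuals = close - (slope * t + intercept)


def lag1_autocorr(x):
    """Correlation between a series and itself shifted by one step."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


# Independent errors would give a value near 0, as for pure noise.
r_resid = lag1_autocorr(residuals)
r_noise = lag1_autocorr(rng.normal(size=dates.size))
print(f"lag-1 autocorrelation of residuals: {r_resid:.3f}")
print(f"lag-1 autocorrelation of iid noise: {r_noise:.3f}")
```

If the residual autocorrelation is far from zero while the noise value is close to zero, the i.i.d. assumption behind the fit does not hold for that series, even though the fit itself runs without error.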

For time series problems there are many approaches, and the right one depends on what you are trying to do. See http://www.statsoft.com/textbook/time-series-analysis for an overview.

If you want to predict the next value, I suggest a recurrent neural network (RNN).
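Whichever model is used for next-value prediction, the series usually has to be reshaped first into windows of past values paired with the value that follows. The sketch below shows only that preparation step, not an RNN itself. The function name make_windows, the window length of 3, and the toy series are all assumptions made for illustration. The trailing axis added at the end follows the (samples, timesteps, features) layout that many RNN libraries expect, which you should check against the library you choose.

```python
import numpy as np


def make_windows(series, n_lags):
    """Return (X, y) where each row of X holds n_lags consecutive
    values and y holds the value that immediately follows that row."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - n_lags
    X = np.stack([series[i:i + n_lags] for i in range(n_samples)])
    y = series[n_lags:]
    return X, y


# Toy series (assumption) so the pairing is easy to read.
toy = [1, 2, 3, 4, 5, 6]
X, y = make_windows(toy, n_lags=3)

# Split chronologically rather than shuffling, so the model is always
# evaluated on values that come after the ones it was trained on.
split = int(0.67 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Add a trailing feature axis: (samples, timesteps, features).
X_train_seq = X_train[..., np.newaxis]

print(X)
print(y)
print(X_train_seq.shape)
```

With the date column dropped, as in the question, the order of the rows is the only time information left, so the windows must be built before any shuffling or splitting.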