27
  • What algorithms exist for time series forecasting/regression?
    • What about using neural networks? (What are the best docs on this topic?)
    • Are there Python libraries or code snippets that can help?
Nikana Reklawyks
gpilotino
  • It would be helpful if you could explain your application (what kind of time series you are working with) because the best method is a function of the madness. The answer to your algorithm "existence" question is "many". – Pete Sep 02 '10 at 18:18
  • I'm working with financial data (forex time series) – gpilotino Sep 05 '10 at 09:20
  • My favorite! The most important thing is to characterize the randomness in your time series first; if you find that it is random then any deterministic methodology can only work by luck. With markets you might find shades of non-random behavior here and there, and it will fade in and out. So success with deterministic methods depends greatly on your ability to adapt. – Pete Sep 05 '10 at 22:54

7 Answers

73

The classical approaches to time series regression are:

  • auto-regressive models (there are whole literatures about them)

  • Gaussian Processes

  • Fourier decomposition or similar to extract the periodic components of the signal (i.e., hidden oscillations in the data)
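
To make the first item concrete, here is a minimal sketch of an autoregressive model fitted by ordinary least squares, using only NumPy. The function names `fit_ar` and `forecast_ar` are made up for this illustration and do not come from any library; a dedicated package would normally handle order selection and diagnostics as well.

```python
import numpy as np

def fit_ar(series, p):
    """Fit AR(p) coefficients by ordinary least squares (illustrative helper)."""
    x = np.asarray(series, dtype=float)
    # Row for target x[t] holds the p previous values, most recent first.
    rows = np.array([x[t - p:t][::-1] for t in range(p, len(x))])
    design = np.column_stack([np.ones(len(rows)), rows])  # intercept column first
    coef = np.linalg.lstsq(design, x[p:], rcond=None)[0]
    return coef  # [intercept, phi_1, ..., phi_p]

def forecast_ar(series, coef, steps):
    """Iterate the fitted recursion to predict `steps` future values."""
    p = len(coef) - 1
    history = list(np.asarray(series, dtype=float)[-p:])
    predictions = []
    for _ in range(steps):
        lags = history[::-1][:p]  # most recent value first, matching fit_ar
        nxt = coef[0] + float(np.dot(coef[1:], lags))
        predictions.append(nxt)
        history.append(nxt)
    return predictions

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = [0.0]
    for _ in range(300):
        data.append(0.8 * data[-1] + rng.normal())
    coef = fit_ar(data, p=2)
    print(coef, forecast_ar(data, coef, steps=5))
```

The forecast simply feeds each prediction back in as input, so errors accumulate as the horizon grows.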

Other less common approaches that I know about are

  • Slow Feature Analysis, an algorithm that extracts the driving forces of a time series, e.g., the parameters behind a chaotic signal

  • Neural network (NN) approaches, either using recurrent NNs (i.e., built to process time signals) or classical feed-forward NNs that receive part of the past data as input and try to predict a point in the future; the advantage of the latter is that recurrent NNs are known to have trouble taking the distant past into account

In my opinion for financial data analysis it is important to obtain not only a best-guess extrapolation of the time series, but also a reliable confidence interval, as the resulting investment strategy could be very different depending on that. Probabilistic methods, like Gaussian Processes, give you that "for free", as they return a probability distribution over possible future values. With classical statistical methods you'll have to rely on bootstrapping techniques.
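
As a rough illustration of the bootstrapping route, the sketch below fits an AR(1) model by least squares and resamples its residuals to simulate many possible futures, then reads an interval off the simulated values. The function name `bootstrap_forecast_interval` and all of its parameters are assumptions made for this example. It ignores uncertainty in the fitted coefficients, so the interval is likely to be too narrow in practice.

```python
import numpy as np

def bootstrap_forecast_interval(series, steps=1, n_boot=1000, level=0.9, seed=0):
    """Residual-bootstrap interval for an AR(1) forecast (illustrative only)."""
    x = np.asarray(series, dtype=float)
    # Fit x[t] = c + phi * x[t-1] + noise by least squares.
    design = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    coef = np.linalg.lstsq(design, x[1:], rcond=None)[0]
    c, phi = coef
    residuals = x[1:] - design @ coef

    rng = np.random.default_rng(seed)
    finals = np.empty(n_boot)
    for b in range(n_boot):
        value = x[-1]
        # Simulate one future path by drawing past residuals with replacement.
        for shock in rng.choice(residuals, size=steps, replace=True):
            value = c + phi * value + shock
        finals[b] = value

    lower, upper = np.quantile(finals, [(1 - level) / 2, (1 + level) / 2])
    return float(lower), float(upper)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = [0.0]
    for _ in range(300):
        data.append(0.7 * data[-1] + rng.normal())
    print(bootstrap_forecast_interval(data, steps=3))
```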

There are many Python libraries that offer statistical and machine learning tools; here are the ones I'm most familiar with:

  • NumPy and SciPy are a must for scientific programming in Python
  • There is a Python interface to R, called RPy
  • statsmodels contains classical statistical modeling techniques, including autoregressive models; it works well with Pandas, a popular data analysis package
  • scikit-learn (formerly scikits.learn), MDP, MLPy, Orange are collections of machine learning algorithms
  • PyMC, a Python module that implements Bayesian statistical models and fitting algorithms, including Markov chain Monte Carlo
  • PyBrain contains (among other things) implementations of feed-forward and recurrent neural networks
  • at the Gaussian Process site there is a list of GP software, including two Python implementations
  • mloss is a directory of open source machine learning software
Glorfindel
pberkes
  • `pandas` is a more active project: http://pandas.pydata.org/ – Taha Jahangir Sep 30 '12 at 15:42
  • Yes, `pandas` is a great project for manipulating data sequences, especially when dates are important. However, as far as I know it does not contain many algorithms for forecasting and regression besides basic statistical tools. See for example http://pandas.pydata.org/pandas-docs/dev/computation.html – pberkes Dec 12 '12 at 12:07
  • Thanks! I was looking for ideas on how to build a generic internal model for a sensor (e.g., an IMU or a sonar) operating in noisy environments, and this gives good ideas in addition to traditional noise modeling. – kert Nov 29 '13 at 02:37
5

I've no idea about python libraries, but there are good forecasting algorithms in R which are open source. See the forecast package for code and references for time series forecasting.

Rob Hyndman
5

Two approaches

There are two ways to deal with temporally structured input for classification, regression, clustering, forecasting and related tasks:

  1. Dedicated time series model: The machine learning algorithm incorporates the time series directly. Such a model is like a black box, and it can be hard to explain its behavior. Examples are autoregressive models.
  2. Feature-based approach: Here the time series are mapped to another, possibly lower-dimensional, representation. This means that the feature extraction algorithm calculates characteristics such as the average or maximal value of the time series. The features are then passed as a feature matrix to a "normal" machine learning algorithm such as a neural network, random forest or support vector machine. This approach has the advantage of better explainability of the results. Furthermore, it enables us to use the well-developed theory of supervised machine learning.
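
To sketch what the feature-based approach looks like in code, the example below cuts a series into fixed-width windows, computes a handful of summary statistics per window, and fits a plain linear model on the resulting feature matrix. It uses only NumPy; `make_windows` and `summary_features` are hypothetical helpers written for this illustration (they are not part of tsfresh), and the linear model merely stands in for whatever "normal" learner you prefer.

```python
import numpy as np

def make_windows(series, width):
    """Split a series into overlapping windows and the value following each one."""
    x = np.asarray(series, dtype=float)
    windows = np.array([x[i:i + width] for i in range(len(x) - width)])
    targets = x[width:]
    return windows, targets

def summary_features(windows):
    """Map each window to a small feature vector: mean, max, min, std, net change."""
    w = np.asarray(windows, dtype=float)
    return np.column_stack([
        w.mean(axis=1),
        w.max(axis=1),
        w.min(axis=1),
        w.std(axis=1),
        w[:, -1] - w[:, 0],
    ])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    series = np.cumsum(rng.normal(size=500))  # synthetic random walk
    windows, targets = make_windows(series, width=20)
    features = summary_features(windows)
    # Stand-in learner: least-squares linear regression on the feature matrix.
    design = np.column_stack([np.ones(len(features)), features])
    weights = np.linalg.lstsq(design, targets, rcond=None)[0]
    print(weights)
```

Because every window is reduced to a few named quantities, the fitted weights can be inspected directly, which is the explainability advantage mentioned above.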

tsfresh calculates a huge number of features

The Python package tsfresh calculates a huge number of such features from a pandas.DataFrame containing the time series. You can find its documentation at http://tsfresh.readthedocs.io.


Disclaimer: I am one of the authors of tsfresh.

MaxBenChrist
  • Guys, how do I use this library if we have regression/multiclass labels? Your `test example` (https://github.com/blue-yonder/tsfresh/blob/master/notebooks/pipeline_example.ipynb) works fine. However, the `same example` with `tiny changes` does not work at all - http://content.screencast.com/users/SASH2012/folders/Jing/media/3d5fb327-f5ed-4dba-9061-3093a492dd09/2016-12-23_1603.png Please advise. – SpanishBoy Dec 23 '16 at 14:04
  • You are using a random target. For a random target, no feature is relevant. Hence the algorithm will remove all features. – MaxBenChrist Dec 23 '16 at 14:24
  • I'd say some of the features have a very small statistical test value but are not relevant at all. How can I add my own custom feature_selector based on Boruta instead? – SpanishBoy Dec 23 '16 at 16:11
  • In the screenshot you posted above, you set your target vector `y` to random variables. Every time series feature you can think of is worthless for a random target. – MaxBenChrist Dec 23 '16 at 17:04
4

Speaking only about the algorithms, I recently used double exponential smoothing in a project and it did well at forecasting new values when there is a trend in the data.

The implementation is pretty trivial, but the algorithm may not be sophisticated enough for your case.
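
For reference, here is a minimal sketch of double exponential smoothing (Holt's linear trend method) in plain Python. The function name, the parameter names `alpha` and `beta`, and the choice to initialize the trend from the first two observations are assumptions of this example; other initializations are common too.

```python
def double_exponential_smoothing(series, alpha, beta, horizon=1):
    """Forecast `horizon` future values with Holt's linear trend method.

    alpha smooths the level, beta smooths the trend; both lie in (0, 1].
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = float(series[0])
    trend = float(series[1]) - float(series[0])  # one common initialization
    for value in series[1:]:
        previous_level = level
        # Blend the new observation with the previous one-step-ahead forecast.
        level = alpha * value + (1 - alpha) * (level + trend)
        # Blend the latest change in level with the previous trend estimate.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # Extrapolate the last level along the last trend.
    return [level + h * trend for h in range(1, horizon + 1)]

if __name__ == "__main__":
    data = [10.0, 12.1, 13.9, 16.2, 18.0, 19.8, 22.1]
    print(double_exponential_smoothing(data, alpha=0.5, beta=0.3, horizon=3))
```

On a perfectly linear series the method reproduces the line exactly; with noisy data, smaller `alpha` and `beta` give smoother but slower-reacting forecasts.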

GaretJax
4

Have you tried autocorrelation for finding periodic patterns in the time series? You can compute it with the numpy.correlate function.
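
A small sketch of that idea, assuming a roughly stationary series: subtract the mean, correlate the series with itself via `numpy.correlate`, and look for the first peak after lag 0. The helper names `autocorrelation` and `first_peak_lag` are invented for this example. On noisy data the first local maximum can be spurious, so smoothing or a significance threshold would be needed in practice.

```python
import numpy as np

def autocorrelation(series):
    """Normalized autocorrelation for lags 0..n-1, computed with numpy.correlate."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    full = np.correlate(x, x, mode="full")
    acf = full[len(x) - 1:]  # keep only the non-negative lags
    if acf[0] == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return acf / acf[0]

def first_peak_lag(series):
    """Return the lag of the first local maximum after lag 0, or None."""
    acf = autocorrelation(series)
    for lag in range(1, len(acf) - 1):
        if acf[lag - 1] < acf[lag] > acf[lag + 1]:
            return lag
    return None

if __name__ == "__main__":
    t = np.arange(200)
    signal = np.sin(2 * np.pi * t / 20)  # synthetic signal with period 20
    print(first_peak_lag(signal))
```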

Agnius Vasiliauskas
  • Sounds interesting. Do you have an example or a link with some snippets? – gpilotino Sep 04 '10 at 14:38
  • I don't know if it helps, but you can check here: http://dr-adorio-adventures.blogspot.com/2010/04/computing-sample-partial.html Also check out the very good Python computer algebra system SAGE: http://www.sagemath.org/doc/reference/sage/finance/time_series.html – Agnius Vasiliauskas Sep 05 '10 at 20:06
2

Group method of data handling is widely used to forecast financial data.

1

If you want to understand time series forecasting using Python, the link below is very helpful.

https://github.com/ManojKumarMaruthi/Time-Series-Forecasting

Satya