
I have a large number of time series (>100) that differ in sampling frequency and in the time period for which they are available. Each series has to be tested for unit roots, seasonally adjusted, and put through other preliminary transformations and checks.

Since a large number of series have to be checked routinely, how can this be done efficiently? The concern is to save time on the routine steps and to keep track of the series and of the analysis results. Unit root testing, for example, involves some subjective judgment. How much of this type of analysis can be automated, and how?

I have already read the questions regarding statistical workflow, which suggest having a common script to run on each series.

I am asking something more specific, based on experience of handling datasets with many time series. The focus is on minimizing errors while dealing with so many series and on automating the repetitive tasks.


1 Answer


I assume the series will be examined independently, as you haven't mentioned any inter-relationships in the models. I'm not sure what kind of objects you're looking to use or which tests, but the basic goal of "best practices" is independent of the actual package used.

The simplest approach is to load the objects into a list and analyze each series via simple iterators such as lapply, or via multicore methods such as mclapply or foreach, in R. In Matlab, you can operate over cell arrays; the Parallel Computing Toolbox has a construct called parfor ("parallel for"), which is similar to the foreach function in R. For my money, I'd recommend R, as it's cheaper (free) and has much richer functionality for statistical analysis. Matlab has better documentation and help tools, but these matter less over time as you become more familiar with the tools and methods of your research (and as your bookshelf of references grows).
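A minimal sketch of the list-plus-lapply pattern, assuming your series are already stored as `ts` objects in a list called `series_list` and using `adf.test()` from the tseries package as one example unit-root test (swap in whatever tests and transformations you actually need):

```r
library(tseries)

## One checking routine applied identically to every series.
check_series <- function(x) {
  list(
    n         = length(x),
    start     = start(x),
    frequency = frequency(x),
    ## Guard against series that are too short or contain NAs.
    adf_p     = tryCatch(adf.test(x)$p.value, error = function(e) NA)
  )
}

## `series_list` is assumed to be a named list of ts objects.
results <- lapply(series_list, check_series)
```

Keeping the results in one list (ideally named after the series) also addresses the bookkeeping concern: you can summarize or flag the whole batch in one further step.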

It's good to become accustomed to using multicore tools in general, as this can substantially decrease the time it takes to do analyses on a bunch of independent small objects.
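For example, a parallel variant of the same loop, reusing `series_list` and `check_series()` from the sketch above; `mclapply()` forks worker processes, so the concurrent speed-up applies on Unix-like systems (on Windows it falls back to a single core):

```r
library(parallel)

## Run the same per-series check on all available cores but one.
results <- mclapply(series_list, check_series,
                    mc.cores = max(1, detectCores() - 1))
```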

  • Before analyzing the multiple series together, each of them has to be processed individually to learn its characteristics, e.g. whether I have to take a log transform or test for a unit root. I think some of these tasks can be batch processed or automated, as mentioned in some forecasting competitions. – Anusha Oct 19 '11 at 23:40
  • Great! Basic data validation is exactly the kind of modeling that is easily distributed for independent analyses via methods like parfor or foreach. It's nice to plow through a lot of simple stuff quickly & these help in that way. – Iterator Oct 19 '11 at 23:42
  • Can you elaborate on the multicore tools and parfor please? – Anusha Oct 19 '11 at 23:48
  • `foreach` and `parfor` are conceptually equivalent. The idea is that these extend the traditional for-loop by assuming that each loop iteration is independent and then running each iteration on whatever resource (i.e. cores of your computer) is available (a short `foreach` sketch follows these comments). For larger-scale computing resources, such as grids, one would look to other tools, but these should be adequate for 100 time series & basic validation. – Iterator Oct 19 '11 at 23:50
  • You might look to GPU computing as a cheaper, more efficient option. For MATLAB, look at Jacket by AccelerEyes (the GFOR loop especially), http://accelereyes.com – arrayfire Oct 26 '11 at 22:24
  • @melonakos These are good suggestions. I realize, from your profile, that you're with AccelerEyes. Do you know if integration with R is in the plans? – Iterator Oct 26 '11 at 22:52
  • @Iterator, yes, there is work being done to extend ArrayFire to R. (cool username btw!) – arrayfire Jan 27 '12 at 18:28
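A rough `foreach` sketch of the pattern described in the comments, again assuming `series_list` and `check_series()` from the answer above; each iteration is independent, so doParallel can dispatch them to whatever cores are registered:

```r
library(foreach)
library(doParallel)

## Register a small cluster of workers (e.g. detectCores() - 1).
cl <- makeCluster(2)
registerDoParallel(cl)

## Each iteration loads tseries on the worker and checks one series.
results <- foreach(x = series_list, .packages = "tseries") %dopar%
  check_series(x)

stopCluster(cl)
```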