I'm getting familiar with the stl function in the stats package by trying it on an example from Brockwell & Davis's 2002 "Introduction to Time Series and Forecasting". Specifically, I'm using a subset of their red wine sales data. It's a detour from the stl material at http://www.stat.pitt.edu/stoffer/tsa3/R_toot.htm (at some point, I have to stop simply following along and try to make it work with new data).

I need a minimum of 36 wine sales data points in the series, since stl otherwise complains about the data being less than 2 cycles. The data is in ~/tmp/wine.txt:

464
675
703
887
1139
1077
1318
1260
1120
963
996
960
530
883
894
1045
1199
1287
1565
1577
1076
918
1008
1063
544
635
804
980
1018
1064
1404
1286
1104
999
996
1015
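
A minimal sketch of the length complaint, with the values typed inline as a stand-in for the file (the names sales, two_years, three_years, msg and fit are just for illustration). It only shows that two years of monthly data are rejected while the full three years are accepted:

```r
# Same 36 monthly values as in ~/tmp/wine.txt, typed inline for a self-contained run
sales <- c(464, 675, 703, 887, 1139, 1077, 1318, 1260, 1120, 963, 996, 960,
           530, 883, 894, 1045, 1199, 1287, 1565, 1577, 1076, 918, 1008, 1063,
           544, 635, 804, 980, 1018, 1064, 1404, 1286, 1104, 999, 996, 1015)

two_years   <- ts(sales[1:24], frequency = 12)
three_years <- ts(sales, frequency = 12)

# Capture the error message instead of stopping the script
msg <- tryCatch(stl(two_years, "per"), error = function(e) conditionMessage(e))
print(msg)

# Three full years decompose without complaint
fit <- stl(three_years, "per")
```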

My sourced test code is buried in a repeat loop so that I can use a break command to circumvent the final error-causing statement that I'm trying to figure out:

repeat{

    # Clear variables (from stackexchange)
    rm( list=setdiff( ls( all.names=TRUE ), lsf.str(all.names=TRUE ) ) )
    ls()

    head( wine <- read.table("~/tmp/wine.txt") )
    ( x <- ts(wine[[1]],frequency=12) )
    ( y <- ts(wine,frequency=12) )
    ( a=stl(x,"per") )
    #break
    ( b=stl(y,"per") )
}

The final statement causes the error 'Error in stl(y, "per") : only univariate series are allowed'. I found an explanation at "Time series and stl in R: Error only univariate series are allowed". That's how I came up with the assignment to x using wine[[1]]. I found an explanation of the need for double square brackets at http://www.r-tutor.com/r-introduction/list/named-list-members.
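
As I understand the bracket distinction, it can be sketched as follows. The data frame here is a shortened stand-in for what read.table returns (a single column, which read.table names V1 when there is no header), not the real file:

```r
# Stand-in for read.table("~/tmp/wine.txt"): a one-column data frame.
# Only the first year is used to keep the sketch short.
wine <- data.frame(V1 = c(464, 675, 703, 887, 1139, 1077,
                          1318, 1260, 1120, 963, 996, 960))

class(wine)       # the whole object is a data frame
class(wine[1])    # single brackets: still a data frame with one column
class(wine[[1]])  # double brackets: the column itself, a plain numeric vector

dim(wine[1])      # a data frame has rows and columns
dim(wine[[1]])    # NULL, a plain vector has no dimensions
```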

My problem is that it's not very clear what is happening inside the ts structures x and y. If I simply print them, they look 100% identical:

> x
   Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1  464  675  703  887 1139 1077 1318 1260 1120  963  996  960
2  530  883  894 1045 1199 1287 1565 1577 1076  918 1008 1063
3  544  635  804  980 1018 1064 1404 1286 1104  999  996 1015
> y
   Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1  464  675  703  887 1139 1077 1318 1260 1120  963  996  960
2  530  883  894 1045 1199 1287 1565 1577 1076  918 1008 1063
3  544  635  804  980 1018 1064 1404 1286 1104  999  996 1015

Whatever their differences, they're not causing R to misinterpret the data; that is, they each look like a single series of numerical data.

Can anyone illuminate the difference in the data inside the ts data structures? The potential incompatibility with stl is just one symptom. Right now, the "solution" is black magic to me, and I would like to get a clearer picture so that I know when else (and how) to watch out for this.
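
For anyone wanting to reproduce the comparison without my file, here is a minimal sketch with the values typed inline (the data frame is a stand-in for the read.table result, and the y[, 1] workaround at the end is my assumption about one way to get a univariate series back, not something taken from the documentation):

```r
sales <- c(464, 675, 703, 887, 1139, 1077, 1318, 1260, 1120, 963, 996, 960,
           530, 883, 894, 1045, 1199, 1287, 1565, 1577, 1076, 918, 1008, 1063,
           544, 635, 804, 980, 1018, 1064, 1404, 1286, 1104, 999, 996, 1015)
wine <- data.frame(V1 = sales)       # stand-in for read.table("~/tmp/wine.txt")

x <- ts(wine[[1]], frequency = 12)   # built from a plain vector
y <- ts(wine, frequency = 12)        # built from a one-column data frame

# The printed tables match, but the underlying structure does not
dim(x)         # NULL: x is a vector with time series attributes
dim(y)         # 36 1: y is a one-column matrix
is.matrix(y)   # TRUE

# More detailed views of what each object carries
str(x)
str(y)
attributes(y)

a <- stl(x, "per")
# Selecting the single column drops the matrix dimension again
b <- stl(y[, 1], "per")
```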

I've posted this to the R Help mailing list at http://thread.gmane.org/gmane.comp.lang.r.general/319626 and to stackoverflow at "How numerical data is stored inside ts time series objects".

user36800
  • Use 'attributes()'. Time series objects are numeric with attributes. – IRTFM Apr 21 '15 at 00:01
  • Thanks, BondedDust. This complements the use of the str() function recommended in the mailing list. – user36800 Apr 22 '15 at 02:02
  • I read the mailing list dialog. You got answers from two of the most knowledgeable respondents. Two more comments: R has multiple versions of "time series" objects. You can see all of an R object with 'dput'. – IRTFM Apr 22 '15 at 16:33
  • Yes, and following up on their comments has been very educational. Thanks for the pointer to dput. As for the multiple versions of TS, I appreciate the heads-up. The mind-boggling array of choices shown at the CRAN Task View for Time Series Analysis certainly adds to the challenge of ramping up in both R and TS. I'm using the ASTSA package and the native stats package capabilities for now. However, I found an excellent source of info for the beginner at http://www.statmethods.net/advstats/timeseries.html. – user36800 Apr 23 '15 at 02:47
