I'm getting familiar with the stl function in the stats package by trying it on an example from Brockwell & Davis's 2002 "Introduction to Time Series and Forecasting". Specifically, I'm using a subset of their red wine sales data. It's a detour from the stl material at http://www.stat.pitt.edu/stoffer/tsa3/R_toot.htm (at some point, I have to stop simply following along and try to make it work with new data).
I need a minimum of 36 wine sales data points in the series, since stl otherwise complains about the data being less than 2 cycles. The data is in ~/tmp/wine.txt:
464
675
703
887
1139
1077
1318
1260
1120
963
996
960
530
883
894
1045
1199
1287
1565
1577
1076
918
1008
1063
544
635
804
980
1018
1064
1404
1286
1104
999
996
1015
My sourced test code is buried in a repeat loop so that I can use a break command to circumvent the final error-causing statement that I'm trying to figure out:
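repeat{
   # Clear all non-function variables (from stackexchange)
   rm( list=setdiff( ls( all.names=TRUE ), lsf.str( all.names=TRUE ) ) )
   ls()
   # Read the sales figures into a one-column data frame
   head( wine <- read.table("~/tmp/wine.txt") )
   ( x <- ts(wine[[1]],frequency=12) )   # built from the column itself
   ( y <- ts(wine,frequency=12) )        # built from the whole data frame
   ( a <- stl(x,"per") )                 # works
   #break
   ( b <- stl(y,"per") )                 # fails: only univariate series are allowed
}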
The final statement causes the error 'Error in stl(y, "per") : only univariate series are allowed'. I found an explanation at Time series and stl in R: Error only univariate series are allowed. That's how I came up with the assignment to x using wine[[1]]. I found an explanation of the need for double square brackets at http://www.r-tutor.com/r-introduction/list/named-list-members.
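To make the bracket distinction concrete, here is a minimal sketch as I understand it. The data frame is a hypothetical stand-in for what read.table returns (the column name V1 is what read.table assigns to an unnamed column), and it holds only the first 12 values to keep things short. The names single and double are mine, for illustration only.

```r
# Hypothetical stand-in for read.table("~/tmp/wine.txt"): one column, V1.
# Only the first 12 observations are used in this sketch.
wine <- data.frame(V1 = c(464, 675, 703, 887, 1139, 1077,
                          1318, 1260, 1120, 963, 996, 960))

single <- wine[1]     # single brackets keep the container: a one-column data frame
double <- wine[[1]]   # double brackets extract the column itself: a plain vector

is.data.frame(single)   # TRUE
is.data.frame(double)   # FALSE
dim(single)             # 12 rows, 1 column
dim(double)             # NULL, a plain vector has no dim attribute

# ts() turns a data frame into a matrix, so the dim attribute carries over
dim(ts(wine, frequency = 12))        # 12 rows, 1 column
dim(ts(wine[[1]], frequency = 12))   # NULL
```

If this is right, the difference between x and y would come from what is handed to ts, not from anything ts does to the values.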
My problem is that it's not very clear what is happening inside the ts structures x and y. If I simply print them, they look 100% identical:
> x
   Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1  464  675  703  887 1139 1077 1318 1260 1120  963  996  960
2  530  883  894 1045 1199 1287 1565 1577 1076  918 1008 1063
3  544  635  804  980 1018 1064 1404 1286 1104  999  996 1015
> y
   Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1  464  675  703  887 1139 1077 1318 1260 1120  963  996  960
2  530  883  894 1045 1199 1287 1565 1577 1076  918 1008 1063
3  544  635  804  980 1018 1064 1404 1286 1104  999  996 1015
Whatever their differences, they are not causing R to misinterpret the data; that is, each object looks like a single series of numerical data.
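Since printing hides any difference, one way to look underneath is with str, attributes, and is.matrix. The sketch below is my own attempt, not a definitive diagnosis: it rebuilds the data frame inline instead of reading ~/tmp/wine.txt, and y1 and fit are hypothetical names introduced here. It assumes that the "univariate" complaint from stl relates to y being a matrix, which is what the inspection seems to suggest.

```r
# Inline stand-in for read.table("~/tmp/wine.txt"); V1 is read.table's default name
wine <- data.frame(V1 = c(
  464, 675, 703, 887, 1139, 1077, 1318, 1260, 1120, 963, 996, 960,
  530, 883, 894, 1045, 1199, 1287, 1565, 1577, 1076, 918, 1008, 1063,
  544, 635, 804, 980, 1018, 1064, 1404, 1286, 1104, 999, 996, 1015))

x <- ts(wine[[1]], frequency = 12)   # from the column vector
y <- ts(wine, frequency = 12)        # from the one-column data frame

# The values are the same...
all.equal(as.numeric(x), as.numeric(y))

# ...but the structure is not: y carries dim and dimnames attributes
str(x)
str(y)
names(attributes(x))
names(attributes(y))
is.matrix(x)   # FALSE
is.matrix(y)   # TRUE: a 36 x 1 matrix

# Selecting the single column drops the matrix structure but keeps the ts timing
y1 <- y[, 1]
is.matrix(y1)  # FALSE
fit <- stl(y1, "per")
```

So a quick check such as is.matrix(y) or dim(y) before calling stl may be enough to spot this kind of object, even when the printed output looks like a single series.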
Can anyone illuminate the difference in the data inside the ts data structures? The potential incompatibility with stl is just one symptom. Right now, the "solution" is black magic to me, and I would like to get a clearer picture so that I know when else (and how) to watch out for this.
I've posted this to the R-help mailing list at http://thread.gmane.org/gmane.comp.lang.r.general/319626 and to Stack Overflow at How numerical data is stored inside ts time series objects.