I have multiple sets of time series data and would like help figuring out the best way to get them into R and analyze them with R. I'm pretty familiar with data.table but not so familiar with R's ts
class supporting time series analysis.
In particular, I want to known how to use ts
in this situation or if there are limitations in ts
(such as problems aggregating a set of ts
objects) that make it in appropriate to use here.
The Data
There are a large number of stores. For each store, I have multiple data points for each day, such as sales volume in dollars, sales volume in number of transactions, and store traffic (number of people entering the store). (Actually what I have is a table with columns store ID, date, and the data for that store and date.)
What I've been doing is using a data.table with one row per store, aggregating the data by store into months and storing the values for each month in a separate named column (e.g. jan14_dollars, feb14_dollars...) but this is unwieldy for a lot of reasons, in particular when I want to look at weeks or quarters.
I was thinking the right way to handle this was to have columns of type ts
so each row would be just be store, dollars_ts, transactions_ts, traffic_ts
but (a) how do I get the data into that format and (b) can ts
be combined the way integers can to give me the results I want? If you can only answer (a) or (b) but not both, please do answer what you can.
I cannot provide a realistic data set, but you can generate a random one to play with like this:
require("data.table")
storeData <- CJ(store = toupper(letters), date = seq(as.Date('2012-01-01'), as.Date('2014-01-01'), by="day"))
storeData$dollars = sample(100:100000, nrow(storeData), replace = TRUE)/100
storeData$transactions <- sample(0:1000, nrow(storeData), replace = TRUE)
storeData$traffic <- storeData$transactions + sample(0:1000, nrow(storeData), replace = TRUE)
head(storeData)
store date dollars transactions traffic
1: A 2012-01-01 48.60 409 990
2: A 2012-01-02 996.89 36 428
3: A 2012-01-03 69.35 647 1103
4: A 2012-01-04 334.56 953 973
5: A 2012-01-05 692.99 958 1753
6: A 2012-01-06 973.32 724 1086
The Analysis
I want to answer questions like "how many stores had positive dollar sales growth?" and "is there a relationship between change in dollars/transaction and change in traffic?" and to bin the data into time periods and compare the answers across time periods (e.g. Q1 this year versus Q1 last year).
Can these kinds of questions be answered using ts
? If so, how do I get this data into an appropriate set of columns or is there some structure other than data.table
I should be using?
Please show both how to organize the data and then how to use the data to answer the example questions "how many stores had positive dollar sales growth in January 2014 compared to January 2013?" and "what is the overall trend in dollars per transaction for the past 3 months?"