Lately I often had to handle time series data from multiple .csv sources in the same analysis. Let's assume for simplicity that all series are regular quarterly series (no missing values in between). Typically the original .csv data contains a date column plus 1-3 variables. Unfortunately the series are not of equal length across .csv files.
I started to organize my dataset in R and ended up with a big mess containing lots of window()
commands. Plus I had to concatenate NAs and original series before turning them into ts()
objects because I found concatenating (multivariate) ts()
objects so counter-intuitive.
Note that the reason why I added NAs is that I wanted to all series to be of the same length. Of course I could have trimmed the longer ones, but then I´d eventually loose observations when not using shorter series.
I thought about writing a function that reads .csv files and uses it's date column to create ts()
objects and maybe with another function merge all the single series to create a multivariate series containing NAs when data is missing. I found myself switching data types all the time, reading through the ts and zoo manuals – i just could not believe it was that complex.
I really thought this problem is really common and thought about the preparations in excel.. I mean I really hate excel, but this time I wonder what more experienced useRs do? R or Excel?
EDIT: added some exemplary data (need to aggregate daily data) file1:
27.05.11;5965.95
26.05.11;5947.06
25.05.11;5942.82
24.05.11;5939.98
file2 (without date col, but i know start and frequency)
Germany;Switzerland;USA;OECDEurope
69,90974;61,8241;55,60966;64,96157
67,0394;62,18966;56,47361;64,15152
70,56651;63,6347;56,87237;65,43568
file3:
1984-04-01,33.3238396624473
1984-07-01,63.579833082501
1984-10-01,35.8375401560349
I admit exemplary data does help to illustrate the question, but it`s rather a best practice type of question adressing more experienced users than myself. How do you prepare your data for multivariate ts analysis ?