
Here is a sample of my time-series dataset:

ymd      rf
19820103  3
19820104  9
19820118  4
19820119  2
19820122  0
19820218  5

The dataset is supposed to be a daily time series. More specifically, ymd should range continuously from 19820101 through 19820228. However, as the sample above shows, the dataset has gaps and is missing days such as "19820101" and "19820102". For every date that is missing, I'd like to add a row for that day and set rf to 0.

What would be the best way to write a script that automates this? I'll have to do it for daily time-series datasets from 1979 through 2016.

ArunK
Don

2 Answers

1

Let's assume your data is in a data frame named "mydata". Then you could do the following:

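#Create full ymd with all the needed calendar dates (assumes ymd is numeric YYYYMMDD)
days <- as.Date(as.character(mydata$ymd), format="%Y%m%d")
ymd.full <- data.frame(ymd=as.numeric(format(seq(min(days), max(days), by="day"), "%Y%m%d")))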

#Merge both datasets
mydata <- merge(ymd.full, mydata, all.x=TRUE)

#Replace NAs with 0
mydata[is.na(mydata)] <- 0
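
As a worked example on the sample rows from the question, here is a minimal sketch that wraps the same merge idea in a hypothetical helper, fill_missing_days(). The helper name and its start/end arguments are assumptions made for illustration. It takes explicit bounds so that leading days such as 19820101 are padded too, assumes ymd is numeric in YYYYMMDD form, and builds the sequence from calendar dates so that no invalid values (for example 19820132) are created.

```r
# Sample rows from the question; ymd is assumed to be numeric YYYYMMDD.
mydata <- data.frame(
  ymd = c(19820103, 19820104, 19820118, 19820119, 19820122, 19820218),
  rf  = c(3, 9, 4, 2, 0, 5)
)

# Hypothetical helper: pad `df` with zero-rf rows for every calendar day
# between `start` and `end` (both given as YYYYMMDD numbers).
fill_missing_days <- function(df, start, end) {
  to_date  <- function(x) as.Date(as.character(x), format = "%Y%m%d")
  all_days <- seq(to_date(start), to_date(end), by = "day")
  full     <- data.frame(ymd = as.numeric(format(all_days, "%Y%m%d")))
  out      <- merge(full, df, by = "ymd", all.x = TRUE)  # sorted by ymd
  out$rf[is.na(out$rf)] <- 0
  out
}

filled <- fill_missing_days(mydata, 19820101, 19820228)
nrow(filled)     # 59 rows: 31 days in January plus 28 in February 1982
head(filled, 4)
```

For the 1979 through 2016 range, the same call could be made once with start = 19790101 and end = 20161231, or once per file if the data arrives in separate pieces.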
Gaurav Bansal
  • Awesome, thanks. How would I modify this if I have another scenario with three additional columns, where I need to interpolate or average the neighboring (previous/next) time series values for only two of the three columns? – Don Jul 18 '16 at 14:21
  • The first two steps (building the full date sequence and merging) would stay the same even if you have more columns in your data. If you want to interpolate to fill NAs, the last step would change to something like this post: http://stackoverflow.com/questions/7188807/interpolate-na-values – Gaurav Bansal Jul 18 '16 at 14:35
  • Awesome thanks for the link! I'll take a look at it and try to work with it. Really appreciate the time you took to link that to me mate! – Don Jul 18 '16 at 15:27
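
The interpolation variant discussed in these comments could look roughly like the sketch below. It uses base R's approx() and is not necessarily the approach taken in the linked post. The column names temp and hum, and all the values, are hypothetical stand-ins for the two columns to interpolate, while rf keeps the zero-fill rule.

```r
# Hypothetical gap-padded data: `temp` and `hum` should be interpolated,
# while `rf` is still filled with 0. All names and values are assumptions.
padded <- data.frame(
  ymd  = c(19820101, 19820102, 19820103, 19820104, 19820105),
  rf   = c(3, NA, NA, 9, 4),
  temp = c(10, NA, NA, 16, 18),
  hum  = c(50, NA, NA, 80, 70)
)

# Interpolate linearly over real calendar days, so uneven gaps are weighted
# correctly. rule = 2 carries the nearest known value into any leading or
# trailing gap instead of leaving NA there.
days <- as.numeric(as.Date(as.character(padded$ymd), format = "%Y%m%d"))
interp_cols <- c("temp", "hum")
for (col in interp_cols) {
  known <- !is.na(padded[[col]])
  padded[[col]] <- approx(days[known], padded[[col]][known],
                          xout = days, rule = 2)$y
}

# Zero-fill only the column that should not be interpolated.
padded$rf[is.na(padded$rf)] <- 0
padded$temp   # 10 12 14 16 18
```

Listing the columns in interp_cols keeps the two fill rules separate, so a third extra column can be left untouched or given its own rule.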
0

This solution is similar to @Gaurav Bansal's, but uses dplyr:

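days     <- as.Date(as.character(mydata$ymd), format="%Y%m%d")
ymd.full <- data.frame(ymd=as.numeric(format(seq(min(days), max(days), by="day"), "%Y%m%d")))
newdata  <- dplyr::left_join(ymd.full, mydata, by="ymd")
newdata[is.na(newdata)] <- 0

The sequence is built from calendar dates because seq() over the raw integers would also generate values that are not dates, such as 19820132. This assumes ymd is numeric and in YYYYMMDD form. I'm wondering, though, whether that is how ymd translates to a date and, since I suppose you want to do time series analysis, whether leap days are accounted for in your set.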

sebastianmm
  • Thanks. Yes, ymd translates to a date and it does account for leap days. The date format comes from the raw JSON output of an API. – Don Jul 18 '16 at 14:19
  • Good, then you just need to convert your data.frame to a time series. Have a look at [this thread](http://stackoverflow.com/questions/8437620/analyzing-daily-weekly-data-using-ts-in-r) for info on packages for daily data. – sebastianmm Jul 18 '16 at 14:23
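
A minimal base R sketch of that conversion step follows, assuming ymd is numeric in YYYYMMDD form. The sample values are made up and deliberately span a leap day. Packages such as zoo or xts handle irregular daily calendars more naturally and are not used here. Base ts() needs a fixed frequency, so 365 is only an approximation.

```r
# A short gap-free stretch spanning a leap day (1980 is a leap year).
# The values are made up for illustration.
newdata <- data.frame(
  ymd = c(19800227, 19800228, 19800229, 19800301, 19800302),
  rf  = c(0, 2, 5, 0, 1)
)

# Parse the YYYYMMDD values into real Date objects.
newdata$date <- as.Date(as.character(newdata$ymd), format = "%Y%m%d")

# Sanity checks: every ymd parsed, and consecutive rows are one day apart.
stopifnot(!anyNA(newdata$date))
stopifnot(all(diff(newdata$date) == 1))

# ts() takes the start as (year, position within the year). With
# frequency = 365 leap days are ignored, so treat this as an approximation.
start_doy <- as.integer(format(newdata$date[1], "%j"))
rf_ts <- ts(newdata$rf, start = c(1980, start_doy), frequency = 365)
```

The two stopifnot() checks are a cheap way to confirm that the filled dataset really is continuous, leap days included, before any analysis.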