I have several hourly time series that I am working with. Is there a way to add dates with missing values only at the beginning and end, so that each series covers the full years in which it starts and ends? For the data posted, I would like to fill the data back to the beginning of 1990 and forward to the end of 2008. The only way I can see to do it is with a very large number of loops. I have looked at dplyr, zoo, and seq for this task but cannot see how to fill only the years the data were taken in, and in a concise manner. I would like to write a loop that works on all of my different time series, rather than changing the script for each one. I am new to R, so any assistance would be helpful!

My data:

date O3
9/15/1990 0:00 24
9/15/1990 1:00 28
9/15/1990 2:00 26
9/15/1990 3:00 25
9/15/1990 4:00 -999
9/15/1990 5:00 18
9/15/1990 6:00 17

The end of the data looks like this:

1/31/2008 19:00 -999
1/31/2008 20:00 -999
1/31/2008 21:00 -999
1/31/2008 22:00 -999
1/31/2008 23:00 -999

This is my current script:

library(openair)
library(plyr)
filedir <- "C:/Users/dfmcg/Documents/Thesisfiles/removedleapyears"
myfiles <- c(list.files(path = filedir))
paste(filedir, myfiles, sep = '/')
npsfiles <- c(paste(filedir, myfiles,sep = '/'))

for (i in npsfiles[1:28]) {

  timeozone <- import(i, date ="date", date.format = "%m/%d/%Y %H", header = TRUE, na.strings = "-999")

 ts <- seq.POSIXt(as.POSIXct("1990-01-01 0:00",'%Y-%m-%d %H'), as.POSIXct("2015-12-31 23:00",'%Y-%m-%d %H'), by="hour")

  ts <- seq.POSIXt(as.POSIXlt("1990-01-01 0:00:00"), as.POSIXlt("2015-12-31 0:00:00"), by="hour")
  ts <- format.POSIXct(ts,'%Y-%m-%D %H')

  df <- data.frame(date=ts)

  data_with_missing_times <- join(df,timeozone)
}
Dave2e
  • If you want a good answer, it is very helpful to everyone if you provide a reproducible example. Read this: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – CCurtis Sep 22 '16 at 18:57
  • Script added, thanks! – user6776063 Sep 22 '16 at 19:09
  • No one else but you will have the directory `filedir <- "C:/Users/dfmcg/Documents/Thesisfiles/removedleapyears"`, so this isn't a reproducible example. I just posted an answer. No data were given, so you'll have to figure out how to adapt it to your code. It will fill in all missing data using cubic spline interpolation. Read about the functions so you can customize them to your specific needs. Get RStudio if you don't have it; it makes life way easier. – CCurtis Sep 22 '16 at 19:16
  • I have posted data and my code. It should be reproducible with what is there. – user6776063 Sep 22 '16 at 19:24

1 Answer

Use zoo. Replace -999 with NA, then convert your data to a zoo object. Use na.spline, i.e. yourdata$O3.zoo <- na.spline(yourdata$O3.zoo, method = "fmm"). Then just clip your data to the years you want.
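A minimal sketch of the steps just described, assuming the zoo package is installed. The timestamps and values are the first seven rows from the question, hard-coded for illustration, and the names `times`, `o3`, `o3_zoo`, and `filled` are hypothetical. Note that this only interpolates values at timestamps already present in the series; it does not add new timestamps.

```r
library(zoo)

# Seven hourly timestamps starting 1990-09-15 00:00 (UTC is an assumption).
times <- as.POSIXct("1990-09-15 00:00", tz = "UTC") + 3600 * 0:6

# O3 values from the question; -999 is the missing-value code.
o3 <- c(24, 28, 26, 25, -999, 18, 17)
o3[o3 == -999] <- NA

# Build the zoo object: values first, timestamps as the index.
o3_zoo <- zoo(o3, order.by = times)

# Replace NAs by cubic spline interpolation; method is passed on to spline().
filled <- na.spline(o3_zoo, method = "fmm")
```

Calling `print(filled)` shows the series with the gap at 04:00 replaced by an interpolated value.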

CCurtis
  • That just replaces values with interpolated values. It does not add missing values to fill the year. – user6776063 Sep 22 '16 at 19:17
  • You want to add missing dates? – CCurtis Sep 22 '16 at 19:20
  • Yes, until the top and bottom of the year the data starts and ends in – user6776063 Sep 22 '16 at 19:22
  • Just use `seq.POSIXt()` with the first and last day of the year. Use the `by` argument and specify hour. Convert this object to a zoo object and use `merge` to merge it with your data. – CCurtis Sep 22 '16 at 19:26
  • That is exactly what I have in my code currently. See above. However, when I merge, it writes over the current data. How would I add to the top and bottom while keeping the data in between? – user6776063 Sep 22 '16 at 19:32
  • You're not using zoo objects and you're not using merge; you use join. You don't need `plyr`. Convert your data to zoo objects and use merge, not join. It will work. – CCurtis Sep 22 '16 at 19:56
  • Okay, I know how to use merge but how would I convert to zoo? – user6776063 Sep 22 '16 at 20:14
  • You can use the `zoo` function. Read the documentation, but basically: `zoo(yourdata,yourdatatimestamp)`. – CCurtis Sep 23 '16 at 17:31
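The approach discussed in these comments could be sketched roughly as follows, assuming the zoo package and a data frame with a character `date` column and a numeric `O3` column as in the question. The sample data frame `raw`, the helper `pad_to_full_years`, and the choice of UTC (to avoid daylight-saving gaps in the hourly grid) are all assumptions for illustration, not part of the original code.

```r
library(zoo)

# Hypothetical stand-in for one imported file; -999 marks a missing value.
raw <- data.frame(
  date = c("9/15/1990 0:00", "9/15/1990 1:00", "9/15/1990 2:00",
           "9/15/1990 3:00", "9/15/1990 4:00"),
  O3 = c(24, 28, 26, 25, -999),
  stringsAsFactors = FALSE
)

# Pad an hourly series with NA rows so it spans Jan 1 00:00 of its first
# year through Dec 31 23:00 of its last year.
pad_to_full_years <- function(df, tz = "UTC") {
  times <- as.POSIXct(df$date, format = "%m/%d/%Y %H:%M", tz = tz)
  values <- ifelse(df$O3 == -999, NA, df$O3)
  obs <- zoo(values, order.by = times)

  # Year boundaries are derived from the data, so nothing is hard-coded.
  first_year <- format(min(times), "%Y")
  last_year <- format(max(times), "%Y")
  grid <- seq(
    as.POSIXct(paste0(first_year, "-01-01 00:00"), tz = tz),
    as.POSIXct(paste0(last_year, "-12-31 23:00"), tz = tz),
    by = "hour"
  )

  # Merging with a zero-width zoo series extends the index to the full
  # grid; existing observations are kept and new timestamps get NA.
  merge(obs, zoo(, grid), all = TRUE)
}

padded <- pad_to_full_years(raw)
```

Because the boundaries come from each series itself, the same helper could be called inside the loop over files without editing the script per series.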