I have an hourly time series of ozone concentrations covering one year, with a number of missing values, and the lengths of the gaps vary greatly. Because the data contain several cycles (e.g. daily and seasonal), I want to choose the imputation method according to the length of each gap. I am planning to use the zoo package for the imputation.
The method I want to use for each gap length:
- <= 4 consecutive missing values: linear interpolation (na.approx(x))
- > 4 and <= 23 consecutive missing values: spline fit (na.spline(x))
- > 23 consecutive missing values: seasonal Kalman filter (na.StructTS(x))
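Here is a minimal sketch of this breakdown. It assumes the series is a plain numeric vector with one element per hour, with missing hours stored as NA. In zoo, na.approx() and na.spline() both take a maxgap argument. That argument fills only runs of at most maxgap consecutive NAs and leaves longer runs untouched, so the thresholds map onto it directly. The function name impute_by_gap_length and the period = 24 (daily cycle) passed to ts() are assumptions for illustration. na.StructTS() is only called if gaps longer than 23 remain, because fitting a seasonal structural model to a year of hourly data may be slow.

```r
library(zoo)

# Hypothetical helper. Assumes `ozone` is a numeric vector in time order with
# one element per hour (missing hours as NA); `period` is an assumed daily cycle.
impute_by_gap_length <- function(ozone, period = 24) {
  # Each fit uses only the originally observed values.
  lin <- na.approx(ozone, maxgap = 4, na.rm = FALSE)   # fills runs of 1-4 NAs
  spl <- na.spline(ozone, maxgap = 23, na.rm = FALSE)  # fills runs of up to 23 NAs
  # Observed values and short gaps come from `lin`; gaps of 5-23 from `spl`.
  out <- ifelse(is.na(lin), spl, lin)

  # Anything still NA belongs to a gap longer than 23: use the structural model.
  if (anyNA(out)) {
    kal <- na.StructTS(ts(ozone, frequency = period))
    out[is.na(out)] <- as.numeric(kal)[is.na(out)]
  }
  out
}

# Toy usage: a smooth daily cycle with a 2-hour and an 8-hour gap.
x <- 50 + 10 * sin(2 * pi * (1:48) / 24)
x[5:6] <- NA
x[20:27] <- NA
res <- impute_by_gap_length(x)
```

Fitting each method to the original data keeps the spline from being influenced by the linearly interpolated points. With frequency 24 the structural model describes the daily cycle, not the annual one.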
My guess is that I need conditional execution to decide which cells each function is applied to, but I don't know how to refer to the values in the cells before and after a gap and use them in an if statement.
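Rather than comparing neighbouring cells in an if statement, one possible approach is to measure every gap at once. rle() on is.na(x) returns the length of each run of missing and non-missing values, and a vectorised ifelse() can then pick a method per element. The helper na_run_length() is hypothetical and the vector x is made-up toy data. Gaps longer than 23 are left as NA here so that a separate na.StructTS() step can fill them.

```r
library(zoo)

# Hypothetical helper: for each element, the length of the NA run it belongs
# to (0 for observed values). rle() groups consecutive equal values of is.na(x).
na_run_length <- function(x) {
  runs <- rle(is.na(x))
  rep(ifelse(runs$values, runs$lengths, 0L), runs$lengths)
}

# Toy data with a 1-hour gap and a 5-hour gap.
x <- c(64, 64, 63, 61, NA, 52, NA, NA, NA, NA, NA, 47)
gap <- na_run_length(x)   # 0 0 0 0 1 0 5 5 5 5 5 0

lin <- na.approx(x, na.rm = FALSE)   # linear fill of every gap
spl <- na.spline(x, na.rm = FALSE)   # spline fill of every gap

# Vectorised conditional: keep observed values, then choose by gap length.
filled <- ifelse(gap == 0, x,
          ifelse(gap <= 4, lin,
          ifelse(gap <= 23, spl, NA)))  # > 23 left NA for na.StructTS()
```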
I am fairly new to R, so sorry if there is an obvious answer to this question or if it has already been answered. I have searched but couldn't find anything.
```
       Date2006 Ozone2006
1 06-01-01 0:00        64
2 06-01-01 1:00        64
3 06-01-01 2:00        63
4 06-01-01 3:00        61
5 06-01-01 4:00        NA
6 06-01-01 5:00        52
7 06-01-01 6:00        60
8 06-01-01 7:00        59
9 06-01-01 8:00        47
```
This is what my dataset looks like. The ozone concentrations are integers.
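Gap lengths only mean something if every hour has a row. A first step might therefore be to parse the timestamps and place the values on a regular hourly grid. This sketch assumes the date strings are yy-mm-dd H:MM and uses tz = "UTC" to sidestep daylight-saving jumps. Both are assumptions about the data. The nine rows are the sample shown above, typed in by hand. The names grid, full, z and oz_ts are illustrative.

```r
library(zoo)

# The sample rows from the question, entered by hand.
df <- data.frame(
  Date2006  = c("06-01-01 0:00", "06-01-01 1:00", "06-01-01 2:00",
                "06-01-01 3:00", "06-01-01 4:00", "06-01-01 5:00",
                "06-01-01 6:00", "06-01-01 7:00", "06-01-01 8:00"),
  Ozone2006 = c(64L, 64L, 63L, 61L, NA, 52L, 60L, 59L, 47L),
  stringsAsFactors = FALSE
)

# Assumption: yy-mm-dd H:MM timestamps in a zone without DST (UTC here).
df$time <- as.POSIXct(df$Date2006, format = "%y-%m-%d %H:%M", tz = "UTC")

# Regular hourly grid, so hours missing as whole rows also become NA
# and count towards the gap length.
grid <- data.frame(time = seq(min(df$time), max(df$time), by = "hour"))
full <- merge(grid, df[, c("time", "Ozone2006")], by = "time", all.x = TRUE)

z <- zoo(full$Ozone2006, order.by = full$time)  # zoo series indexed by time
oz_ts <- ts(coredata(z), frequency = 24)        # ts with a daily cycle, e.g. for na.StructTS()
```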