I have an hourly time series of ozone concentrations covering one year, with a number of missing values. The gaps vary greatly in length. Because the data contain several cycles (e.g. daily and seasonal), I want to choose the imputation method according to the length of each gap. I am planning on using the zoo package for the imputation.

The breakdown of imputation methods by gap length (number of consecutive missing values):

  1. <=4 missing values - use linear interpolation (na.approx(x))
  2. >4 to <=23 missing values - use spline fit (na.spline(x))
  3. >23 missing values - use seasonal Kalman filter (na.StructTS(x))

My guess is that I need conditional execution to control which cells each command affects, but I don’t know how to refer to the values in the cells before and after a missing value and use them in an if statement.
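One possible approach that avoids explicit if statements is sketched below. It relies on the `maxgap` argument of `na.approx` and `na.spline`, which leaves gaps longer than the given length untouched, and uses `rle` to inspect how long each gap is. The helper name `impute_by_gap`, its argument names, the synthetic `ozone` vector, and `frequency = 24` (hourly data with a daily cycle) are assumptions made for illustration. The helper assumes a plain numeric vector, and the `na.StructTS` branch is not exercised by this small example.

```r
library(zoo)

# Hypothetical helper (not part of zoo): fill gaps in stages, shortest first.
# Thresholds 4 and 23 follow the breakdown above; frequency = 24 is an
# assumption (hourly data with a daily cycle). Expects a numeric vector.
impute_by_gap <- function(x, short = 4, medium = 23, frequency = 24) {
  # Gaps of up to `short` consecutive NAs: linear interpolation.
  x <- na.approx(x, maxgap = short, na.rm = FALSE)
  # Remaining gaps of up to `medium` consecutive NAs: cubic spline.
  x <- na.spline(x, maxgap = medium, na.rm = FALSE)
  # Anything still missing is a long gap: seasonal structural model.
  if (anyNA(x)) {
    x <- as.numeric(na.StructTS(ts(x, frequency = frequency)))
  }
  x
}

# Synthetic hourly series with one 2-hour gap and one 10-hour gap.
ozone <- 60 + 10 * sin(2 * pi * (1:72) / 24)
ozone[5:6] <- NA
ozone[30:39] <- NA

# Lengths of the runs of consecutive NAs, i.e. the length of each gap.
runs <- rle(is.na(ozone))
gap_lengths <- runs$lengths[runs$values]

# After the linear step alone, only the 10-hour gap is still missing.
after_linear <- na.approx(ozone, maxgap = 4, na.rm = FALSE)

filled <- impute_by_gap(ozone)
```

Note that each stage works on the output of the previous one, so the spline is fitted through the linearly interpolated values as well as the observed ones. If that is undesirable, each method could instead be run on the original series and the results combined by gap length.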

I am fairly new to R, so sorry if there is an obvious answer to this question or if it has already been answered. I have searched but couldn’t find anything.

         Date2006 Ozone2006
1   06-01-01 0:00        64
2   06-01-01 1:00        64
3   06-01-01 2:00        63
4   06-01-01 3:00        61
5   06-01-01 4:00        NA
6   06-01-01 5:00        52
7   06-01-01 6:00        60
8   06-01-01 7:00        59
9   06-01-01 8:00        47

This is what my dataset looks like. The ozone concentrations are integers.

Tess