filling missing values datetime series data with interpolation in R

Question

I have the following dataset.

name   -  old - new - datetime     
1051     38656       38400      2016-01-24 03:22:37    
1051     5888        5632       2016-01-24 04:03:28  
1051     5632        38144      2016-01-24 04:34:22    
1051     5120        4864       2016-01-24 03:56:33  
1051     37376       37632      2016-01-25 08:08:16  
..       ..          ..         ..

I want to interpolate this dataset :

name   -  old - new -  datetime  
1051     ?           ?          2016-01-24 03:20:00  
1051     ?           ?          2016-01-24 03:30:00    
1051     ?           ?          2016-01-24 03:40:00  
1051     ?           ?          2016-01-24 03:50:00  
1051     ?           ?          2016-01-24 04:00:00  
1051     ?           ?          2016-01-25 04:10:00  
..       ..          ..         ..

I have complicated a dataset. So, I want to interpolate to get more clean dataset. I have try this :

data.frame(datetime = seq(roomsdatetime$datetime[1], roomsdatetime$datetime[nrow(roomsdatetime)], by = "10 min")) %>%  
    mutate(roomsdatetime, approx = na.approx(roomsdatetime$old_value))

I get this error:

Error: wrong result size(3562), expected 3565 or 1

Is there another way?

Is not clear what you are asking for. Do you want to transform your data to 10 min interval? If it is the case see examples here http://stackoverflow.com/a/10423613/2824732 — Robert, Aug 03 '16 at 11:47
Be careful for linear interpolation! "Transforming the data into equally spaced observations using linear INTERPOLATION can introduce a number of significant and hard to quantify biases". Ref: M. Scholes and J. Williams, “Estimating betas from nonsynchronous data”, Journal of Financial Economics 5: 309–327, 1977. http://www.sciencedirect.com/science/article/pii/0304405X77900411 Your aim is of course not to obtain equally spaced data, but anyway you try to employ interpolation. — Erdogan CEVHER, Aug 03 '16 at 11:52
Type of the interpolation is also missing in the question: linear, quadratic, cubic, etc.. Do `? smooth.spline` in console to analyze 3rd degree interpolations via `smooth.spline`. — Erdogan CEVHER, Aug 03 '16 at 12:15

Erdogan CEVHER · Answer 1 · 2016-08-03T21:31:04.603

In Excel, enter the following to the leftest-top:

[The "difference" (in seconds between (ordered) datetimes and the base 24.01.2016 03:20:00 whose value is assigned to 0) column was obtained via the formula "=(B3-$B$2)*86400"]

name           datetime difference old     new
1051    24.01.2016 03:20:00 0       NA     NA
1051    24.01.2016 03:22:37 157 38656   38400
1051    24.01.2016 03:30:00 600     NA     NA
1051    24.01.2016 03:40:00 1200    NA     NA
1051    24.01.2016 03:50:00 1800    NA     NA
1051    24.01.2016 03:56:33 2193    5120   4864
1051    24.01.2016 04:00:00 2400    NA     NA
1051    24.01.2016 04:03:28 2608    5888  5632
1051    24.01.2016 04:34:22 4462    5632  38144
1051    25.01.2016 04:10:00 89400   NA    NA
1051    25.01.2016 08:08:16 103696  37376  37632

Then, File - Save as - [FileName:seymaalaca.csv; Type: "CSV (comma separated) (*.csv)"]

mydataframe <- read.csv("C:/Users/User/Documents/Revolution/seymaalaca.csv", header=TRUE, sep=",", stringsAsFactors = FALSE)
mydataframe # results in:



    name            datetime difference   old   new    
1  1051 24.01.2016 03:20:00          0    NA    NA    
2  1051 24.01.2016 03:22:37        157 38656 38400    
3  1051 24.01.2016 03:30:00        600    NA    NA    
4  1051 24.01.2016 03:40:00       1200    NA    NA    
5  1051 24.01.2016 03:50:00       1800    NA    NA    
6  1051 24.01.2016 03:56:33       2193  5120  4864    
7  1051 24.01.2016 04:00:00       2400    NA    NA    
8  1051 24.01.2016 04:03:28       2608  5888  5632    
9  1051 24.01.2016 04:34:22       4462  5632 38144    
10 1051 25.01.2016 04:10:00      89400    NA    NA    
11 1051 25.01.2016 08:08:16     103696 37376 37632

oldcolumn <- lm(mydataframe$old ~ mydataframe$difference)
oldcolumn  #  old = 1.348e+04  + 2.233e-01*difference
oldfunction <- function (difference) {1.348e+04 + 2.233e-01*difference} # produces the row values for the "old" column

newcolumn <- lm(mydataframe$new ~ mydataframe$difference)
newcolumn  # new = 2.14e+04 + 1.56e-01*difference
newfunction <- function (difference) {2.14e+04 + 1.56e-01*difference} # produces the row values for the "new" column

myinterpolizer <- function (difference) {c(oldfunction(difference),newfunction(difference))} #  produces the row values for the "old&new" column

myinterpolizer(0)  # 13480 21400
myinterpolizer(600) # 13613.98 21493.60
myinterpolizer(1200) # 13747.96 21587.20
myinterpolizer(1800) # 13881.94 21680.80
myinterpolizer(2400) # 14015.92 21774.40
myinterpolizer(89400) # 33443.02 35346.40

A little simpler one-liner that produces the above 12 numbers:

# mydataframe[is.na(mydataframe$old),] # filters the rows where old=NA
# mydataframe[is.na(mydataframe$old),3] # After (filtering the rows where old=NA) select (the "difference" column) 
lapply(mydataframe[is.na(mydataframe$old),3], myinterpolizer)

Seyma Alaca should accept the solution. She seems not to know to use the "Accept" or "This answer is useful" button. — Erdogan CEVHER, Aug 05 '16 at 16:35

filling missing values datetime series data with interpolation in R

1 Answers1