2

I have the following dataset.

name   -  old - new - datetime     
1051     38656       38400      2016-01-24 03:22:37    
1051     5888        5632       2016-01-24 04:03:28  
1051     5632        38144      2016-01-24 04:34:22    
1051     5120        4864       2016-01-24 03:56:33  
1051     37376       37632      2016-01-25 08:08:16  
..       ..          ..         ..  

I want to interpolate this dataset :

name   -  old - new -  datetime  
1051     ?           ?          2016-01-24 03:20:00  
1051     ?           ?          2016-01-24 03:30:00    
1051     ?           ?          2016-01-24 03:40:00  
1051     ?           ?          2016-01-24 03:50:00  
1051     ?           ?          2016-01-24 04:00:00  
1051     ?           ?          2016-01-25 04:10:00  
..       ..          ..         ..

I have complicated a dataset. So, I want to interpolate to get more clean dataset. I have try this :

data.frame(datetime = seq(roomsdatetime$datetime[1], roomsdatetime$datetime[nrow(roomsdatetime)], by = "10 min")) %>%  
    mutate(roomsdatetime, approx = na.approx(roomsdatetime$old_value))

I get this error:

Error: wrong result size(3562), expected 3565 or 1

Is there another way?

Sotos
  • 51,121
  • 6
  • 32
  • 66
  • Is not clear what you are asking for. Do you want to transform your data to 10 min interval? If it is the case see examples here http://stackoverflow.com/a/10423613/2824732 – Robert Aug 03 '16 at 11:47
  • Be careful for linear interpolation! "Transforming the data into equally spaced observations using linear INTERPOLATION can introduce a number of significant and hard to quantify biases". Ref: M. Scholes and J. Williams, “Estimating betas from nonsynchronous data”, Journal of Financial Economics 5: 309–327, 1977. http://www.sciencedirect.com/science/article/pii/0304405X77900411 Your aim is of course not to obtain equally spaced data, but anyway you try to employ interpolation. – Erdogan CEVHER Aug 03 '16 at 11:52
  • Type of the interpolation is also missing in the question: linear, quadratic, cubic, etc.. Do `? smooth.spline` in console to analyze 3rd degree interpolations via `smooth.spline`. – Erdogan CEVHER Aug 03 '16 at 12:15

1 Answers1

0

In Excel, enter the following to the leftest-top:

[The "difference" (in seconds between (ordered) datetimes and the base 24.01.2016 03:20:00 whose value is assigned to 0) column was obtained via the formula "=(B3-$B$2)*86400"]

name           datetime difference old     new
1051    24.01.2016 03:20:00 0       NA     NA
1051    24.01.2016 03:22:37 157 38656   38400
1051    24.01.2016 03:30:00 600     NA     NA
1051    24.01.2016 03:40:00 1200    NA     NA
1051    24.01.2016 03:50:00 1800    NA     NA
1051    24.01.2016 03:56:33 2193    5120   4864
1051    24.01.2016 04:00:00 2400    NA     NA
1051    24.01.2016 04:03:28 2608    5888  5632
1051    24.01.2016 04:34:22 4462    5632  38144
1051    25.01.2016 04:10:00 89400   NA    NA
1051    25.01.2016 08:08:16 103696  37376  37632

Then, File - Save as - [FileName:seymaalaca.csv; Type: "CSV (comma separated) (*.csv)"]

mydataframe <- read.csv("C:/Users/User/Documents/Revolution/seymaalaca.csv", header=TRUE, sep=",", stringsAsFactors = FALSE)
mydataframe # results in:



    name            datetime difference   old   new    
1  1051 24.01.2016 03:20:00          0    NA    NA    
2  1051 24.01.2016 03:22:37        157 38656 38400    
3  1051 24.01.2016 03:30:00        600    NA    NA    
4  1051 24.01.2016 03:40:00       1200    NA    NA    
5  1051 24.01.2016 03:50:00       1800    NA    NA    
6  1051 24.01.2016 03:56:33       2193  5120  4864    
7  1051 24.01.2016 04:00:00       2400    NA    NA    
8  1051 24.01.2016 04:03:28       2608  5888  5632    
9  1051 24.01.2016 04:34:22       4462  5632 38144    
10 1051 25.01.2016 04:10:00      89400    NA    NA    
11 1051 25.01.2016 08:08:16     103696 37376 37632

oldcolumn <- lm(mydataframe$old ~ mydataframe$difference)
oldcolumn  #  old = 1.348e+04  + 2.233e-01*difference
oldfunction <- function (difference) {1.348e+04 + 2.233e-01*difference} # produces the row values for the "old" column

newcolumn <- lm(mydataframe$new ~ mydataframe$difference)
newcolumn  # new = 2.14e+04 + 1.56e-01*difference
newfunction <- function (difference) {2.14e+04 + 1.56e-01*difference} # produces the row values for the "new" column

myinterpolizer <- function (difference) {c(oldfunction(difference),newfunction(difference))} #  produces the row values for the "old&new" column

myinterpolizer(0)  # 13480 21400
myinterpolizer(600) # 13613.98 21493.60
myinterpolizer(1200) # 13747.96 21587.20
myinterpolizer(1800) # 13881.94 21680.80
myinterpolizer(2400) # 14015.92 21774.40
myinterpolizer(89400) # 33443.02 35346.40

A little simpler one-liner that produces the above 12 numbers:

# mydataframe[is.na(mydataframe$old),] # filters the rows where old=NA
# mydataframe[is.na(mydataframe$old),3] # After (filtering the rows where old=NA) select (the "difference" column) 
lapply(mydataframe[is.na(mydataframe$old),3], myinterpolizer) 
Erdogan CEVHER
  • 1,788
  • 1
  • 21
  • 40