In my data (https://pastebin.com/CernhBCg) I have irregular timestamps, each with a corresponding value. Besides being irregular, the series has large gaps for which my data contains no rows. I know, however, that the value is zero during those gaps, and I would like to fill the gaps with rows where value = 0. How can I do this?

Data

> dput(head(hub2_select,10))
structure(list(time = structure(c(1492033212.648, 1492033212.659, 
1492033212.68, 1492033212.691, 1492033212.702, 1492033212.724, 
1492033212.735, 1492033212.757, 1492033212.768, 1492033212.779
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), value = c(3, 
28, 246, 297, 704, 798, 1439, 1606, 1583, 1572)), .Names = c("time", 
"value"), row.names = c(NA, 10L), class = "data.frame")

To see the full data, please download the file linked above and read it into R with

library(readr)
df <- read_csv("data.csv", col_types = list(time = col_datetime(), value = col_double()))

Solutions

For one, the values to the left and right of a gap are usually 0 or 1, so that might help. I thought I would use a rolling join, but from what I understand by now, that does not seem to be the way to go.

What works is

library(dplyr)
library(lubridate)
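threshold_time <- dseconds(2)  # gaps longer than this get filled
time_prev <- df$time[1]
addrows <- data.frame()
for (i in seq(2, nrow(df), 1)) {
  time_current <- df$time[i]
  # a gap: consecutive timestamps more than threshold_time apart
  if ((time_current - time_prev) > threshold_time) {
    # zero-valued rows every 0.1 s, starting at the left edge of the gap
    time_add <- seq(time_prev, time_current, dseconds(0.1))
    addrows <- bind_rows(addrows, data.frame(time = time_add, value = rep(0, length(time_add))))
  }
  time_prev <- time_current
}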

addrows$type <- 'filled'
df$type <- 'orig'
df_new <- bind_rows(df, addrows)

library(ggplot2)
ggplot(df_new, aes(time,value,color=type)) + geom_point()

But this solution is neither elegant nor efficient (I did not test efficiency though).
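
A loop-free variant of the same idea might look like the sketch below. It uses base R only. The helper name `fill_gaps` and its arguments `threshold` and `step` are hypothetical, not from any package, and it assumes `df` has exactly the columns `time` (POSIXct) and `value`. Unlike the loop above, it places the zero rows strictly inside each gap rather than starting at the left edge, which is a design choice and not something the data dictates.

```r
# Hypothetical helper (not from any package): fill large time gaps with zeros.
# Assumes df has exactly two columns: time (POSIXct) and value (numeric).
fill_gaps <- function(df, threshold = 2, step = 0.1) {
  stopifnot(threshold >= 2 * step)            # guarantees room for at least one fill row
  secs   <- as.numeric(df$time)               # seconds since the epoch
  starts <- which(diff(secs) > threshold)     # row i opens a gap if the next row is far away

  # One sequence of fill times per gap, strictly between the two neighbouring rows
  fill_secs <- as.numeric(unlist(lapply(starts, function(i) {
    seq(secs[i] + step, secs[i + 1] - step, by = step)
  })))

  filled <- data.frame(
    time  = .POSIXct(fill_secs, tz = attr(df$time, "tzone")),
    value = rep(0, length(fill_secs)),
    type  = rep("filled", length(fill_secs)),
    stringsAsFactors = FALSE
  )
  df$type <- "orig"
  out <- rbind(df, filled)
  out[order(out$time), ]                      # interleave filled rows with the originals
}

# Toy data (made up for illustration): one 3-second gap between 0.5 s and 3.5 s
toy <- data.frame(
  time  = as.POSIXct(c(0, 0.5, 3.5, 4), origin = "1970-01-01", tz = "UTC"),
  value = c(3, 1, 0, 5)
)
res <- fill_gaps(toy, threshold = 2, step = 1)  # adds zero rows at 1.5 s and 2.5 s
```

Because the gaps are found with a single `diff()` and the rows are bound once at the end, this avoids growing a data frame inside a loop. The result has the same `type` column as `df_new`, so the `ggplot` call above should work on it unchanged.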

Make42
  • Please use `dput(head(DF,10))` to post a subset of your dataset instead of external links; see [here](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for more details. – Silence Dogood Apr 18 '17 at 08:54
  • @hNu: Done. That does not show the gaps though. – Make42 Apr 18 '17 at 08:58
  • then share a dataset that illustrates your problem... – mtoto Apr 18 '17 at 09:09
  • Perhaps `?tidyr::complete`. – Axeman Apr 18 '17 at 09:10
  • Use this to subset your dataset `dput(DF[index1:index2,])` where `index1` and `index2` are row starting positions that contain the gaps – Silence Dogood Apr 18 '17 at 09:11
  • @mtoto: What? That is what I did, didn't I? You cannot directly upload files to SO: https://meta.stackexchange.com/a/47690 – Make42 Apr 18 '17 at 09:12
  • Your `dput` does not show the problem. Use a section of your data that reproduces the problem you have. However, I think you [want this](http://stackoverflow.com/questions/16787038/r-insert-rows-for-missing-dates-times) – Sotos Apr 18 '17 at 09:17
  • The problem is only revealed with a little more data. That I provided as a link. Please use the commands I provided in the post now to load the data. Besides that I tried to describe the issue so that, for understanding it, no data is required. – Make42 Apr 18 '17 at 09:19
  • I don't get it. Should something like `ind <- seq(min(df$time), max(df$time), by = 'sec'); d5 <- data.frame(time = ind); merge(df, d5, by = 'time', all = TRUE)` work? – Sotos Apr 18 '17 at 09:46
  • @Sotos: No, I don't think this is what I am looking for. I added a brute force solution in my question. From there you can see what kind of result I am expecting. – Make42 Apr 18 '17 at 09:47
  • What is `dseconds(2)`? – Sotos Apr 18 '17 at 09:59
  • @Sotos: It is from package lubridate – Make42 Apr 18 '17 at 10:15
  • I am not sure what you are after, but I think the `padr` package can be a good help here. Please provide a complete data set capturing the problem with `dput` and the desired solution for that part of data. – Edwin Apr 18 '17 at 10:49

1 Answer

Honestly, I haven't tried it yet (I had to switch to Python for other reasons, solved it there, and didn't get around to trying it in R), but I am pretty sure the padr package (https://cran.r-project.org/web/packages/padr/vignettes/padr.html) would have been the answer. I just wanted to note this here for other readers with the same question.

Make42