0

enter image description here

My time series is on a minute base. Some minutes are missing (see from lines 30 to 31 there are 6minutes missing. How can I fill in those minutes with count 0? So I want to add every missing minute and fill in the count with 0.

EvA
  • 65
  • 5

2 Answers2

1

If the column is Datetime class, create a sequence from the min and max value of 'timestamp' column by 1 minute in complete while simultaneously specify the count as 0 for those elements that would be missing in the original dataset

library(tidyr)
library(dplyr)
df2 <- complete(df1, timestamp = seq(min(timestamp), 
        max(timestamp), by = "1 min"), fill = list(count = 0))

If we need the lane column to be filled by the same value, use fill

df2 <- complete(df1, timestamp = seq(min(timestamp), 
        max(timestamp), by = "1 min"), fill = list(count = 0)) %>%
    fill(lane)

NB: If the column 'timestamp' is not Datetime class, can convert to POSIXct with as.POSIXct

df1$timestamp <- as.POSIXct(df1$timestamp) 

before doing the complete step

akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thankyou. I already tried that, but it doesnt do anything, i also dont get a error so i dont know what is wrong. My timestamp is class and my code is: complete(Data_2019, timestamp=seq((min(timestamp)), max(timestamp), by = "1 min"), fill=list(count=0)) %>% fill(lane) – EvA Dec 27 '20 at 21:31
  • @EvA have you assigned it back to the same object or different object i.e. `df2 <- complete(df1, ...` – akrun Dec 27 '20 at 21:34
  • 1
    I assigned it to a different object now. It works :), thankyou!! – EvA Dec 27 '20 at 21:38
1

A simple way is to use the fill_gaps() function from the tsibble package:

library(tsibble)
library(dplyr)

df1 <- tibble(
    row = 1:100,
    lane = rnorm(100),
    count = sample(1:5, size=100, replace=TRUE),
    timestamp = seq(as.POSIXct("2019-11-02 00:00:00"), by= "1 min", length=100)
  ) %>%
  filter(row <= 30 | row >= 36) %>%
  select(-row)
df1[26:35,]
#> # A tibble: 10 x 3
#>      lane count timestamp          
#>     <dbl> <int> <dttm>             
#>  1  0.218     4 2019-11-02 00:25:00
#>  2 -1.63      4 2019-11-02 00:26:00
#>  3  0.603     5 2019-11-02 00:27:00
#>  4 -1.04      4 2019-11-02 00:28:00
#>  5 -0.397     5 2019-11-02 00:29:00
#>  6  0.179     5 2019-11-02 00:35:00
#>  7  0.391     4 2019-11-02 00:36:00
#>  8  1.09      5 2019-11-02 00:37:00
#>  9  0.119     2 2019-11-02 00:38:00
#> 10  0.949     3 2019-11-02 00:39:00

df2 <- df1 %>%
  as_tsibble(index=timestamp) %>%
  fill_gaps(count=0)

df2[26:35,]
#> # A tsibble: 10 x 3 [1m] <?>
#>      lane count timestamp          
#>     <dbl> <dbl> <dttm>             
#>  1  0.218     4 2019-11-02 00:25:00
#>  2 -1.63      4 2019-11-02 00:26:00
#>  3  0.603     5 2019-11-02 00:27:00
#>  4 -1.04      4 2019-11-02 00:28:00
#>  5 -0.397     5 2019-11-02 00:29:00
#>  6 NA         0 2019-11-02 00:30:00
#>  7 NA         0 2019-11-02 00:31:00
#>  8 NA         0 2019-11-02 00:32:00
#>  9 NA         0 2019-11-02 00:33:00
#> 10 NA         0 2019-11-02 00:34:00

Created on 2020-12-28 by the reprex package (v0.3.0)

Rob Hyndman
  • 30,301
  • 7
  • 73
  • 85