My time series is on a minute base. Some minutes are missing (see from lines 30 to 31 there are 6minutes missing. How can I fill in those minutes with count 0? So I want to add every missing minute and fill in the count with 0.
Asked
Active
Viewed 138 times
2 Answers
1
If the column is Datetime
class, create a seq
uence from the min
and max
value of 'timestamp' column by 1 min
ute in complete
while simultaneously specify the count
as 0 for those elements that would be missing in the original dataset
library(tidyr)
library(dplyr)
df2 <- complete(df1, timestamp = seq(min(timestamp),
max(timestamp), by = "1 min"), fill = list(count = 0))
If we need the lane
column to be filled by the same value, use fill
df2 <- complete(df1, timestamp = seq(min(timestamp),
max(timestamp), by = "1 min"), fill = list(count = 0)) %>%
fill(lane)
NB: If the column 'timestamp' is not Datetime
class, can convert to POSIXct
with as.POSIXct
df1$timestamp <- as.POSIXct(df1$timestamp)
before doing the complete
step

akrun
- 874,273
- 37
- 540
- 662
-
Thankyou. I already tried that, but it doesnt do anything, i also dont get a error so i dont know what is wrong. My timestamp is class
and my code is: complete(Data_2019, timestamp=seq((min(timestamp)), max(timestamp), by = "1 min"), fill=list(count=0)) %>% fill(lane) – EvA Dec 27 '20 at 21:31 -
@EvA have you assigned it back to the same object or different object i.e. `df2 <- complete(df1, ...` – akrun Dec 27 '20 at 21:34
-
1I assigned it to a different object now. It works :), thankyou!! – EvA Dec 27 '20 at 21:38
1
A simple way is to use the fill_gaps()
function from the tsibble
package:
library(tsibble)
library(dplyr)
df1 <- tibble(
row = 1:100,
lane = rnorm(100),
count = sample(1:5, size=100, replace=TRUE),
timestamp = seq(as.POSIXct("2019-11-02 00:00:00"), by= "1 min", length=100)
) %>%
filter(row <= 30 | row >= 36) %>%
select(-row)
df1[26:35,]
#> # A tibble: 10 x 3
#> lane count timestamp
#> <dbl> <int> <dttm>
#> 1 0.218 4 2019-11-02 00:25:00
#> 2 -1.63 4 2019-11-02 00:26:00
#> 3 0.603 5 2019-11-02 00:27:00
#> 4 -1.04 4 2019-11-02 00:28:00
#> 5 -0.397 5 2019-11-02 00:29:00
#> 6 0.179 5 2019-11-02 00:35:00
#> 7 0.391 4 2019-11-02 00:36:00
#> 8 1.09 5 2019-11-02 00:37:00
#> 9 0.119 2 2019-11-02 00:38:00
#> 10 0.949 3 2019-11-02 00:39:00
df2 <- df1 %>%
as_tsibble(index=timestamp) %>%
fill_gaps(count=0)
df2[26:35,]
#> # A tsibble: 10 x 3 [1m] <?>
#> lane count timestamp
#> <dbl> <dbl> <dttm>
#> 1 0.218 4 2019-11-02 00:25:00
#> 2 -1.63 4 2019-11-02 00:26:00
#> 3 0.603 5 2019-11-02 00:27:00
#> 4 -1.04 4 2019-11-02 00:28:00
#> 5 -0.397 5 2019-11-02 00:29:00
#> 6 NA 0 2019-11-02 00:30:00
#> 7 NA 0 2019-11-02 00:31:00
#> 8 NA 0 2019-11-02 00:32:00
#> 9 NA 0 2019-11-02 00:33:00
#> 10 NA 0 2019-11-02 00:34:00
Created on 2020-12-28 by the reprex package (v0.3.0)

Rob Hyndman
- 30,301
- 7
- 73
- 85