2

I have a time series (datetime, Instance, Value) with some NAs in Value. If Value is NA for every Instance at the same datetime, that means there was a gap in data collection. I need to highlight those periods.

My example script and data:

library(dplyr)    # provides bind_rows() and %>%, both used below
library(ggplot2)


example.data1 <- data.frame( Instance = rep("A",11),
                            datetime = seq.POSIXt(as.POSIXct("2020-12-26 10:00:00"), as.POSIXct("2020-12-26 10:00:00") + 15*10, "15 sec"),
                            Value = c(0,1,2,3,4,5,6,NA,NA,9,10)
)   

example.data2 <- data.frame( Instance = rep("B",11),
                             datetime = seq.POSIXt(as.POSIXct("2020-12-26 10:00:00"), as.POSIXct("2020-12-26 10:00:00") + 15*10, "15 sec"),
                             Value = c(1,2,NA,4,5,6,7,NA,NA,10,11)
)   

example.data3 <- data.frame( Instance = rep("C",11),
                             datetime = seq.POSIXt(as.POSIXct("2020-12-26 10:00:00"), as.POSIXct("2020-12-26 10:00:00") + 15*10, "15 sec"),
                             Value = c(2,3,4,5,NA,7,8,NA,NA,11,12)
)   

example.data <- bind_rows(example.data1, example.data2, example.data3)

ggplot (data = example.data, aes(x=datetime,y=Value, color = Instance)) + 
    geom_line(size = 1.2) +
    theme_bw()

My result picture:

enter image description here

What I really need:

enter image description here

How can I achieve that?

UPD.

The code in the answer below doesn't work correctly when a gap spans more than one sample. Look at this:

example.data.gap <- example.data %>%
    group_by(datetime) %>%
    summarise(is_gap = all(is.na(Value))) %>%
    # Start and End 
    mutate(xmin = lag(datetime), xmax = lead(datetime)) %>%
    filter(is_gap)

The result is two overlapping intervals instead of one:

# A tibble: 2 x 4
  datetime            is_gap xmin                xmax               
  <dttm>              <lgl>  <dttm>              <dttm>             
1 2020-12-26 10:01:45 TRUE   2020-12-26 10:01:30 2020-12-26 10:02:00
2 2020-12-26 10:02:00 TRUE   2020-12-26 10:01:45 2020-12-26 10:02:15

The overlap is visible in the plot when alpha is used:

ggplot(data = example.data, aes(x = datetime, y = Value, color = Instance)) +
    geom_line(size = 1.2) +
    geom_rect(data = example.data.gap, aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf), fill = "grey95", alpha = 0.5, inherit.aes = FALSE) +
    theme_bw()


Maxim

3 Answers

4

Slight mods to stefan's code, so that consecutive gap rows are merged into a single interval:

example.data.gap <- example.data %>%
  group_by(datetime) %>%
  summarise(is_gap = all(is.na(Value)), .groups = "drop") %>%
  mutate(
    grp = data.table::rleid(is_gap),
    prevtime = lag(datetime),
    nexttime = lead(datetime)
  ) %>%
  filter(is_gap) %>%
  group_by(grp) %>%
  summarize(xmin = min(prevtime), xmax = max(nexttime), .groups = "drop")

ggplot(data = example.data, aes(x = datetime, y = Value, color = Instance)) +
  geom_line(size = 1.2) +
  geom_rect(data = example.data.gap, aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf), fill = "grey95", alpha = 0.5, inherit.aes = FALSE) +
  theme_bw()
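
The same merge can be sketched in base R, without dplyr or data.table. This is only an illustration under stated assumptions: gap_intervals() is a hypothetical helper name, is_gap is assumed to hold one logical per timestamp in time order with no NAs, the time zone is fixed to UTC for reproducibility, and a gap touching the first or last timestamp is clamped to the data range (the lag()/lead() version above would give NA there instead).

```r
# Hypothetical helper: collapse each run of TRUE in is_gap into one interval,
# stretched to the neighbouring timestamps on both sides.
gap_intervals <- function(datetime, is_gap) {
  r      <- rle(is_gap)
  ends   <- cumsum(r$lengths)        # index of the last element of each run
  starts <- ends - r$lengths + 1L    # index of the first element of each run
  gap    <- r$values                 # TRUE for runs that are gaps
  data.frame(
    xmin = datetime[pmax(starts[gap] - 1L, 1L)],             # previous timestamp, clamped
    xmax = datetime[pmin(ends[gap] + 1L, length(datetime))]  # next timestamp, clamped
  )
}

# Same timestamps as in the question (time zone assumed to be UTC here);
# all three instances are NA at 10:01:45 and 10:02:00.
datetime <- seq(as.POSIXct("2020-12-26 10:00:00", tz = "UTC"),
                by = "15 sec", length.out = 11)
is_gap   <- c(rep(FALSE, 7), TRUE, TRUE, FALSE, FALSE)

gaps <- gap_intervals(datetime, is_gap)
gaps  # one row: xmin 10:01:30, xmax 10:02:15
```

The resulting data frame has the xmin and xmax columns that geom_rect() expects, so it could be passed as data = gaps in the plot above.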


If you don't have data.table installed, a drop-in replacement for rleid (one vector only, not as extensible as data.table::rleid) is:

my_rleid <- function(x) { r <- rle(x)$lengths; rep(seq_along(r), times = r); }
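
As a quick sanity check of that helper, here is a minimal sketch (base R only; the input vector is made up for illustration) showing the run ids it produces for a logical vector shaped like is_gap:

```r
# Base-R stand-in for data.table::rleid(), single vector only
my_rleid <- function(x) {
  r <- rle(x)$lengths
  rep(seq_along(r), times = r)
}

# Hypothetical is_gap column: two separate gaps, the second spanning two samples
is_gap <- c(FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)
run_id <- my_rleid(is_gap)
run_id
# [1] 1 2 3 3 4 4 5
```

Consecutive TRUE values share one id, which is what lets group_by(grp) collapse them into a single rectangle. Note that rle() treats every NA as its own run, so this assumes the input has no NAs (all(is.na(...)) never returns NA, so is_gap qualifies).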
r2evans
3

One option would be to create a dataframe containing only the gap(s), as well as the start and end of the gaps and use geom_rect to "highlight" the gap:

library(dplyr)
library(ggplot2)

example.data <- bind_rows(example.data1, example.data2, example.data3)

example.data.gap <- example.data %>%
  group_by(datetime) %>%
  summarise(is_gap = all(is.na(Value))) %>%
  # Start and End 
  mutate(xmin = lag(datetime), xmax = lead(datetime)) %>%
  filter(is_gap)

ggplot(data = example.data, aes(x = datetime, y = Value, color = Instance)) +
  geom_line(size = 1.2) +
  geom_rect(data = example.data.gap, aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf), fill = "grey95", inherit.aes = FALSE) +
  theme_bw()

stefan
  • Your code behaves oddly if a gap is longer than one sample: in that case overlapping intervals are created in example.data.gap. I think this construct is the problem: mutate(xmin = lag(datetime), xmax = lead(datetime)). We need something like rle here to merge consecutive rows where is_gap is TRUE. – Maxim Dec 23 '21 at 12:53
  • @Maxim would you mind adding another gap to your data, as described in your comment, to help make this more reproducible? – tjebo Dec 23 '21 at 13:29
  • Sorry for the late response. Had to buy the last X-mas presents. (: But thankfully @r2evans already added the missing pieces to account for intervals of more than one obs., to which I have nothing to add. – stefan Dec 23 '21 at 14:36
2

Based on Stefan's idea, but using ggforce::geom_mark_rect instead. Less data preparation needed.

You can play around with the width, but I kind of like that it doesn't fill the entire gap.

example.data.gap <- example.data %>%
  group_by(datetime) %>%
  filter(all(is.na(Value)))

ylims <- range(example.data$Value, na.rm = TRUE)

ggplot(data = example.data, aes(x = datetime, y = Value)) +
  geom_line(size = 1.2, aes(color = Instance)) +
  ggforce::geom_mark_rect(data = example.data.gap, aes(x = datetime, fill = is.na(Value), 
                          y = seq(ylims[1], ylims[2], length = nrow(example.data.gap))))


tjebo