
Is there an efficient way to hide blank spaces in timeseries plots using ggplot2? I've got the following graph which, as can be seen, has no data from Dec. 3 - Dec. 5. Is there a way to hide this portion of the graph?

plot with a gap from Dec. 3 to Dec. 5

I'm currently using the following code to produce this graph:

ggplot(data = do.call(rbind.data.frame, combinedOutput[,2])) +
  geom_line(aes(x = Date, y = Return)) +
  geom_line(aes(x = Date, y = PredReturn), colour = "red") +
  facet_wrap(~Ticker, ncol = 2, scales = "free") +
  theme_light() + 
  theme(panel.spacing.y = unit(0.3, "cm"), 
        strip.background = element_rect(fill = "white"), 
        strip.text = element_text(colour = "black")) + 
  labs(x = NULL, y = "Daily Return in \\%")

This is what the raw data looks like. There are no NAs between 2016-12-02 16:00:00 and 2016-12-05 09:30:00.

screenshot of the raw data

Many thanks in advance!

Z.Lin
rajomato
  • So if you want to separate your line into two lines, you can set the group. You can precompute the grouping variable or do something like this: `geom_line(aes(x = Date, y = Return, group = Date > as.Date("2016-12-03"))) + ...` – Jack Brookes Nov 18 '18 at 22:15
  • Have you considered separating the data before the gap and after in different columns? – Samuel Nov 18 '18 at 22:15
  • Also, it's generally not best practice to have two geoms for two lines. You should gather/melt your data and use a single: `geom_line(aes(x = Date, y = DailyReturn, color = ReturnType))` – Jack Brookes Nov 18 '18 at 22:17
  • Thanks for the useful insights! I thought about grouping my data as both of you suggest. However, the problem is that these gaps occur randomly over time, in which case it's quite hard for me to identify when such a gap occurs and when not. Therefore, manually grouping the data is a huge pain. – rajomato Nov 18 '18 at 22:35
  • Also, would the grouping really hide the blank space in the graph? Or would it simply eliminate the line connecting the two segments while still showing an empty space between Dec 3 and Dec 5? – rajomato Nov 18 '18 at 22:38
  • Could you make your problem reproducible by sharing a sample of your data so others can help (please do not use `str()`, `head()` or screenshot)? You can use the [`reprex`](https://reprex.tidyverse.org/articles/articles/magic-reprex.html) and [`datapasta`](https://cran.r-project.org/web/packages/datapasta/vignettes/how-to-datapasta.html) packages to assist you with that. See also [Help me Help you](https://speakerdeck.com/jennybc/reprex-help-me-help-you?slide=5) & [How to make a great R reproducible example?](https://stackoverflow.com/q/5963269) – Tung Nov 19 '18 at 03:28

1 Answer


I see this as first & foremost a data wrangling problem, with the ggplot part coming afterwards.

Since there's no sample data in the question, let's simulate some:

library(dplyr)
library(ggplot2)

set.seed(12345)
data <- data.frame(
  Date = seq.POSIXt(from = ISOdate(2018, 1, 1),
                    to = ISOdate(2018, 5, 1),
                    by = "hour")
) %>%
  mutate(Return = rnorm(n = n()),
         PredReturn = rnorm(n = n()))
data$Date[c(220:350,
            593:820,
            2100:2500)] <- NA
data <- na.omit(data)

# This creates a dataset with 3 distinct gaps in its time series
ggplot(data,
       aes(x = Date, group = 1)) +
  geom_line(aes(y = Return)) +
  geom_line(aes(y = PredReturn), color = "red") +
  theme_light()

plot with time gaps

We can identify time gaps by comparing the time difference between consecutive time stamps. Here, the logic I used defines a gap as any time difference larger than the median of all time differences. You may want to change that to some other value (e.g. 2 days? 1 week?) depending on your context:

data2 <- data %>%
  arrange(Date) %>%
  mutate(date.diff = as.numeric(difftime(Date, lag(Date), units = "hours"))) %>% # hours since previous row
  mutate(is.gap = !is.na(date.diff) & date.diff > median(date.diff, na.rm = TRUE)) %>%
  mutate(period.id = cumsum(is.gap))

> head(data2)
                 Date     Return PredReturn date.diff is.gap period.id
1 2018-01-01 12:00:00  0.5855288 -0.7943254        NA  FALSE         0
2 2018-01-01 13:00:00  0.7094660  1.8875074         1  FALSE         0
3 2018-01-01 14:00:00 -0.1093033  0.5881879         1  FALSE         0
4 2018-01-01 15:00:00 -0.4534972  1.1556793         1  FALSE         0
5 2018-01-01 16:00:00  0.6058875 -0.8743878         1  FALSE         0
6 2018-01-01 17:00:00 -1.8179560  0.2586568         1  FALSE         0
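
If a fixed cutoff suits your context better than the median, the same idea can be sketched in base R. `label_periods()` and `max_gap_hours` are hypothetical names used for illustration only (they are not part of any package), and the 48-hour cutoff is just an assumed example value:

```r
# Hypothetical helper: give each run of timestamps its own id, starting a new
# run whenever the gap to the previous timestamp exceeds `max_gap_hours`.
label_periods <- function(times, max_gap_hours) {
  # the input is assumed to be sorted in ascending order
  stopifnot(all(diff(as.numeric(times)) >= 0))
  # hours elapsed since the previous timestamp; 0 for the first one
  gap_hours <- c(0, as.numeric(diff(times), units = "hours"))
  cumsum(gap_hours > max_gap_hours)
}

# Toy timestamps (made up): offsets in hours from the first observation
times <- as.POSIXct("2016-12-02 09:30:00", tz = "UTC") +
  3600 * c(0, 1, 2, 72, 73, 200)

period_id <- label_periods(times, max_gap_hours = 48)
period_id
# 0 0 0 1 1 2
```

Converting the differences to hours explicitly avoids depending on whichever unit `diff()` happens to pick for the data at hand.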

Now each period.id value corresponds to a subset of data without major time differences within its rows. We can further wrangle this data by converting it to long format:

data2 <- data2 %>%
  select(-date.diff, -is.gap) %>% # drop unneeded columns
  tidyr::gather(color, y, -Date, -period.id) %>%
  mutate(color = factor(color,
                        levels = c("Return", "PredReturn")))

> head(data2)
                 Date period.id  color          y
1 2018-01-01 12:00:00         0 Return  0.5855288
2 2018-01-01 13:00:00         0 Return  0.7094660
3 2018-01-01 14:00:00         0 Return -0.1093033
4 2018-01-01 15:00:00         0 Return -0.4534972
5 2018-01-01 16:00:00         0 Return  0.6058875
6 2018-01-01 17:00:00         0 Return -1.8179560
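
As a side note, `gather()` is superseded in current versions of tidyr, and `pivot_longer()` is its replacement. A minimal sketch of the same reshaping on a small made-up data frame (`wide` and its values are invented for illustration) could look like this:

```r
library(tidyr)

# Toy stand-in for the data after gap detection (values are made up)
wide <- data.frame(
  Date = as.POSIXct("2018-01-01 12:00:00", tz = "UTC") + 3600 * 0:2,
  period.id = c(0, 0, 1),
  Return = c(0.59, 0.71, -0.11),
  PredReturn = c(-0.79, 1.89, 0.59)
)

# Stack the two return columns into a name column and a value column
long <- pivot_longer(wide,
                     cols = c(Return, PredReturn),
                     names_to = "color",
                     values_to = "y")

# Fix the level order so "Return" comes first in the legend
long$color <- factor(long$color, levels = c("Return", "PredReturn"))
```

The rows come out in a different order than with `gather()` (the two series are interleaved per timestamp), which should not matter for plotting because ggplot groups by the `color` column.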

Pass this data to ggplot(), facet by time periods with free scales, & you'd have eliminated the blank spaces from the earlier plot above:

p <- ggplot(data2,
       aes(x = Date, y = y, color = color)) +
  geom_line() +
  facet_grid(~ period.id, scales = "free_x", space = "free_x") +
  scale_color_manual(values = c("Return" = "black",
                                "PredReturn" = "red")) +
  theme_light()

p

faceted plot

Further tweaks to the plot's aesthetics can hide the blank spaces completely, though I'd caution against going to extremes without making the time gaps very clear to your intended audience, as this can be subject to misinterpretation:

p +
  scale_x_datetime(expand = c(0, 0),             # remove space within each panel
                   breaks = "5 days") +          # specify desired time breaks
  theme(panel.spacing = unit(0, "pt"),           # remove space between panels
        axis.text.x = element_text(angle = 90))  # rotate x-axis text

faceted plot without gaps
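
The question's data has several tickers, so the gap detection would presumably need to run within each ticker rather than over the whole table. A rough sketch, assuming a `Ticker` column as in the question and an arbitrary 48-hour cutoff (the ticker names and timestamps below are made up):

```r
library(dplyr)

# Toy data: two hypothetical tickers, each with one gap at a different time
returns <- data.frame(
  Ticker = rep(c("AAA", "BBB"), each = 4),
  Date = as.POSIXct("2016-12-02 09:30:00", tz = "UTC") +
    3600 * c(0, 1, 72, 73, 0, 1, 2, 100)
)

returns <- returns %>%
  group_by(Ticker) %>%
  arrange(Date, .by_group = TRUE) %>%
  # hours since the previous row of the same ticker (NA for its first row)
  mutate(gap.hours = as.numeric(difftime(Date, lag(Date), units = "hours")),
         # start a new period whenever the gap exceeds the assumed cutoff
         period.id = cumsum(!is.na(gap.hours) & gap.hours > 48)) %>%
  ungroup()
```

Here `period.id` restarts at 0 for every ticker, so it only identifies a period together with `Ticker`.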

Z.Lin