I am working with a stack of trip data from a bikeshare system, so it will perhaps come as no surprise that there is no data for one month in 2020.
Among other things, I am making a chart of trips per month by year. Unlike the questions "Connecting across missing values with geom_line" and "connect points across selected NAs with geom_line()", I don't want to connect directly across the gap (what I have right now) or leave a discontinuity in the geom_line; I would like the line to go to 0 for the month in which the system was shut down.
A random sampling of my roughly 46K entries, sorted:
> trips.filtered %>% slice_sample(n = 10)
   ID UnlockDate LockDate   Member  Distance Duration Bike-type UnlockYear UnlockMonth
<dbl> <date>     <date>     <chr>      <dbl>    <dbl> <chr>          <int>       <int>
 5198 2019-04-13 2019-04-13 Go Pass     0.94     55.2 Bike            2019          04
10984 2019-08-11 2019-08-11 Day Pa~     6.52    395.0 Pedelec         2019          08
14777 2019-10-21 2019-10-21 Annual~     0.12     2.33 Pedelec         2019          10
19456 2020-03-25 2020-03-25 Monthl~     3.37     32.2 Pedelec         2020          03
24730 2021-03-10 2021-03-10 Go Pass     0.08     27.0 Bike            2021          03
32213 2021-12-26 2021-12-26 Pay Pe~        0     27.3 Bike            2021          12
37280 2022-05-14 2022-05-14 2 Hour~     5.62     58.9 Pedelec         2022          05
38319 2022-06-05 2022-06-05 2 Hour~     2.45     20.0 Pedelec         2022          06
40667 2022-08-15 2022-08-15 Pay Pe~     5.79     56.6 Bike            2022          08
43880 2022-10-10 2022-10-10 Pay Pe~     3.87     44.6 Bike            2022          10
This is how I'm currently making the chart of Trips per month by year:
ggplot(trips.filtered, aes(x = UnlockMonth, group = as.factor(UnlockYear),
                           color = as.factor(UnlockYear))) +
  geom_line(stat = "count", linewidth = 1) +
  geom_point(stat = "count", aes(shape = as.factor(UnlockYear)))
which looks something like this:
The 2020 line (green) goes nearly to 0 in April, then rebounds in June. But what actually happened is that there were zero trips in May.
I suppose I could insert a single 'dummy' trip in May 2020, so that the line would go to 1, but is there another way to have geom_line go to 0 when there is no data for a given x like this?
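For what it's worth, one direction I have considered (a rough sketch only, using a made-up toy data frame rather than my real trips.filtered) is to count the trips myself, fill in the missing year/month combinations with zero via tidyr::complete(), and then draw plain lines instead of relying on stat = "count". The name trips.toy and its values are hypothetical; I am assuming dplyr, tidyr and ggplot2 are loaded.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy stand-in for trips.filtered (hypothetical values): no trips in May 2020
trips.toy <- tibble(
  UnlockYear  = c(2019L, 2019L, 2019L, 2019L, 2020L, 2020L, 2020L),
  UnlockMonth = c(4L, 5L, 5L, 6L, 4L, 6L, 6L)
)

# Count trips per year/month, then add every missing year/month
# combination (among the observed years and months) with n = 0
monthly <- trips.toy %>%
  count(UnlockYear, UnlockMonth) %>%
  complete(UnlockYear, UnlockMonth, fill = list(n = 0L))

# Plot the precomputed counts, so no stat = "count" is needed
p <- ggplot(monthly, aes(x = UnlockMonth, y = n,
                         group = as.factor(UnlockYear),
                         color = as.factor(UnlockYear))) +
  geom_line(linewidth = 1) +
  geom_point(aes(shape = as.factor(UnlockYear)))
p
```

My hesitation is that complete() zero-fills every combination it can form, so months before the system launched or after the last recorded trip would also drop to 0 unless I filter them out again.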