
I am working with a stack of trip data from a bikeshare system, so it will perhaps come as no surprise that there is no data for one month in 2020.

Among other things, I am making a chart of trips per month by year. Unlike the questions "Connecting across missing values with geom_line" and "connect points across selected NAs with geom_line()", I don't want to connect directly across the gap (what I have right now) or leave a discontinuity in the geom_line. Instead, I would like the line to go to 0 for the month in which the system was shut down.

A random sampling of my roughly 46K entries, sorted:

> trips.filtered %>% slice_sample(n = 10)

ID  UnlockDate  LockDate    Member  Distance    Duration    Bike-type   UnlockYear  UnlockMonth
<dbl>   <date>  <date>  <chr>   <dbl>   <dbl>   <chr>   <int>   <int>
 5198 2019-04-13    2019-04-13  Go Pass     0.94   55.2  Bike   2019    04
10984 2019-08-11    2019-08-11  Day Pa~     6.52  395.0  Pedelec    2019    08
14777 2019-10-21    2019-10-21  Annual~     0.12    2.33 Pedelec    2019    10
19456 2020-03-25    2020-03-25  Monthl~     3.37   32.2  Pedelec    2020    03
24730 2021-03-10    2021-03-10  Go Pass     0.08   27.0  Bike   2021    03
32213 2021-12-26    2021-12-26  Pay Pe~     0      27.3  Bike   2021    12
37280 2022-05-14    2022-05-14  2 Hour~     5.62   58.9  Pedelec    2022    05
38319 2022-06-05    2022-06-05  2 Hour~     2.45   20.0  Pedelec    2022    06
40667 2022-08-15    2022-08-15  Pay Pe~     5.79   56.6  Bike   2022    08
43880 2022-10-10    2022-10-10  Pay Pe~     3.87   44.6  Bike   2022    10

This is how I'm currently making the chart of Trips per month by year:

ggplot(trips.filtered, aes(x = UnlockMonth, group = as.factor(UnlockYear),
                           color = as.factor(UnlockYear))) +
  geom_line(stat = "count", linewidth = 1) +
  geom_point(stat = "count", aes(shape = as.factor(UnlockYear)))

which looks something like this:

The 2020 line (green) goes nearly to 0 in April, then rebounds in June. But what actually happened is that there were zero trips in May.

I suppose I could insert a single 'dummy' trip in May 2020, so that the line would go to 1, but is there another way to have geom_line go to 0 when there is no data for a given x like this?

  • What about precomputing the count, and then adding a zero for that May? – kjetil b halvorsen Mar 18 '23 at 00:51
  • The issue here is that for your purposes, the month of May != NA, but in your dataset May == NA. As you've surmised, add an instance of May with a value of zero. Much easier than developing a workaround. – L Tyrone Mar 18 '23 at 02:02
  • Perhaps you need something like `library(dplyr); library(tidyr); trips.filtered %>% count(UnlockYear, UnlockMonth) %>% complete(UnlockYear, UnlockMonth, fill = list(n=0))` and then plot that? – Jon Spring Mar 18 '23 at 03:37
  • Thank you @JonSpring, that did exactly what I wanted. – infinitebuffalo Mar 20 '23 at 13:32
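
For later readers, here is a minimal sketch of the approach from the comments: precompute the monthly counts, fill the missing year/month combinations with 0, and plot the precomputed `n` rather than relying on `stat = "count"`. The `trips.demo` tibble is a made-up stand-in for `trips.filtered` (it is not the asker's data), and `monthly` and `p` are illustrative names.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for trips.filtered: one row per trip,
# with no trips at all in May 2020.
trips.demo <- tibble(
  UnlockYear  = c(2019L, 2019L, 2019L, 2019L, 2020L, 2020L, 2020L),
  UnlockMonth = c(4L, 5L, 5L, 6L, 4L, 6L, 6L)
)

# Count trips per year/month, then add a row with n = 0 for every
# year/month combination that has no trips.
monthly <- trips.demo %>%
  count(UnlockYear, UnlockMonth) %>%
  complete(UnlockYear, UnlockMonth, fill = list(n = 0))

# Plot the precomputed counts, so no stat = "count" is needed.
p <- ggplot(monthly, aes(x = UnlockMonth, y = n,
                         group = as.factor(UnlockYear),
                         color = as.factor(UnlockYear))) +
  geom_line(linewidth = 1) +
  geom_point(aes(shape = as.factor(UnlockYear)))
# print(p) draws the chart.
```

One thing to watch for: `complete()` fills every combination of the years and months that appear anywhere in the data, so a year that legitimately starts late or ends early would also get zeros for its missing months. If that matters, those rows could be filtered out again before plotting.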
