Average number of riders for all the days of a week using dplyr

Question

this is how my data looks.

# A tibble: 4,722,462 x 5
   started_at          member_casual weekday   ride_length month 
   <dttm>              <chr>         <fct>           <dbl> <fct> 
 1 2020-08-20 18:08:14 member        Thursday        0.160 August
 2 2020-08-27 18:46:04 casual        Thursday        1.15  August
 3 2020-08-26 19:44:14 casual        Wednesday       2.15  August
 4 2020-08-27 12:05:41 casual        Thursday        0.801 August
 5 2020-08-27 16:49:02 casual        Thursday        0.180 August
 6 2020-08-27 17:26:23 casual        Thursday        0.691 August
 7 2020-08-26 20:14:02 casual        Wednesday       0.333 August
 8 2020-08-26 21:59:50 casual        Wednesday       0.212 August
 9 2020-08-26 19:17:42 casual        Wednesday       0.242 August
10 2020-08-27 15:13:57 casual        Thursday        0.467 August
# ... with 4,722,452 more rows

I want to group by 'weekday' and 'member_casual' and then summarize to get the average number of riders for each day of the week i.e., for the 'Monday' and 'casual' row: (the number of times Monday & Casual appears in the data)/(the actual number of Mondays in the given timeframe). This is the closest I got to.

#finding the total number of weeks in the given time frame.
weeks_ <-as.numeric(difftime(max(df2$started_at),min(df2$started_at),units="weeks"))
#assuming there are only complete weeks
df2 %>% group_by(weekday,member_casual)%>% summarise("Average Riders"=(n()/weeks_))

This output is not exact but accurate enough since the timeframe is very large.

weekday   member_casual `Average Riders`
   <fct>     <chr>                    <dbl>
 1 Monday    casual                   4404.
 2 Monday    member                   6688.
 3 Tuesday   casual                   4279.
 4 Tuesday   member                   7289.
 5 Wednesday casual                   4434.
 6 Wednesday member                   7648.
 7 Thursday  casual                   4447.
 8 Thursday  member                   7285.
 9 Friday    casual                   5807.
10 Friday    member                   7452.
11 Saturday  casual                   9366.
12 Saturday  member                   7612.
13 Sunday    casual                   7527.
14 Sunday    member                   6331.

The logic is not clear `the number of Mondays in the given timeframe` — akrun, Sep 10 '21 at 16:48
What did you try that didn't work? What about your output here isn't what you want? — camille, Sep 10 '21 at 16:48
@camille that was just the count and not the weekly average count. — Spinach55, Sep 10 '21 at 17:11
@akrun what I meant was the actual number of Mondays that appear between the newest and oldest dates in the data frame, and I want that for each day of the week which is further grouped by 'member_casual' — Spinach55, Sep 10 '21 at 17:20
Can you update your post with a small reproducible example with expected output. Your input data showed is from the full data and expected from full. If we have a small example, it becomes easier to crosscheck — akrun, Sep 10 '21 at 17:21
`actual number of Mondays that appear between the newest and oldest dates in the data frame` confusing when you later said `I want that for each day of the week`. What is the oldest and newest when you are considering each day — akrun, Sep 10 '21 at 17:24
@akrun I apologize for making the post confusing. Edited it a little for better understanding. — Spinach55, Sep 10 '21 at 18:01
Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. — Community, Sep 10 '21 at 21:49

score 1 · Answer 1 · answered Sep 10 '21 at 16:52

1

library(dplyr)
df %>%
    group_by(weekday, member_casual) %>% 
    count()

  weekday   member_casual     n
  <chr>     <chr>         <int>
1 Thursday  casual            5
2 Thursday  member            1
3 Wednesday casual            4

answered Sep 10 '21 at 16:52

TarJae

72,363
6
19
66

score 1 · Answer 2 · answered Sep 10 '21 at 16:56

1

We could use

library(dplyr)
df1 %>%
   add_count(member_casual) %>%
   group_by(weekday, member_casual) %>%
   summarise(Average_Riders = n()/n, .groups = 'drop')

answered Sep 10 '21 at 16:56

akrun

874,273
37
540
662

Average number of riders for all the days of a week using dplyr

2 Answers2