I am at a total loss for this one - I am playing with the "pedestrian" dataset from tsibble
. I want to get total counts for each month/year. I started by adding a month_year
column, then summarise the data with sum, like so:
library("tidyverse")
library("tsibble")
df1 <- pedestrian
df1$month_year <- format(as.Date(df1$Date), "%Y-%m")
count_all <- df1 %>%
dplyr::group_by(month_year) %>%
dplyr::summarise(total = sum(Count))
A summary of count_all looks like this:
month_year Date_Time total
Length:17542 Min. :2015-01-01 00:00:00.0 Min. : 12
Class :character 1st Qu.:2015-07-02 17:15:00.0 1st Qu.: 349
Mode :character Median :2016-01-01 11:30:00.0 Median : 2090
Mean :2016-01-01 11:44:40.2 Mean : 2593
3rd Qu.:2016-07-02 04:45:00.0 3rd Qu.: 4455
Max. :2016-12-31 23:00:00.0 Max. :15990
Why is Date_Time
being retained? And how can I prevent it form impacting the summary (as in prevent it from giving me 17,542 rows instead of the expected 24). If I remove the column before the summary like so:
df1$Date_Time <- NULL
Then it works fine, and a summary of the result looks like this:
month_year total
Length:24 Min. :1148276
Class :character 1st Qu.:1756898
Mode :character Median :1927154
Mean :1895161
3rd Qu.:2066043
Max. :2393675
This solution is fine, but I would like to know what the cause of the issue is so that I can avoid it in future (it was easy to catch the problem this time, but may not always be so straight forward).
Thanks in advance for the help!