Pretty embarrassed to post this because it's super basic, but I can't for the life of me figure out what's wrong. I have a data frame with dates and gender:
# A tibble: 119,186 x 2
   date       gender
   <date>     <fct>
 1 2020-01-02 male
 2 2020-01-02 male
 3 2020-01-02 male
 4 2020-01-02 female
 5 2020-01-02 male
 6 2020-01-02 female
 7 2019-12-25 male
 8 2019-12-25 male
 9 2019-12-25 female
10 2019-12-25 female
# … with 119,176 more rows
I'm trying to create a data frame sorted by date and gender, with a count for each date/gender combination. Here's a hypothetical result:
        date gender count
1 2020-01-02   male     4
2 2020-01-02 female     2
3 2019-12-25   male     2
4 2019-12-25 female     2
... etc
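To make the target concrete, here is a minimal sketch of one way a table like that might be produced, assuming dplyr is installed. The ten-row `df_gender` below is toy data that mirrors the preview above (not the real 119,186 rows), `result` is a hypothetical name, and the explicit `dplyr::` prefixes are an assumption on my part, in case another loaded package (plyr, for example) also defines `count`.

```r
library(dplyr)

# Toy data mirroring the ten preview rows (made up for illustration only)
df_gender <- tibble(
  date = as.Date(c(rep("2020-01-02", 6), rep("2019-12-25", 4))),
  gender = factor(
    c("male", "male", "male", "female", "male", "female",
      "male", "male", "female", "female"),
    levels = c("male", "female")
  )
)

# One row per date/gender combination, with the number of rows in each;
# the dplyr:: prefix avoids picking up a same-named function from another package
result <- df_gender %>%
  dplyr::count(date, gender, name = "count") %>%
  dplyr::arrange(desc(date), gender)

print(result)
```

On the toy data this should give four rows (two dates by two genders) with counts 4, 2, 2, 2, in the same order as the hypothetical table.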
I have tried a bunch of things to get this to work, like counting:
> df_gender %>%
+ group_by(date) %>%
+ summarize(count = count(gender))
  count.x count.freq
1    male      69217
2  female      49969
And adding a column of 1s (cases) and summing it:
   date       gender cases
   <date>     <fct>  <dbl>
 1 2020-01-02 male       1
 2 2020-01-02 male       1
 3 2020-01-02 male       1
 4 2020-01-02 female     1
 5 2020-01-02 male       1
 6 2020-01-02 female     1
 7 2019-12-25 male       1
 8 2019-12-25 male       1
 9 2019-12-25 female     1
10 2019-12-25 female     1
# … with 119,176 more rows
> df_gender %>%
+ group_by(date) %>%
+ summarize(count = sum(cases))
count
1 119186
I feel like this should work (and has worked in other contexts), so I've been troubleshooting my date variable. I think it's properly formatted:
glimpse(df_gender$date)
Date[1:119186], format: "2020-01-02" "2020-01-02" "2020-01-02" "2020-01-02" "2020-01-02" ...
Vaguely losing my mind here. Thanks in advance for any help and sorry for wasting time with this!