I have a tibble that looks like this:
# A tibble: 5 × 12
geo_ID age_group sex rep_date ref_date ref_rep_date new_case new_death new_recovery amount_case amount_death amount_recovery
<dbl> <fct> <fct> <date> <date> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2000 A15-A34 M 2020-09-24 2020-09-18 TRUE 0 -9 0 1 0 1
2 2000 A35-A59 W 2022-03-30 2022-03-24 TRUE 0 -9 -9 1 0 0
3 2000 A15-A34 W 2021-08-03 2021-07-28 TRUE 0 -9 0 3 0 3
4 2000 A05-A14 W 2021-10-14 2021-10-12 TRUE 0 -9 0 1 0 1
5 2000 A35-A59 M 2020-11-06 2020-11-01 TRUE 0 -9 0 4 0 4
I then used dplyr
to do the following:
inf_HH %>% select(ref_date, new_case, starts_with("amount")) %>%
filter(new_case != 0 & amount_death == 0 & amount_recovery == 0) %>%
select(ref_date, ends_with("case")) %>% arrange(desc(ref_date))
and get this tibble out (with %>% slice_sample(n = 20)
appended):
# A tibble: 20 × 3
ref_date new_case amount_case
<date> <dbl> <dbl>
1 2022-04-02 -1 -1
2 2022-04-07 1 58
3 2022-04-07 -1 -1
4 2022-04-12 1 1
5 2022-04-04 1 2
6 2022-04-07 1 10
7 2022-04-12 1 2
8 2022-04-11 1 23
9 2022-04-10 1 1
10 2022-04-12 -1 -1
11 2022-04-10 1 1
12 2022-04-03 1 1
13 2022-04-04 1 1
14 2022-04-13 1 8
15 2022-04-07 1 41
16 2022-04-01 1 1
17 2022-04-08 1 5
18 2022-04-04 1 1
19 2022-04-09 1 1
20 2022-04-07 1 1
Without %>% slice_sample(n = 20)
it has 183 rows, where ref_date
has 19 unique values.
What I am trying to achieve is to sum up all values in amount_case
for each unique value in ref_date
.
What I have been trying to do is use distinct
to make progress:
inf_HH %>% select(ref_date, new_case, starts_with("amount")) %>%
filter(new_case != 0 & amount_death == 0 & amount_recovery == 0) %>%
select(ref_date, ends_with("case")) %>% arrange(desc(ref_date)) %>%
distinct(ref_date, .keep_all = TRUE)
What I get is:
# A tibble: 19 × 3
ref_date new_case amount_case
<date> <dbl> <dbl>
1 2022-04-14 1 10
2 2022-04-13 1 28
3 2022-04-12 -1 -2
4 2022-04-11 -1 -1
5 2022-04-10 -1 -1
6 2022-04-09 1 1
7 2022-04-08 1 6
8 2022-04-07 1 1
9 2022-04-06 1 1
10 2022-04-05 1 2
11 2022-04-04 1 1
12 2022-04-03 1 1
13 2022-04-02 -1 -1
14 2022-04-01 1 1
15 2022-03-30 1 1
16 2022-03-28 1 1
17 2022-03-24 1 1
18 2022-03-22 -1 -1
19 2022-03-21 1 1
I would like to extend the dplyr-piping so that I get a tibble like the last one above but with the sums of all the values in amount_case
for each unique value in ref_date
.
Another way, which I cannot express in R, would be to somehow use group_by()
with summarise()
but I cannot find a way to use them to group rows AND add the values in amount_case
.
Preferably this should happen with tidyverse
/ dplyr
packages, which I am sort of comfortable with. I need to do the same operation on hundreds of other datasets, with similar structure, so it is out of question to address the unique values of ref_date
manually.