I'm looking for a tidy solution, preferably using the tidyverse.
This question is in line with this answer, but it has an added twist. My data has an overall grouping variable `grp`. Within each such group, I want to perform calculations based on the cumulative sum (`cumsum`) within sub-groups defined by `trial`, here `X` and `Y`.
However, for the calculations within both sub-groups, trial `X` and trial `Y`, I need to use a single, common group-specific baseline, i.e. the row where `trial` is `B`.
My desired outcome is `Value3` in the data set `desired_outcome` below:
# library(tidyverse)
# library(dplyr)
desired_outcome # see below for how I got this `desired_outcome`
# A tibble: 10 x 6
# Groups:   grp [2]
   grp   trial    yr value1 value2 Value3
   <chr> <fct> <dbl>  <dbl>  <dbl>  <dbl>
 1 A     B      2021      2      0      2
 2 A     X      2022      3      1      5
 3 A     X      2023      4      2     10
 4 A     Y      2022      5      3      7
 5 A     Y      2023      6      4     16
 6 B     B      2021      0      2      0
 7 B     X      2022      1      3      3
 8 B     X      2023      2      4      8
 9 B     Y      2022      3      5      5
10 B     Y      2023      4      6     14
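To make the target concrete, here is a small base-R sketch of the arithmetic behind rows 1, 4 and 5 (group A: the baseline row followed by the two Y rows). It only restates the formula used in my code further down, `cumsum(value1) + lag(cumsum(value2), default = 0)`; the shift via `c(0, head(..., -1))` is just a stand-in for `dplyr::lag()` so the snippet runs without any packages.

```r
# Rows for grp "A": the baseline (trial B) followed by the two trial Y rows.
value1 <- c(2, 5, 6)
value2 <- c(0, 3, 4)

# Running total of value1 plus the running total of value2 shifted down one row
# (base-R stand-in for dplyr::lag(cumsum(value2), default = 0)).
Value3 <- cumsum(value1) + c(0, head(cumsum(value2), -1))
Value3
# 2, 7, 16: the baseline row, then rows 4 and 5 of desired_outcome
```

The X rows follow the same pattern, starting from the same baseline row.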
Here is my minimal working example. Data first:
library(tidyverse) # provides tribble(), mutate(), arrange() and %>%
tabl <- tribble(~grp, ~trial, ~yr, ~value1, ~value2,
                'A', "B", 2021, 2, 0,
                'A', "X", 2022, 3, 1,
                'A', "X", 2023, 4, 2,
                'A', "Y", 2022, 5, 3,
                'A', "Y", 2023, 6, 4,
                'B', "B", 2021, 0, 2,
                'B', "X", 2022, 1, 3,
                'B', "X", 2023, 2, 4,
                'B', "Y", 2022, 3, 5,
                'B', "Y", 2023, 4, 6) %>%
  mutate(trial = factor(trial, levels = c("B", "X", "Y"))) %>%
  arrange(grp, trial, yr)
Now, I need to use `group_by()`, but I can't group on `trial`, as I need to use the baseline, `B`, in the calculations for both `X` and `Y`.
undesired_outcome_tidier_code <- tabl %>%
  group_by(grp) %>% # this does not work!
  mutate(Value1.1 = cumsum(value1),
         Value2.1 = lag(cumsum(value2), default = 0),
         Value3 = Value1.1 + Value2.1) %>%
  select(-Value1.1, -Value2.1)
In `undesired_outcome_tidier_code`, rows 4-5 and 9-10 are, for obvious reasons, not using rows 1 and 6, respectively, as the baseline: grouping on `grp` alone lets the cumulative sums run through the `X` rows as well. As shown here:
undesired_outcome_tidier_code
# A tibble: 10 x 6
# Groups:   grp [2]
   grp   trial    yr value1 value2 Value3
   <chr> <fct> <dbl>  <dbl>  <dbl>  <dbl>
 1 A     B      2021      2      0      2
 2 A     X      2022      3      1      5
 3 A     X      2023      4      2     10
 4 A     Y      2022      5      3     17
 5 A     Y      2023      6      4     26
 6 B     B      2021      0      2      0
 7 B     X      2022      1      3      3
 8 B     X      2023      2      4      8
 9 B     Y      2022      3      5     15
10 B     Y      2023      4      6     24
I am looking for a solution that gets me `desired_outcome` (see below) in a tidy way.
I can, in this small example, work my way around it to get to my `desired_outcome`, but it's a cumbersome two-step solution. There must be a better/tidier way.
# Step 1: baseline (B) plus trial X only
step1 <- tabl %>% arrange(grp, trial, yr) %>% filter(trial != 'Y') %>%
  group_by(grp) %>%
  mutate(Value1.1 = cumsum(value1),
         Value2.1 = lag(cumsum(value2), default = 0),
         Value3 = Value1.1 + Value2.1)

# Step 2: baseline (B) plus trial Y only
step2 <- tabl %>% arrange(grp, trial, yr) %>% filter(trial != 'X') %>%
  group_by(grp) %>%
  mutate(Value1.1 = cumsum(value1),
         Value2.1 = lag(cumsum(value2), default = 0),
         Value3 = Value1.1 + Value2.1)

# Combine, taking the baseline rows from step 1 only
desired_outcome <- rbind(step1,
                         step2 %>% filter(trial != 'B')) %>%
  select(-Value1.1, -Value2.1) %>%
  arrange(grp, trial, yr)
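For what it's worth, the kind of single pipeline I have in mind might look roughly like the sketch below. It is only a sketch and rests on assumptions I have not tested beyond this toy data: each `grp` has exactly one baseline row (`trial == "B"`), and the rows are already sorted by `yr` within each trial. The helper columns `base1` and `base2` and the name `one_step` are made up for illustration. The idea is to copy the baseline values onto every row of the group, then group on both `grp` and `trial` and add the baseline back in by hand.

```r
library(dplyr)
library(tibble)

# Same toy data as above, repeated so the snippet runs on its own.
tabl <- tribble(~grp, ~trial, ~yr, ~value1, ~value2,
                'A', "B", 2021, 2, 0,
                'A', "X", 2022, 3, 1,
                'A', "X", 2023, 4, 2,
                'A', "Y", 2022, 5, 3,
                'A', "Y", 2023, 6, 4,
                'B', "B", 2021, 0, 2,
                'B', "X", 2022, 1, 3,
                'B', "X", 2023, 2, 4,
                'B', "Y", 2022, 3, 5,
                'B', "Y", 2023, 4, 6) %>%
  mutate(trial = factor(trial, levels = c("B", "X", "Y"))) %>%
  arrange(grp, trial, yr)

one_step <- tabl %>%
  # Copy the group's baseline values onto every row of that group
  # (assumes exactly one row with trial == "B" per grp).
  group_by(grp) %>%
  mutate(base1 = value1[trial == "B"],
         base2 = value2[trial == "B"]) %>%
  # Cumulate within each sub-group and add the shared baseline by hand.
  group_by(grp, trial) %>%
  mutate(Value3 = if_else(
    trial == "B",
    value1, # baseline row: cumsum(value1) plus a lag of 0
    base1 + cumsum(value1) + base2 + lag(cumsum(value2), default = 0)
  )) %>%
  ungroup() %>%
  select(-base1, -base2)

one_step$Value3
# 2, 5, 10, 7, 16, 0, 3, 8, 5, 14 (same as Value3 in desired_outcome)
```

This reproduces `Value3` on the example data, but the explicit `if_else()` for the baseline row still feels clunky, so I would welcome something cleaner.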