I have data of which I want to group by, perform calculations then the final result, use that for calculations in the next group.
We use conditional statements and perform calculations per group for example:
# Example Data
condition <- c(0,0,0,1,1,1,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,1,1,0)
col_a <- c(0,0,0,2,3,4,0,0,0,2,4,5,6,0,0,0,0,0,0,0,0,1,2,0)
col_b <- c(0,0,0,10,131,14,0,0,0,22,64,75,96,0,0,0,0,0,0,0,0,41,52,0)
df <- data.frame(condition,col_a,col_b)
This is the code to do the calculations, group by RunID
# Group by RunID
# Perform calculations
# Last value, brought forward to next group
require(dplyr)
output <- df %>%
dplyr::mutate(RunID = data.table::rleid(condition)) %>%
group_by(RunID) %>%
dplyr::mutate(calculation = ifelse(condition == 0,0, ifelse(row_number() == n(),first(col_a) * last(col_b),0))) %>%
dplyr::mutate(last.tag = ifelse(condition == 0,0, ifelse(row_number() == n(),2,0))) %>% # Add helper ID no. to aid with for loop below in answer ungroup() %>% select(-RunID) output <- data.frame(output) head(output,15)
With the output:
condition col_a col_b calculation
1 0 0 0 0
2 0 0 0 0
3 0 0 0 0
4 1 2 10 0
5 1 3 131 0
6 1 4 14 28
7 0 0 0 0
8 0 0 0 0
9 0 0 0 0
10 1 2 22 0
11 1 4 64 0
12 1 5 75 0
13 1 6 96 192
14 0 0 0 0
15 0 0 0 0
What I want to do is. On the first outcome, the result in calculation column is 28. I want to carry that value over to the next group and insert in col_a, row number 10 (28 replaces, 2). Then as that value is updated. The second group calculation result will be 96 * 28 = 2688 versus (96*2 = 192)
The carry forward will always insert on the first row of each group as example above.
Expected output:
condition col_a col_b calculation
1 0 0 0 0
2 0 0 0 0
3 0 0 0 0
4 1 2 10 0
5 1 3 131 0
6 1 4 14 28
7 0 0 0 0
8 0 0 0 0
9 0 0 0 0
10 1 28 22 0
11 1 4 64 0
12 1 5 75 0
13 1 6 96 2688
14 0 0 0 0
15 0 0 0 0
Other solution:
I subsetted to remove all 0,s. Added a 2 number identified to bottom of each group to make a continuous run, then use a for loop to go through grab and replace. Probably not most elegant but seems to work:
# Subset to remove all 0
subset.no.zero <- subset(output,condition >0)
# Loop to move values
for (i in 1:nrow(subset.no.zero)) {
temp <- ifelse(subset.no.zero$last.tag[i-1] == 2, subset.no.zero$calculation[i-1],subset.no.zero$col_a[i])
subset.no.zero$new_col_a[i] <- data.frame(temp)
}
# Re join by index no.
final_out <- full_join(output,subset.no.zero, by="index")