1

I have two variables that change over time. Addition is known, but as for substraction, I know only the sum that needs to be substracted, while I want to substract it from both variables proportionally.

Here is an example of a dataset

df = data.frame(id = c(rep(1,5), rep(2,3)),
                ord = c(1:5, 1:3),
                a = c(10, NA, 20, 0, NA, 0, 15, NA),
                b = c(0, NA, 0, 15, NA, 10, 0, NA),
                substract = c(NA, -5, NA, NA, -10, NA, NA, -15)) %>% 
  rowwise() %>% 
  mutate(all = sum(c(a, b, substract), na.rm = TRUE)) %>% 
  arrange(id, ord) %>% 
  group_by(id) %>% 
  mutate(all = cumsum(all)) %>% 
  ungroup()

So, I want to replace NA in a and b with the value from substract, multiplied by cumulative sum of a and b respectively, divided by value in all right before the NA. The problem is, that after each replacement, the next replacement should take into account all the previous replacements, as cumulative sum of a and b will change after that.

I have a solution with a while loop that works, but is highly ineffective. The original dataset is huge, so it is not an option for me, but it might give some additional insight on what I would like to achieve.

test = df %>% 
  group_by(id)
while(any(is.na(test$a))){
  test = test %>% 
    mutate(across(c("a", "b"), ~ ifelse(is.na(.x), lag(cumsum(.x)) / lag(all) * substract, .x)))
}

Could anyone suggest a more effective solution? Like, if there was any way to make mutate function save changes after each mutation (so it does not need to be put into a while loop) or something?

EDIT: user63230 suggested using recursive functions. It does seem to be what I wanted, but I still have difficulties in their application to my case. accumulate2() accepts only 3-arguments function and does not seem to work with lag() (as I need not only the previous value of the current variable), so it seems to be not enough. Maybe there is a way to make it work, but I have not discovered it yet. Any help would be very appreciated.

Darmist
  • 50
  • 9
  • 1
    I think you want a recursive function similar to https://stackoverflow.com/questions/48868104/recursive-function-using-dplyr – user63230 May 31 '22 at 19:58
  • Thanks, it seems similar to what I imagined, at least on paper. I still have troubles in understanding how to apply it to my case, but at least I have some direction now. – Darmist Jun 01 '22 at 07:50

1 Answers1

1

Using a similar approach as here, I think this would work, although not pretty:

library(dplyr)
sp <- split(df, df$id)
list_of_dfs <- lapply(sp, function(x){
  for(i in which(is.na(x$a))){
    tmp <- x[seq_len(i), ]
    x$a[i] <- tail(cumsum(tmp$a)[!is.na(cumsum(tmp$a))], 1)/tail(dplyr::lag(tmp$all), 1)*tail((tmp$substract), 1)
  }
  x
})
bind_rows(list_of_dfs)
#      id   ord     a     b substract   all
#   <dbl> <int> <dbl> <dbl>     <dbl> <dbl>
# 1     1     1 10        0        NA    10
# 2     1     2 -5       NA        -5     5
# 3     1     3 20        0        NA    25
# 4     1     4  0       15        NA    40
# 5     1     5 -6.25    NA       -10    30
# 6     2     1  0       10        NA    10
# 7     2     2 15        0        NA    25
# 8     2     3 -9       NA       -15    10

Can be repeated/automated for b if suitable?

user63230
  • 4,095
  • 21
  • 43