Adjust the elements of a column to get a cumsum equal to zero

Question

I have these columns in a bigger dataset (here I only show asset "x", but there are others, so the idea is to replicate the process for every asset):

df <- structure(list(
        asset = c("x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x"),
        col1 = c(10, 10, -22, 11, -13, 15, -7, -10, 10, -5, 3),
        `cumsum(col1)` = c(10, 20, -2, 9, -4, 11, 4, -6, 4, -1, 2)),
        class = "data.frame", row.names = c(NA, -11L)
     )

I want to correct the negative numbers in col1 so that cumsum(col1) becomes:

cumsum(col1) = c(10, 20, 0, 11, 0, 15, 8, 0, 10, 5, 8)

To get that result I need to correct a col1 value if and only if it is negative and larger in magnitude than the cumsum of the previous rows. For example, the -22 in third position should become -20 to cancel the cumsum of the previous 10 + 10. Then the -13 should become -11 and the -10 should become -8, while the last three numbers shouldn't change since they do not bring the cumsum below zero.

So at the end of the correction process I should get

col1 = c(10, 10, -20, 11, -11, 15, -7, -8, 10, -5, 3)
cumsum(col1) = c(10, 20, 0, 11, 0, 15, 8, 0, 10, 5, 8)

I think the correction mechanism should be as follows (I don't know how to do it in R, but I have the idea in theoretical terms):

group_by: a group in col1 ends at each row whose negative value is larger in magnitude than the cumsum of the previous rows in that group, and a new group restarts after it
iff col1(row) is negative and larger in magnitude than the previous cumsum, replace col1(row) with the group's cumsum so far, with a negative sign in front
recompute cumsum(col1) and check that the result matches the desired output: there should be no negative cumsum values, so the min should be equal to 0

In the original dataset I have multiple asset types, so not only "x" but also "y", "z", and others. Furthermore, I need to group by investor, since the same situation applies to about 4k investors. The real dataset therefore looks something like this:

df <- structure(list(
        investor = c("1", "1", "1", "2", "2", "2", "3", "3", "4", "4", "4"),
        asset = c("x", "x", "x", "x", "x", "x", "y", "y", "y", "y", "z"),
        col1 = c(10, 10, -22, 11, -13, 15, 9, -10, 10, -5, 3),
        `cumsum(col1)` = c(10, 20, -2, 11, -2, 13, 9, -1, 10, 5, 3)),
        class = "data.frame", row.names = c(NA, -11L)
     )

which I need to become the following (the code should just take care of group_by(investor, asset)):

df <- structure(list(
        investor = c("1", "1", "1", "2", "2", "2", "3", "3", "4", "4", "4"),
        asset = c("x", "x", "x", "x", "x", "x", "y", "y", "y", "y", "z"),
        col1 = c(10, 10, -20, 11, -11, 15, 9, -9, 10, -5, 3),
        `cumsum(col1)` = c(10, 20, 0, 11, 0, 15, 9, 0, 10, 5, 3)),
        class = "data.frame", row.names = c(NA, -11L)
     )

I wrote this with a dplyr solution in mind since I'm more comfortable with that, but I don't know if it is possible to do in dplyr.

Thanks for the help!

akrun · Accepted Answer · 2022-06-20T15:11:07.443

We may do this with accumulate from purrr

library(dplyr)
library(purrr)
df %>% 
   group_by(asset) %>%
   mutate(col2csum = accumulate(col1,  ~ if(abs(.x + .y) < abs(.y)) 0 else 
       .x + .y)) %>% 
   ungroup

-output

# A tibble: 11 × 3
   asset  col1 col2csum
   <chr> <dbl>    <dbl>
 1 x        10       10
 2 x        10       20
 3 x       -22        0
 4 x        11       11
 5 x       -13        0
 6 x        15       15
 7 x        -7        8
 8 x       -10        0
 9 x        10       10
10 x        -5        5
11 x         3        8
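
The abs() test above reproduces the desired output on this sample, but it can also reset the running total when the sum stays positive: for a running total of 15 followed by -10, abs(15 - 10) < abs(-10) is TRUE, so the result would be 0 rather than 5. A minimal sketch of a more direct formulation, assuming the goal is simply to floor the running total at zero, is shown here. The helper name floored_cumsum is hypothetical, and base R's Reduce() is swapped in for accumulate() only to keep the example free of package dependencies.

```r
# Hypothetical helper: running sum that is reset to 0 whenever it would go negative.
# init = 0 makes the first element go through the same floor; [-1] drops that seed.
floored_cumsum <- function(x) {
  Reduce(function(acc, value) max(acc + value, 0), x,
         init = 0, accumulate = TRUE)[-1]
}

col1 <- c(10, 10, -22, 11, -13, 15, -7, -10, 10, -5, 3)
floored <- floored_cumsum(col1)
print(floored)

# A case where the abs()-based test and the plain floor disagree:
# the running total 15 - 10 = 5 is not negative, so it is kept.
edge_case <- floored_cumsum(c(15, -10))
print(edge_case)
```

With purrr, the same step function could presumably be written as accumulate(col1, ~ max(.x + .y, 0)).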

Update

If we want to change the 'col1'

df %>% 
   group_by(asset) %>%
   mutate(col2csum = accumulate(col1,  ~ if(abs(.x + .y) < abs(.y)) 0 else 
       .x + .y), col1 = c(first(col2csum), diff(col2csum))) %>% ungroup

-output

# A tibble: 11 × 3
   asset  col1 col2csum
   <chr> <dbl>    <dbl>
 1 x        10       10
 2 x        10       20
 3 x       -20        0
 4 x        11       11
 5 x       -11        0
 6 x        15       15
 7 x        -7        8
 8 x        -8        0
 9 x        10       10
10 x        -5        5
11 x         3        8
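
The question also asks for the correction to run separately for every investor and asset. A minimal sketch under that assumption, using the second sample from the question, only changes the grouping to group_by(investor, asset). The step max(.x + .y, 0) is a swapped-in floor-at-zero variant of the abs() test, not the code from the answer, and the name result is just for illustration.

```r
library(dplyr)
library(purrr)

# Second sample from the question (raw col1 only)
df <- data.frame(
  investor = c("1", "1", "1", "2", "2", "2", "3", "3", "4", "4", "4"),
  asset    = c("x", "x", "x", "x", "x", "x", "y", "y", "y", "y", "z"),
  col1     = c(10, 10, -22, 11, -13, 15, 9, -10, 10, -5, 3)
)

result <- df %>%
  group_by(investor, asset) %>%
  mutate(
    # running total within each investor/asset pair, floored at zero
    col2csum = accumulate(col1, ~ max(.x + .y, 0)),
    # recover the corrected increments from the floored running total
    col1 = c(first(col2csum), diff(col2csum))
  ) %>%
  ungroup()

print(result)
```

On this sample the corrected col1 and the floored cumsum should match the target values listed in the question, including the single-row group for investor "4" and asset "z".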

data

df <- structure(list(asset = c("x", "x", "x", "x", "x", "x", "x", "x", 
"x", "x", "x"), col1 = c(10, 10, -22, 11, -13, 15, -7, -10, 10, 
-5, 3)), class = "data.frame", row.names = c(NA, -11L))

This is nice; it solves the problem for the cumsum, but it still leaves col1 as it is, and I also need a procedure to correct all the numbers in col1 that led the cumsum to a negative value. Is there a way to fix that? - Lorenzo Mazzucchelli, Jun 20 '22 at 08:50
I thought about ==> `mutate(col1 = ifelse(col2cumsum < 0, -lag(col2cumsum), col1))` which should answer the problem, I think? - Lorenzo Mazzucchelli, Jun 20 '22 at 08:54
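
One caveat on the ifelse() idea: if the column being tested is the floored running total from the answer, it is never negative, so the condition would never fire. A minimal sketch of a vectorised alternative, assuming the floored running total is already available, caps each col1 value from below at minus the previous floored total. The names floored, previous and corrected are hypothetical, and the example uses base R only.

```r
col1 <- c(10, 10, -22, 11, -13, 15, -7, -10, 10, -5, 3)

# Running total floored at zero (init = 0 seeds the fold, [-1] drops the seed)
floored <- Reduce(function(acc, value) max(acc + value, 0), col1,
                  init = 0, accumulate = TRUE)[-1]

# Floored total available before each row (0 before the first row)
previous <- c(0, head(floored, -1))

# A negative value may not exceed, in magnitude, what was accumulated before it
corrected <- pmax(col1, -previous)

print(corrected)
print(cumsum(corrected))
```

In a grouped dplyr pipeline the equivalent expression would presumably be pmax(col1, -lag(col2csum, default = 0)).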
