I have this columns in a bigger dataset (here i just report asset "x" but there are different, hence the idea is to replicate the process for every asset):
df <- structure(list(
asset = c("x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x")
col1 = c(10, 10, -22, 11, -13, 15, -7, -10, 10, -5, 3),
cumsum(col1) = c(10, 20, -2, 9, -4, 11, 4, -6, 4, -1, 2),
class = "data.frame", row.names = c(NA, -11L)
)
I want to correct the negative number in col1 such that the cumsum(col1) becomes equal to
cumsum(col1) = c(10, 20, 0, 11, 0, 15, 8, 0, 10, 5, 8)
To get that result I need to correct the col1 number iff the negative number is bigger than the cumsum of the previous number.
For example the -22
in third position should become -20
to match the cumsum of the previous 10+10
Then the -13
should become equal to -11
and the -10
should become -8
, while the last three numbers shouldn't change since they do not cumsum to a negative outcome.
So at the end of the correction process I should get
col1 = c(10, 10, -20, 11, -11, 15, -7, -8, 10, -5, 3)
cumsum(col1) = c(10, 20, 0, 11, 0 ,15, 8, 0, 10, 5, 8)
In the process of correction I think that the mechanism should be (I don't know how to do it with R, but I get something in theoretical terms) :
group_by = each group in col1 should be defined by each col1(row) greater than the cumsum of its previous rows and restard whenever the col1(row) is greater than the previous elements cumsum
iff col1(row) is greater than the previous cumsum, correct the col1(row) with the group cumsum number with a negative sign in front
cumsum col1 and check again iff the result matches the desired output, hence there should be no negative cumsum values. The min should be equal to 0
in the original dataset I have multiple asset types, hence not only "x" but also "y", "z", and others. Furthermore I need to group_by
investors since the same situation can be applied to 4k investors. hence the real dataset is something like this:
df <- structure(list(
investor = c("1", "1", "1", "2", "2", "2", "3", "3", "4", "4", "4"),
asset = c("x", "x", "x", "x", "x", "x", "y", "y", "y", "y", "z")
col1 = c(10, 10, -22, 11, -13, 15, 9, -10, 10, -5, 3),
cumsum(col1) = c(10, 20, -2, 11, -2, 13, 9, -1, 10, 5, 3),
class = "data.frame", row.names = c(NA, -11L)
)
where i need it to become (the code should just take care of group_by(investor, asset)
)
df <- structure(list(
investor = c("1", "1", "1", "2", "2", "2", "3", "3", "4", "4", "4"),
asset = c("x", "x", "x", "x", "x", "x", "y", "y", "y", "y", "z")
col1 = c(10, 10, -20, 11, -11, 15, 9, -9, 10, -5, 3),
cumsum(col1) = c(10, 20, 0, 11, 0, 15, 9, 0, 10, 5, 3),
class = "data.frame", row.names = c(NA, -11L)
)
I wrote thinking about a dplyr
solution since I'm more confortable with that but I don't know if it is possibile to do in dplyr.
Thanks for the help!