I have a relatively large dataframe (~2,000,000 rows) where, for each row, I need to, within that observation's group, calculate the sum of every value greater than or equal to the current row's value.
Here is a sample data frame:
sample_df = data.frame(
group_id = c(1,1,1,1,2,2,2,2),
value = c(10,12,14,12,8,8,21,10)
)
I currently have a very slow solution to do this using a loop and some filtering, however, having a faster solution would be much preferred. I've been trying to use dplyr however, I can not figure out how I can get the sum of the other observations after the data are grouped.
With the above toy example, here would be the desired output:
desired_output = data.frame(
group_id = c(1,1,1,1,2,2,2,2),
value = c(10,12,14,12,8,8,21,10),
output = c(38,26,0,26,39,39,0,21)
)
Looking around for solutions to this that have already been posted, I haven't seen a clear answer which explains how one can compare each observation in a group to the other observations, filtered on some criteria, in that group. I'd prefer a dplyr-based solution but if there are efficient base-R or data.table solutions than I'd be equally grateful!