I have a very large dataset that looks simplified like this:
row. member_id entry_id comment_count timestamp
1 1 a 4 2008-06-09 12:41:00
2 1 b 1 2008-07-14 18:41:00
3 1 c 3 2008-07-17 15:40:00
4 2 d 12 2008-06-09 12:41:00
5 2 e 50 2008-09-18 10:22:00
6 3 f 0 2008-10-03 13:36:00
I now want to create new a column, in which I sum up all the "commen_count" of previous ideas ("ID") from the same member. So I only want to sum up the comment_counts from entried that occured before the current entry. I can order my dataset by the member id and the timestamp.
The result should look something like this:
row. member_id entry_id comments_count timestamp aggregated_count
1 1 a 4 2008-06-09 12:41:00 4
2 1 b 1 2008-07-14 18:41:00 5
3 1 c 3 2008-07-17 15:40:00 8
4 2 d 12 2008-06-09 12:41:00 12
5 2 e 50 2008-09-18 10:22:00 62
6 3 f 0 2008-10-03 13:36:00 0
Some idea how I can do this in R (or Stata)? I tried aggregate, but I don't understand how to sum only the comment_counts before the curren timestamp and only those with the current member_id.