With R studio I aim to accumulate data in one column (# of achievements) across different rows, subject to
- Matching row: Only rows with identical person_id should be considered
- Start date: Only rows prior to the start.date of the current row (i.e. only previous observations) should be considered
- Position of interest: Only calculate for the positions of interest (1) to reduce computing effort
See rows 4 and 7 as examples:
Person_id | Position | Start.Date | Position of interest | # of achievements | CODE REQUIRED: # of previous achievements |
---|---|---|---|---|---|
Martin123 | Student | 01/2010 | 0 | 1 | 0 |
Martin123 | Intern | 01/2012 | 0 | 1 | 0 |
Martin123 | Student | 01/2014 | 0 | 2 | 0 |
Martin123 | Employee | 01/2018 | 1 | 2 | 4 |
Martin123 | Employee | 01/2020 | 0 | 2 | 0 |
Max456 | Employee | 01/2010 | 0 | 5 | 0 |
Max456 | Employee | 01/2012 | 1 | 1 | 5 |
The traditional approaches for conditional sums appear to be too static as I was unable to adjust them in a way for them to outputs sums per row while dynamically adjusting to the person_id (I have >100k rows in my dataset so cannot be done manually).
Edit: Language clarification