1
    Name    Trial#  Result  ResultsSoFar
1   Bob     1       14          14
2   Bob     2       22          36
3   Bob     3       3           39
4   Bob     4       18          57
5   Nancy   2       33          33
6   Nancy   3       87          120

Hello, say I have the dataframe above. What's the best way to generate the "ResultsSoFar" column which is a sum of that person's results up to and including that trial (Bob's results do not include Nancy's and vice versa).

elai
  • 23
  • 6
  • You can do like this . `df %>% mutate(ResultsSofar = cumsum(Result))` – user5249203 Dec 19 '17 at 19:49
  • 1
    more like `unlist(aggregate(Result~Name, df, cumsum)$Result)` (or a `group_by` prior to `mutate` [tag:dplyr]) – Cath Dec 19 '17 at 19:53
  • Where does the person's name come into play? Sorry if it wasn't clear but I need to sum up Bob's and Nancy's separately. – elai Dec 19 '17 at 19:58
  • Possible duplicate of [Calculate cumulative sum within each ID (group)](https://stackoverflow.com/questions/16850207/calculate-cumulative-sum-within-each-id-group) – nghauran Dec 19 '17 at 20:34

1 Answers1

6

With you can do:

library(data.table)
setDT(df)[, ResultsSoFar:=cumsum(Result), by=Name]
df
    Name Trial. Result ResultsSoFar
1:   Bob      1     14           14
2:   Bob      2     22           36
3:   Bob      3      3           39
4:   Bob      4     18           57
5: Nancy      2     33           33
6: Nancy      3     87          120

Note:
If Trial# is not sorted, you can do setDT(df)[, ResultsSoFar:=cumsum(Result[order(Trial.)]), by=Name] to get the right order for the cumsum

Cath
  • 23,906
  • 5
  • 52
  • 86
  • Jumping in, yes, you'll need to sort. Data table is good at that, too! Simply `df <- df[order(Trial.)`] . And yes, the `by=` can be a list of columns and then it will work by many columns. – HarlandMason Dec 19 '17 at 20:05
  • @HarlandMason you don't need to sort the df, you can keep the df as is and just order the Result variable for the cumsum, see my edit – Cath Dec 19 '17 at 20:08
  • is the syntax in your note preferable to `setDT(df)[order(Trial.)][, ResultsSoFar:=cumsum(Trial.), by=Name]`? – HarlandMason Dec 19 '17 at 20:09
  • 1
    @HarlandMason actually your line could be good too and give the same result but I would rather do `df[order(Trial.), ResultsSoFar:=cumsum(Trial.), by=Name]` ;-) – Cath Dec 19 '17 at 20:16
  • Hello, follow-up question, how could this be done to include only the previous results so far (i.e. ResultsSoFar for Bob's trial#4 would sum up trials 1 through 3, but not including itself.) – elai Dec 19 '17 at 22:00
  • @EdwardLai you can do (considering df is ordered by trial or specifying `order(Trial.)`) `c(head(cumsum(Result, -1), NA)` instead of `cumsum(Result)` – Cath Dec 20 '17 at 07:43