R Dataframe: Add new column as sum of certain other columns

Question

    Name    Trial#  Result  ResultsSoFar
1   Bob     1       14          14
2   Bob     2       22          36
3   Bob     3       3           39
4   Bob     4       18          57
5   Nancy   2       33          33
6   Nancy   3       87          120

Hello, say I have the dataframe above. What's the best way to generate the "ResultsSoFar" column which is a sum of that person's results up to and including that trial (Bob's results do not include Nancy's and vice versa).

You can do like this . `df %>% mutate(ResultsSofar = cumsum(Result))` — user5249203, Dec 19 '17 at 19:49
more like `unlist(aggregate(Result~Name, df, cumsum)$Result)` (or a `group_by` prior to `mutate` [tag:dplyr]) — Cath, Dec 19 '17 at 19:53
Where does the person's name come into play? Sorry if it wasn't clear but I need to sum up Bob's and Nancy's separately. — elai, Dec 19 '17 at 19:58
Possible duplicate of [Calculate cumulative sum within each ID (group)](https://stackoverflow.com/questions/16850207/calculate-cumulative-sum-within-each-id-group) — nghauran, Dec 19 '17 at 20:34

Cath · Accepted Answer · 2017-12-19T20:06:41.770

6

With data.table you can do:

library(data.table)
setDT(df)[, ResultsSoFar:=cumsum(Result), by=Name]
df
    Name Trial. Result ResultsSoFar
1:   Bob      1     14           14
2:   Bob      2     22           36
3:   Bob      3      3           39
4:   Bob      4     18           57
5: Nancy      2     33           33
6: Nancy      3     87          120

Note:
If Trial# is not sorted, you can do setDT(df)[, ResultsSoFar:=cumsum(Result[order(Trial.)]), by=Name] to get the right order for the cumsum

edited Dec 19 '17 at 20:06

answered Dec 19 '17 at 19:58

Cath

23,906
5
52
86

Jumping in, yes, you'll need to sort. Data table is good at that, too! Simply `df <- df[order(Trial.)`] . And yes, the `by=` can be a list of columns and then it will work by many columns. – HarlandMason Dec 19 '17 at 20:05
@HarlandMason you don't need to sort the df, you can keep the df as is and just order the Result variable for the cumsum, see my edit – Cath Dec 19 '17 at 20:08
is the syntax in your note preferable to `setDT(df)[order(Trial.)][, ResultsSoFar:=cumsum(Trial.), by=Name]`? – HarlandMason Dec 19 '17 at 20:09
1

@HarlandMason actually your line could be good too and give the same result but I would rather do `df[order(Trial.), ResultsSoFar:=cumsum(Trial.), by=Name]` ;-) – Cath Dec 19 '17 at 20:16
Hello, follow-up question, how could this be done to include only the previous results so far (i.e. ResultsSoFar for Bob's trial#4 would sum up trials 1 through 3, but not including itself.) – elai Dec 19 '17 at 22:00
@EdwardLai you can do (considering df is ordered by trial or specifying `order(Trial.)`) `c(head(cumsum(Result, -1), NA)` instead of `cumsum(Result)` – Cath Dec 20 '17 at 07:43

R Dataframe: Add new column as sum of certain other columns

1 Answers1