I am trying to create a new variable in my dataframe that is the group-specific sum of a variable. For example:

df <- data.frame(group    = c(1, 1, 1, 2, 2, 2),
                 variable = c(1, 2, 1, 3, 4, 5))
df
  group variable
1     1        1
2     1        2
3     1        1
4     2        3
5     2        4
6     2        5

I would like a new variable that sums variable by group to get something that looks like this:

  group variable sum
1     1        1   4
2     1        2   4
3     1        1   4
4     2        3  12
5     2        4  12
6     2        5  12

Thank you!

PotterFan
  • This is a dupe of [many](https://stackoverflow.com/search?tab=votes&q=%5br%5d%20summarize%20by%20group), but most of the answers found there summarize the data (reducing the number of rows) rather than just adding a column to it. – r2evans Jan 04 '21 at 21:00
  • Thank you @IceCreamToucan, I knew they were there, I just ran out of time to find the right one(s). – r2evans Jan 04 '21 at 21:56

1 Answer

Base R

with(df, ave(variable, group, FUN = sum))
# [1]  4  4  4 12 12 12

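(ave() returns a vector with one value per row, so to keep the result, assign it back into the frame: df$sum <- with(df, ave(variable, group, FUN = sum)).)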
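For completeness, here is a minimal, self-contained sketch of that reassignment, rebuilding the sample df from the question. The column name sum simply mirrors the desired output; any other name would work as well.

```r
# Sample data from the question
df <- data.frame(group    = c(1, 1, 1, 2, 2, 2),
                 variable = c(1, 2, 1, 3, 4, 5))

# ave() applies FUN within each group and returns a vector as long as
# its input, so the result can be stored directly as a new column.
# FUN must be passed by name; unnamed arguments are treated as grouping factors.
df$sum <- ave(df$variable, df$group, FUN = sum)

df
#   group variable sum
# 1     1        1   4
# 2     1        2   4
# 3     1        1   4
# 4     2        3  12
# 5     2        4  12
# 6     2        5  12
```

Because nothing is aggregated away, the row count and row order of df stay the same.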

dplyr

library(dplyr)
df %>%
  group_by(group) %>%
  mutate(sum = sum(variable)) %>%
  ungroup()
# # A tibble: 6 x 3
#   group variable   sum
#   <dbl>    <dbl> <dbl>
# 1     1        1     4
# 2     1        2     4
# 3     1        1     4
# 4     2        3    12
# 5     2        4    12
# 6     2        5    12

data.table

library(data.table)
DF <- as.data.table(df)
DF[, sum := sum(variable), by = .(group) ]
DF
#    group variable sum
# 1:     1        1   4
# 2:     1        2   4
# 3:     1        1   4
# 4:     2        3  12
# 5:     2        4  12
# 6:     2        5  12
r2evans