What's the best way to group by two variables in R?

Question

In the following DF, called "contribs," is there a way to quickly calculate the sums in the amount column based on state? In other words, the sum of the amount from all rows that have CA in the state column, and then all rows that have NV in the state column, etc., similar to the "groupby" method in pandas.

name,state,amount
JASON HOPE,AZ,15000
MONA GIRODET,CA,5000
SHINCHI CHIEN,NV,5000
JEFF KREITZMAN,AZ,5000
GRISELDA KREITZMAN,AZ,5000
BOB DAY,CA,5000
JAMES BRESLO,CA,5000
JOHN SCARDINO,CA,5
JOHN SCARDINO,CA,5000
ANTHONY TESORO,CA,100
ANTHONY TESORO,CA,100
ANTHONY TESORO,CA,5000

Here's the way I've been doing it:


ca_contribs <- contribs$state == "CA"
sum(ca_contribs$amount)

And then I do that for each state in the data and put the results together in a dataframe, but is there a more efficient way to do this?

akrun · Accepted Answer · 2021-05-16T03:50:59.413

We can use aggregate from base R. As there are only three columns, the remaining two columns can be specified by . in the formula method

aggregate(amount ~ ., contribs, sum)

Or another option is dplyr based which would be more flexible for both summariseing and creating/modifying columns with mutate

library(dplyr)
contribs %>%
     group_by(name, state) %>%
     summarise(amount = sum(amount, na.rm = TRUE), .groups = 'drop')

Or can use data.table methods for more efficiency

library(data.table)
setDT(contribs)[, .(amount = sum(amount, na.rm = TRUE)),.(name, state)]

What's the best way to group by two variables in R?

1 Answers1