Sum column with a condition in R

Question

I have a dataframe like this:

df <- data.frame(a=c(111,111,111,222,222,222,333,333,333),
                 b=c(1,0,1,1,1,1,0,0,1))
df
    a b
1 111 1
2 111 0
3 111 1
4 222 1
5 222 1
6 222 1
7 333 0
8 333 0
9 333 1

I need to get the sum of column 'b' for each value of 'a' (that is, 2 for 111, 3 for 222 and 1 for 333).

How can I do that in the fastest way?

score 5 · Accepted Answer · answered Dec 16 '16 at 23:49

 aggregate(df$b, by=list(df$a), FUN=sum)
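
The call above returns its result with the default column names Group.1 and x. As a minimal sketch of an alternative (not part of the original answer), the formula interface of aggregate keeps the original column names:

```r
# Same data as in the question
df <- data.frame(a = c(111, 111, 111, 222, 222, 222, 333, 333, 333),
                 b = c(1, 0, 1, 1, 1, 1, 0, 0, 1))

# "b ~ a" reads as: summarise b, grouped by a
res <- aggregate(b ~ a, data = df, FUN = sum)
print(res)
#     a b
# 1 111 2
# 2 222 3
# 3 333 1
```

The result is an ordinary data.frame with one row per distinct value of a.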

G5W

score 4 · Answer 2 · Andrew J. Rech · 2016-12-17T00:22:42.257

Generally speaking, the fastest method with large data will be to use data.table.

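# One-time install of the development version from source;
# a plain install.packages("data.table") gets the CRAN release instead
install.packages("data.table", type = "source",
                 repos = "http://Rdatatable.github.io/data.table")
library("data.table")

df <- data.frame(a=c(111,111,111,222,222,222,333,333,333),
                 b=c(1,0,1,1,1,1,0,0,1))
df <- as.data.table(df)
# Sum b within each group of a; the result column is named V1 by default
df[, sum(b), by = a]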
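
As a hedged sketch building on the code above: wrapping the expression in .() lets you name the summed column, and a rough timing on synthetic data is one way to check the speed claim on your own machine. The names dt, big and big_dt and the data sizes are assumptions made for illustration, and timings will vary by machine and data shape.

```r
library(data.table)

df <- data.frame(a = c(111, 111, 111, 222, 222, 222, 333, 333, 333),
                 b = c(1, 0, 1, 1, 1, 1, 0, 0, 1))
dt <- as.data.table(df)

# .() is data.table's alias for list(); naming the element names the column
res <- dt[, .(b = sum(b)), by = a]
print(res)

# Rough timing on synthetic data (sizes are arbitrary assumptions)
set.seed(1)
n <- 1e5
big <- data.frame(a = sample(1000, n, replace = TRUE),
                  b = rbinom(n, 1, 0.5))
big_dt <- as.data.table(big)

print(system.time(aggregate(b ~ a, data = big, FUN = sum)))
print(system.time(big_dt[, .(b = sum(b)), by = a]))
```

A single system.time call is only a coarse measurement; repeating the runs or using a dedicated benchmarking package would give a more reliable comparison.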

answered Dec 16 '16 at 23:58


Your last line of code doesn't yield the output the OP described. This comes pretty close: `df[, sum(b), by=a]` – bdemarest Dec 17 '16 at 00:13

score 1 · Answer 3 · answered Dec 16 '16 at 23:56

You can use dplyr:

library(dplyr)
df %>% group_by(a) %>% summarise(., b = sum(b))

PhilC

score -1 · Answer 4 · answered Dec 17 '16 at 00:25

If we are using the dplyr package, do we really need to pass the dot explicitly, as in PhilC's answer?

df %>% group_by(a) %>% summarise(., b = sum(b))

Would this not do?

df %>% group_by(a) %>% summarise(b = sum(b))
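
A minimal sketch to check this, assuming dplyr is installed: when `.` appears as a direct argument, the pipe does not also insert the left-hand side as the first argument, so the two forms should be equivalent. The names with_dot and without_dot are chosen here for illustration.

```r
library(dplyr)

df <- data.frame(a = c(111, 111, 111, 222, 222, 222, 333, 333, 333),
                 b = c(1, 0, 1, 1, 1, 1, 0, 0, 1))

# Explicit dot: the grouped data is passed where `.` appears
with_dot <- df %>% group_by(a) %>% summarise(., b = sum(b))

# Implicit form: the pipe supplies the grouped data as the first argument
without_dot <- df %>% group_by(a) %>% summarise(b = sum(b))

print(without_dot)

# Compare the two results as plain data frames
same <- isTRUE(all.equal(as.data.frame(with_dot), as.data.frame(without_dot)))
print(same)
```

Since both forms give the same table, the shorter one without the dot is the more common style.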

Satish Vadlamani
