I have a dataframe like this:

df <- data.frame(a=c(111,111,111,222,222,222,333,333,333),
                 b=c(1,0,1,1,1,1,0,0,1))
df
    a b
1 111 1
2 111 0
3 111 1
4 222 1
5 222 1
6 222 1
7 333 0
8 333 0
9 333 1

I need the sum of column 'b' for each value of 'a':

    a b
1 111 2
2 222 3
3 333 1

How can I do that in the fastest way?

Uwe
Vitaliy Poletaev

4 Answers

5
 aggregate(df$b, by=list(df$a), FUN=sum)
G5W
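As a hedged aside on the `aggregate` call above: with an unnamed `by` list, the result columns come back as `Group.1` and `x`. A minimal sketch using the formula interface, which keeps the original column names, might look like this (the variable name `res` is only illustrative):

```r
df <- data.frame(a = c(111, 111, 111, 222, 222, 222, 333, 333, 333),
                 b = c(1, 0, 1, 1, 1, 1, 0, 0, 1))

# Formula interface: sum b within each value of a; columns stay named a and b
res <- aggregate(b ~ a, data = df, FUN = sum)
print(res)
#     a b
# 1 111 2
# 2 222 3
# 3 333 1
```

Note that the formula method drops rows with missing values by default, which makes no difference for this data.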
4

Generally speaking, the fastest method with large data will be to use data.table.

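# Installs the development version from source; install.packages("data.table") gets the CRAN release
install.packages("data.table", type = "source",
                 repos = "http://Rdatatable.github.io/data.table")
library("data.table")

df <- data.frame(a=c(111,111,111,222,222,222,333,333,333),
                 b=c(1,0,1,1,1,1,0,0,1))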
df <- as.data.table(df)
df[, sum(b), by = a]
Andrew J. Rech
    Your last line of code doesn't yield the output the OP described. This comes pretty close: `df[, sum(b), by=a]` – bdemarest Dec 17 '16 at 00:13
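The remaining gap the comment points at is most likely the column name: an unnamed `sum(b)` in `j` is returned as `V1`. A small sketch, assuming data.table is already installed, that names the summed column so the result matches the requested layout (`dt` and `res` are illustrative names):

```r
library(data.table)

dt <- data.table(a = c(111, 111, 111, 222, 222, 222, 333, 333, 333),
                 b = c(1, 0, 1, 1, 1, 1, 0, 0, 1))

# .() is data.table's alias for list(); naming the element names the output column
res <- dt[, .(b = sum(b)), by = a]
print(res)
```

Building the table with `data.table()` directly avoids the separate `as.data.table()` conversion step; `setDT(df)` would be another option if a data.frame already exists.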
1

You can use dplyr:

library(dplyr)
df %>% group_by(a) %>% summarise(.,b = sum(b))
PhilC
-1

If we are using the dplyr package, do we really need the code as written in PhilC's answer?

df %>% group_by(a) %>% summarise(.,b = sum(b))

Would this not do?

df %>% group_by(a) %>% summarise(b = sum(b))
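One way to check this is to run both forms and compare the results. The sketch below assumes dplyr is installed; `with_dot` and `without_dot` are illustrative names. Since `%>%` does not insert the left-hand side a second time when `.` already appears as a direct argument, the two calls should be equivalent:

```r
library(dplyr)

df <- data.frame(a = c(111, 111, 111, 222, 222, 222, 333, 333, 333),
                 b = c(1, 0, 1, 1, 1, 1, 0, 0, 1))

# Explicit placeholder: the piped data is passed where the dot appears
with_dot <- df %>% group_by(a) %>% summarise(., b = sum(b))

# Implicit form: the piped data becomes the first argument
without_dot <- df %>% group_by(a) %>% summarise(b = sum(b))

# TRUE when both forms produce the same summary
print(isTRUE(all.equal(with_dot, without_dot)))
```

If the comparison holds, the shorter form without the dot is the more conventional one.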