Use data.table to count and aggregate / summarize a column

Question

I want to count and aggregate(sum) a column in a data.table, and couldn't find the most efficient way to do this. This seems to be close to what I want R summarizing multiple columns with data.table.

My data:

set.seed(321)
dat <- data.table(MNTH = c(rep(201501,4), rep(201502,3), rep(201503,5), rep(201504,4)), 
                  VAR = sample(c(0,1), 16, replace=T))

> dat
     MNTH VAR
 1: 201501   1
 2: 201501   1
 3: 201501   0
 4: 201501   0
 5: 201502   0
 6: 201502   0
 7: 201502   0
 8: 201503   0
 9: 201503   0
10: 201503   1
11: 201503   1
12: 201503   0
13: 201504   1
14: 201504   0
15: 201504   1
16: 201504   0

I want to both count and sum VAR by MNTH using data.table. The desired result:

    MNTH COUNT VAR
1 201501     4   2
2 201502     3   0
3 201503     5   2
4 201504     4   2

I find [the accepted answer](https://stackoverflow.com/a/32826378/1048186) for this question to be easier to follow even in the SINGLE column case than the [accepted answer on the question linked](https://stackoverflow.com/questions/16513827/r-summarizing-multiple-columns-with-data-table) in this question. — Josiah Yoder, Aug 12 '21 at 20:44
I think that is because the other question is pointed to a many-column case. — Josiah Yoder, Aug 12 '21 at 21:03

Jaap · Accepted Answer · 2019-06-27T06:46:35.460

The post you are referring to gives a method on how to apply one aggregation method to several columns. If you want to apply different aggregation methods to different columns, you can do:

dat[, .(count = .N, var = sum(VAR)), by = MNTH]

this results in:

     MNTH count var
1: 201501     4   2
2: 201502     3   0
3: 201503     5   2
4: 201504     4   2

You can also add these values to your existing dataset by updating your dataset by reference:

dat[, `:=` (count = .N, var = sum(VAR)), by = MNTH]

this results in:

> dat
      MNTH VAR count var
 1: 201501   1     4   2
 2: 201501   1     4   2
 3: 201501   0     4   2
 4: 201501   0     4   2
 5: 201502   0     3   0
 6: 201502   0     3   0
 7: 201502   0     3   0
 8: 201503   0     5   2
 9: 201503   0     5   2
10: 201503   1     5   2
11: 201503   1     5   2
12: 201503   0     5   2
13: 201504   1     4   2
14: 201504   0     4   2
15: 201504   1     4   2
16: 201504   0     4   2

For further reading about how to use data.table syntax, see the Getting started guides on the GitHub wiki.

Use data.table to count and aggregate / summarize a column

1 Answers1

Linked