0

I have a datatable as follows:

library(data.table)
dt <- fread(
    "A   B   D  E  iso   year   
     1   A   1  NA ECU   2009   
     2   B   2  0  ECU   2009   
     3   D   3  0  BRA   2011   
     4   E   4  0  BRA   2011   
     5   D   7  NA ECU   2008   
     6   E   1  0  ECU   2008   
     7   A   3  2  BRA   2012   
     8   A   4  NA BRA   2012",
  header = TRUE
)

I want to create an aggregate of the dataset by doing something like: dt[, .(D = sum(D)), by = c("iso", "year")]

However, instead of only for D, I want to be able to also add A, E (and 10 more columns in the actual data).

Desired output:

dt <- fread(
    "A   D  E  iso   year   
     3   3  0  ECU   2009   
     7   7  0  BRA   2011    
    11   8  0  ECU   2008    
    15   7  2  BRA   2012",
  header = TRUE
)

How can I specify this?

Tom
  • 2,173
  • 1
  • 17
  • 44

2 Answers2

1

You can combine .SD + .SDcols and loop over .SD with lapply():

dt[, 
   lapply(.SD, sum, na.rm = TRUE), 
   by = c("iso", "year"), 
   .SDcols = c("A", "D", "E")]

   iso year  A D E
1: ECU 2009  3 3 0
2: BRA 2011  7 7 0
3: ECU 2008 11 8 0
4: BRA 2012 15 7 2
s_baldur
  • 29,441
  • 4
  • 36
  • 69
1

Using dplyr package

You can also use the dplyr package and in this case you don't need to specify all the column names:

library(dplyr)
dt$E[is.na(dt$E)] <- 0
dt %>%
  group_by(iso, year) %>%
  summarize_at(vars(A, D:E), .funs = sum)

Output:

# A tibble: 4 x 5
# Groups:   iso [2]
  iso    year     A     D     E
  <chr> <int> <int> <int> <dbl>
1 BRA    2011     7     7     0
2 BRA    2012    15     7     2
3 ECU    2008    11     8     0
4 ECU    2009     3     3     0

Hope this helps.

Louis
  • 3,592
  • 2
  • 10
  • 18