R: Roll up column values containing NA's by sum while grouping by ID's

Question

I have a data frame built as follows:

ID <- c("A","A","A","A","B","B","B","B") 
Type <- c(45,45,46,46,45,45,46,46)
Point_A <- c(10,NA,30,40,NA,80,NA,100) 
Point_B <- c(NA,32,43,NA,65,11,NA,53)
df <- data.frame(ID,Type,Point_A,Point_B)

    ID  Type    Point_A Point_B
1   A   45        10    NA
2   A   45        NA    32
3   A   46        30    43
4   A   46        40    NA
5   B   45        NA    65
6   B   45        80    11
7   B   46        NA    NA
8   B   46       100    53

From this post I learnt how to roll up the data by ID for a single column, but here I need to sum several columns at once.

I am currently using sqldf to sum the rows and group by ID and Type. While this does the job for me, it's very slow on a bigger dataset.

    library(sqldf)

    df1 <- sqldf("SELECT ID, Type, SUM(Point_A) AS Point_A, SUM(Point_B) AS Point_B
                  FROM df
                  GROUP BY ID, Type")

Please suggest any other techniques that would solve this problem. I have started learning the dplyr and plyr packages and find them very interesting, but I don't know how to apply them here.

Desired Output

    ID  Type    Point_A Point_B
1   A   45        10    32
2   A   46        70    43
3   B   45        80    76
4   B   46       100    53

score 9 · Answer 1 · edited May 15 '15 at 07:14

library(data.table)

DT <- as.data.table(df)
# Sum every non-grouping column (.SD) within each ID/Type group, ignoring NAs
DT[, lapply(.SD, sum, na.rm=TRUE), by=list(ID, Type)]

   ID Type Point_A Point_B
1:  A   45      10      32
2:  A   46      70      43
3:  B   45      80      76
4:  B   46     100      53
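
One caveat with `na.rm=TRUE` that the question does not raise: when every value in a group is `NA`, `sum()` returns 0 rather than `NA`. The sample data has no such group, so the result above is unaffected. If an all-`NA` group should stay `NA`, a small wrapper is one option; `sum_or_na` below is a hypothetical helper name, not part of base R or data.table.

```r
# Hypothetical helper (not from the original answer): sum that ignores NAs,
# but returns NA when the whole input is NA instead of 0.
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

sum(c(NA, NA), na.rm = TRUE)  # 0: the sum of nothing
sum_or_na(c(NA, NA))          # NA
sum_or_na(c(NA, 100))         # 100
```

It can be passed in place of `sum` in the call above, e.g. `DT[, lapply(.SD, sum_or_na), by=list(ID, Type)]`.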

edited May 15 '15 at 07:14

Arun

answered May 14 '15 at 23:00

Ricardo Saporta

Ricardo, it works like a charm :) but I prefer dplyr over data.table since I am currently learning it. – Sharath May 14 '15 at 23:20

Steven Beaupré · Accepted Answer · 2015-05-15T10:26:25.160

score 4

Using dplyr:

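library(dplyr)

# Apply the same NA-ignoring sum to every non-grouping column
df %>% group_by(ID, Type) %>% summarise_each(funs(sum(., na.rm = TRUE)))

Or

# Name each summarised column explicitly
df %>% 
  group_by(ID, Type) %>% 
  summarise(Point_A = sum(Point_A, na.rm = TRUE), 
            Point_B = sum(Point_B, na.rm = TRUE))

Or

# Wrap the NA-ignoring sum in a helper to avoid repeating na.rm
f <- function(x) sum(x, na.rm = TRUE) 

df %>% 
  group_by(ID, Type) %>% 
  summarise(Point_A = f(Point_A), 
            Point_B = f(Point_B))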

Which gives:

#Source: local data frame [4 x 4]
#Groups: ID
#
#  ID Type Point_A Point_B
#1  A   45      10      32
#2  A   46      70      43
#3  B   45      80      76
#4  B   46     100      53
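
`summarise_each()` and `funs()` have since been deprecated in dplyr. A minimal sketch of the same roll-up with `across()`, assuming dplyr 1.0.0 or later; the result name `rolled` and the `starts_with("Point_")` selection are illustrative choices, not part of the original answer:

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  ID      = c("A", "A", "A", "A", "B", "B", "B", "B"),
  Type    = c(45, 45, 46, 46, 45, 45, 46, 46),
  Point_A = c(10, NA, 30, 40, NA, 80, NA, 100),
  Point_B = c(NA, 32, 43, NA, 65, 11, NA, 53)
)

# Sum every Point_* column within each ID/Type group, ignoring NAs;
# .groups = "drop" returns an ungrouped result
rolled <- df %>%
  group_by(ID, Type) %>%
  summarise(across(starts_with("Point_"), ~ sum(.x, na.rm = TRUE)),
            .groups = "drop")

rolled
```

Selecting columns by prefix means new `Point_*` columns are picked up without editing the call; listing them explicitly, as in `across(c(Point_A, Point_B), ...)`, is the safer choice if other columns might share the prefix.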

edited May 15 '15 at 10:26

answered May 14 '15 at 23:12

Steven Beaupré

Steven, thanks again for helping me with this question too. This dplyr package amazes me every time. :-) It is super fast on a larger data set. – Sharath May 14 '15 at 23:22

Why not `summarise_each()`?? – Arun May 15 '15 at 07:15
@Arun I was answering similar questions with little variants and forgot to add the simplest method. – Steven Beaupré May 15 '15 at 10:37
