I have a data frame:

md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1), c = c(1,3,4,3,5,5),
      device = c(1,1,2,2,3,3))
myvars = c("a", "b", "c")
md[2,3] <- NA
md[4,1] <- NA
md

I want to count the number of 5s in each column, by device. I can do it like this:

library(dplyr)
group_by(md, device) %>%
  summarise(counts.a = sum(a == 5, na.rm = TRUE),
            counts.b = sum(b == 5, na.rm = TRUE),
            counts.c = sum(c == 5, na.rm = TRUE))

However, in real life I'll have many variables (myvars can be very long), so I can't write out counts.a, counts.b, etc. manually dozens of times.

Can dplyr count the 5s in all the myvars columns at once?

Thank you!

Rorschach
user2323534
  • See `?summarise_each` and http://stackoverflow.com/questions/21644848/summarizing-multiple-columns-with-dplyr?rq=1 – talat Jun 16 '15 at 15:20
  • I'm not sure how to get the names there, but this works: `md %>% group_by(device) %>% summarise_each(funs(counts=sum(.==5,na.rm=TRUE)))` – Frank Jun 16 '15 at 16:43
  • @Frank Maybe `md %>% group_by(device) %>% select_(.dots=myvars) %>% summarise_each(funs(counts=sum(.==5,na.rm=TRUE)))` or just `md %>% group_by(device) %>% summarise_each_(funs(counts=sum(.==5, na.rm=TRUE)), myvars)` – akrun Jun 16 '15 at 16:47
  • @akrun Still no names in the result when I run either of those (R 3.2.0, dplyr 0.4.1). Seems that `summarise_each` just ignores names inside `funs`... – Frank Jun 16 '15 at 16:50
  • @Frank Have you tried with `summarise_each_`? – akrun Jun 16 '15 at 16:51
  • @akrun Yes, I tried the second version in your comment, but still see `a` `b` `c` as the columns, with `counts` appearing nowhere. – Frank Jun 16 '15 at 16:52
  • @Frank Never mind, I thought you meant something different. I guess you are talking about `counts.a`, `counts.b`, etc. in the names, right? – akrun Jun 16 '15 at 16:55

2 Answers

If you care about the names starting with "counts." you could do it like this in a dplyr pipe:

md %>% 
  group_by(device) %>% 
  summarise_each_(funs(sum(.==5,na.rm=TRUE)), myvars) %>% 
  setNames(c(names(.)[1], paste0("counts.", myvars)))
#Source: local data frame [3 x 4]
#
#  device counts.a counts.b counts.c
#1      1        1        2        0
#2      2        0        1        0
#3      3        1        0        2

There's another Q&A about how to name new columns produced by dplyr's mutate_each (which behaves the same way as summarise_each): "mutate_each in dplyr: how do I select certain columns and give new names to mutated columns?"
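
For readers on a newer dplyr, here is a hedged sketch of the same count written with `across()`. It assumes dplyr 1.0.0 or later, where `summarise_each_()` and `funs()` are deprecated; that is newer than the version this answer was written against. The result name `counts` is only illustrative.

```r
library(dplyr)

# Same example data as in the question
md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1), c = c(1,3,4,3,5,5),
                 device = c(1,1,2,2,3,3))
md[2, 3] <- NA
md[4, 1] <- NA
myvars <- c("a", "b", "c")

# Assumes dplyr >= 1.0.0: across() applies the lambda to every column
# named in myvars, and .names builds the "counts." prefix directly.
counts <- md %>%
  group_by(device) %>%
  summarise(across(all_of(myvars),
                   ~ sum(.x == 5, na.rm = TRUE),
                   .names = "counts.{.col}"))
counts
```

Because `.names` sets the output column names, the `setNames()` step is not needed in this variant.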

talat

The melt() function from the reshape2 package could be useful in this case. You might want to try this:

 library(reshape2)
 mydf <- melt(md,id="device")
 thefives <- mydf[which(mydf$value==5),]
 print(table(thefives))

Here's the output:

, , value = 5

     variable
device a b c
     1 1 2 0
     2 0 1 0
     3 1 0 2

If required, the table format obtained from this output can be converted into a data.frame by first converting it into a matrix:

mydf <- as.data.frame(matrix(table(thefives), nrow = 3))
colnames(mydf) <- c("a", "b", "c")
rownames(mydf) <- paste0("device_", 1:3)
print(mydf)

This yields the following result:

         a b c
device_1 1 2 0
device_2 0 1 0
device_3 1 0 2

> class(mydf)
[1] "data.frame"
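
If a data frame is the end goal, the table step could perhaps be skipped. The following is a minimal base R sketch, not part of the reshape2 approach above, that uses `aggregate()` on the logical matrix `md[myvars] == 5`. The name `counts_base` is only illustrative.

```r
# Same example data as in the question
md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1), c = c(1,3,4,3,5,5),
                 device = c(1,1,2,2,3,3))
md[2, 3] <- NA
md[4, 1] <- NA
myvars <- c("a", "b", "c")

# md[myvars] == 5 is a logical matrix (NA where the data are missing);
# summing it within each device counts the 5s, and na.rm = TRUE is
# passed through to sum() so the NAs are ignored.
counts_base <- aggregate(md[myvars] == 5,
                         by = list(device = md$device),
                         FUN = sum, na.rm = TRUE)

# Prefix the count columns, leaving the grouping column as it is
names(counts_base)[-1] <- paste0("counts.", myvars)
counts_base
```

This returns one row per device with the grouping column first, so no matrix conversion or manual row naming is required.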
RHertel
  • Thank you. I know how to do it in Base R and reshape2 is a good idea too. But I want to know if it's possible to do it in dplyr. – user2323534 Jun 16 '15 at 16:33
  • Besides, the structure of the output of table() is inconvenient. I need a data frame at the end. – user2323534 Jun 16 '15 at 16:35