Suppose I have a data.frame:

df <- data.frame(
    sample = c("s1", "s2", "s2"),
    drug = c("drug2", "drug1", "drug2")
)

  sample  drug
1     s1 drug2
2     s2 drug1
3     s2 drug2

Is there any easy way to create a table counting all instances of drugs including zero hits?

Ideally, something like this:

  sample drug1 drug2
1     s1     0     1
2     s2     1     1
Ahdee

2 Answers

What about base R's good old table?

table(df)
#      drug
#sample drug1 drug2
#s1     0     1
#s2     1     1

Or, to get a data.frame output:

as.data.frame.matrix(table(df))
#   drug1 drug2
#s1     0     1
#s2     1     1
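
If a drug can be missing from the data entirely, the same approach still works as long as the column is a factor with all levels declared. The following is a minimal sketch under that assumption; `all_drugs` and `"drug3"` are hypothetical names added for illustration and are not part of the question.

```r
# Same data as in the question
df <- data.frame(
  sample = c("s1", "s2", "s2"),
  drug   = c("drug2", "drug1", "drug2")
)

# Hypothetical full list of drugs; "drug3" never occurs in df
all_drugs <- c("drug1", "drug2", "drug3")

# Declaring the levels up front makes table() count unobserved levels as zero
df$drug <- factor(df$drug, levels = all_drugs)

counts <- as.data.frame.matrix(table(df))
counts
#   drug1 drug2 drug3
#s1     0     1     0
#s2     1     1     0
```

Without the explicit `levels`, `table()` only knows about the values that actually occur, so `drug3` would not get a column.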
Maurits Evers

This can be done with dplyr. As of dplyr 0.8.0.1 (the latest version at the time of writing), group_by() accepts .drop=FALSE, which preserves empty groups. For this to work, the grouping columns must all be factors:

library(dplyr)
library(tidyr)

df %>% 
  # Convert grouping columns to factor if they aren't already
  mutate_if(is.character, factor) %>% 
  group_by(sample, drug, .drop=FALSE) %>% 
  tally %>% 
  spread(drug, n)
  sample drug1 drug2
1 s1         0     1
2 s2         1     1

Or, to keep the output in "long" format for further processing, stop before the spread:

df %>% 
  mutate_if(is.character, factor) %>% 
  group_by(sample, drug, .drop=FALSE) %>% 
  tally
  sample drug      n
1 s1     drug1     0
2 s1     drug2     1
3 s2     drug1     1
4 s2     drug2     1

The code above will ensure that all empty group combinations are preserved. However, if you're going to spread the data to a "wide" format table, then we can take care of the missing groups in the spread step without worrying about whether group_by preserves empty groups:

df %>% 
  group_by(sample, drug) %>% 
  tally %>% 
  spread(drug, n, fill=0)
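
In newer versions of tidyr (1.0.0 and later), spread() has been superseded by pivot_wider(). The sketch below is a rough equivalent of the last snippet, not part of the original answer: it swaps in count() for group_by() plus tally(), and uses the `values_fill` argument in place of `fill`. The name `wide` is just an illustrative variable.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
df <- data.frame(
  sample = c("s1", "s2", "s2"),
  drug   = c("drug2", "drug1", "drug2")
)

wide <- df %>%
  # count() is shorthand for group_by() followed by tally()
  count(sample, drug) %>%
  # Missing sample/drug combinations are filled with 0 instead of NA
  pivot_wider(names_from = drug, values_from = n, values_fill = list(n = 0L))

wide
```

Note that the drug columns may come out in order of first appearance rather than alphabetically, so select or reorder them explicitly if the order matters.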
eipi10