How to count content of desired column of the grouped values in a dataframe

Question

I have the following data frame:

testdf <- structure(list(gene = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L, 
1L, 1L, 1L), .Label = c("Actc1", "Cbx1"), class = "factor"), 
    p1 = structure(c(5L, 1L, 2L, 3L, 4L, 1L, 1L, 1L, 1L, 1L), .Label = c("BoneMarrow", 
    "Liver", "Pulmonary", "Umbilical", "Vertebral"), class = "factor"), 
    p2 = structure(c(1L, 1L, 1L, 1L, 1L, 5L, 2L, 3L, 4L, 1L), .Label = c("Adipose", 
    "Liver", "Pulmonary", "Umbilical", "Vertebral"), class = "factor")), .Names = c("gene", 
"p1", "p2"), class = "data.frame", row.names = c(NA, -10L))

testdf
#>     gene         p1        p2
#> 1   Cbx1  Vertebral   Adipose
#> 2   Cbx1 BoneMarrow   Adipose
#> 3   Cbx1      Liver   Adipose
#> 4   Cbx1  Pulmonary   Adipose
#> 5   Cbx1  Umbilical   Adipose
#> 6  Actc1 BoneMarrow Vertebral
#> 7  Actc1 BoneMarrow     Liver
#> 8  Actc1 BoneMarrow Pulmonary
#> 9  Actc1 BoneMarrow Umbilical
#> 10 Actc1 BoneMarrow   Adipose

What I want to do is group by gene and count the number of distinct p1 values in each group, resulting in this:

Cbx1  5 #Vertebral, Bone Marrow, Liver, Pulmonary, Umbilical
Actc1 1 #Bone Marrow

I tried this, but it doesn't give what I want:

testdf %>% group_by(gene) %>% mutate(n=n())
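
For context, n() counts the rows in each group and mutate() keeps every row, so this attempt attaches 5 to all ten rows rather than counting distinct p1 values. A minimal sketch of that behaviour, assuming dplyr is installed and using a compact rebuild of testdf with character columns instead of factors (the name attempt is illustrative):

```r
library(dplyr)

# Compact rebuild of testdf from the question (character columns instead of
# factors; this does not change the counts).
testdf <- data.frame(
  gene = rep(c("Cbx1", "Actc1"), each = 5),
  p1   = c("Vertebral", "BoneMarrow", "Liver", "Pulmonary", "Umbilical",
           rep("BoneMarrow", 5)),
  p2   = c(rep("Adipose", 5),
           "Vertebral", "Liver", "Pulmonary", "Umbilical", "Adipose")
)

# n() is the number of rows in the current group; mutate() adds it as a
# new column without collapsing the groups.
attempt <- testdf %>%
  group_by(gene) %>%
  mutate(n = n()) %>%
  ungroup()

nrow(attempt)      # row count is unchanged
unique(attempt$n)  # both genes have five rows, so n is 5 everywhere
```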

score 3 · Answer 1 · answered Aug 03 '17 at 04:01


An alternative using aggregate:

aggregate(p1 ~ gene, testdf, function(x) length(unique(x)))

#   gene p1
#1 Actc1  1
#2  Cbx1  5
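
Note that the result column keeps the name p1 even though it now holds a count. A minimal sketch of renaming it, assuming a compact rebuild of testdf with character columns; the name n_p1 is illustrative and not part of the original answer:

```r
# Compact rebuild of testdf from the question (character columns instead of
# factors; this does not change the counts).
testdf <- data.frame(
  gene = rep(c("Cbx1", "Actc1"), each = 5),
  p1   = c("Vertebral", "BoneMarrow", "Liver", "Pulmonary", "Umbilical",
           rep("BoneMarrow", 5)),
  p2   = c(rep("Adipose", 5),
           "Vertebral", "Liver", "Pulmonary", "Umbilical", "Adipose")
)

# Count distinct p1 values per gene with base R only.
res <- aggregate(p1 ~ gene, testdf, function(x) length(unique(x)))

# The count column is still called "p1"; give it a clearer name.
names(res)[names(res) == "p1"] <- "n_p1"
res
```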


S Rivero


score 2 · Accepted Answer · answered Aug 03 '17 at 03:51


You can use n_distinct to count unique values:

testdf %>% group_by(gene) %>% summarise(n = n_distinct(p1))

# A tibble: 2 x 2
#    gene     n
#  <fctr> <int>
#1  Actc1     1
#2   Cbx1     5
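
The desired output in the question also lists the distinct values next to each count. A minimal sketch of one way to get both, assuming dplyr is installed and using a compact rebuild of testdf with character columns; the column name values and the use of paste() are illustrative additions, not part of the original answer:

```r
library(dplyr)

# Compact rebuild of testdf from the question (character columns instead of
# factors; this does not change the counts).
testdf <- data.frame(
  gene = rep(c("Cbx1", "Actc1"), each = 5),
  p1   = c("Vertebral", "BoneMarrow", "Liver", "Pulmonary", "Umbilical",
           rep("BoneMarrow", 5)),
  p2   = c(rep("Adipose", 5),
           "Vertebral", "Liver", "Pulmonary", "Umbilical", "Adipose")
)

result <- testdf %>%
  group_by(gene) %>%
  summarise(
    n      = n_distinct(p1),                     # number of distinct p1 values
    values = paste(unique(p1), collapse = ", ")  # the values, in order of appearance
  )

result
```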


Psidom


score 1 · Answer 3 · answered Aug 03 '17 at 05:03


You can also use tapply:

with(testdf, tapply(p1, gene, function(x) length(unique(x))))

#  Actc1  Cbx1 
#      1     5
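
Unlike the other two answers, tapply returns a named vector rather than a data frame. A minimal sketch of converting it, assuming a compact rebuild of testdf with character columns; the names counts and counts_df are illustrative:

```r
# Compact rebuild of testdf from the question (character columns instead of
# factors; this does not change the counts).
testdf <- data.frame(
  gene = rep(c("Cbx1", "Actc1"), each = 5),
  p1   = c("Vertebral", "BoneMarrow", "Liver", "Pulmonary", "Umbilical",
           rep("BoneMarrow", 5)),
  p2   = c(rep("Adipose", 5),
           "Vertebral", "Liver", "Pulmonary", "Umbilical", "Adipose")
)

# Named vector of distinct-p1 counts, one element per gene.
counts <- with(testdf, tapply(p1, gene, function(x) length(unique(x))))

# Turn the names into a column so the result matches the other answers' shape.
counts_df <- data.frame(gene = names(counts), n = as.vector(counts))
counts_df
```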


Onyambu

