0

For the following sample data frame, I need to find for each id - the count of distinct values for each column

df <- data.frame(id = c(2,2,3,3,3,1,1,4,4),
                         prop1 = c("A","A","B","B","B","B","B","B","C"),
                         prop2 = c(FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE),
                         prop3= c(4,4,3,3,4,5,1,5,1))
    > df
      id prop1 prop2 prop3
    1  2     A FALSE     4
    2  2     A FALSE     4
    3  3     B FALSE     3
    4  3     B FALSE     3
    5  3     B FALSE     4
    6  1     B  TRUE     5
    7  1     B FALSE     1
    8  4     B  TRUE     5
    9  4     C FALSE     1

base R preferred.

Expected Output format:

    > dfDistinctCountByProp
      id prop1.unq.cnt prop2.unq.cnt prop3.unq.cnt
    1  1        1               2         2
    2  2        1               1         1
    3  3        1               1         2
    4  4        2               2         2
user3206440
  • 4,749
  • 15
  • 75
  • 132

1 Answers1

2

You can sum the not-duplicated cases in aggregate, which lets you group by id:

aggregate(. ~ id, df, function(x){ sum(!duplicated(x)) })

##   id prop1 prop2 prop3
## 1  1     1     2     2
## 2  2     1     1     1
## 3  3     1     1     2
## 4  4     2     2     2

or use length(unique(...)) if it makes more sense to you:

aggregate(. ~ id, df, function(x){length(unique(x))})    # returns identical result

Should a reader care, in dplyr it would be

library(dplyr)

df %>% group_by(id) %>% summarise_all(n_distinct)

or data.table,

library(data.table)

setDT(df)[, lapply(.SD, uniqueN), by = id]
alistaire
  • 42,459
  • 4
  • 77
  • 117