Count different values in a grouped by subset

Question

I have the following dataset

data.frame(company=c("c1","c2","c3","c2","c1","c2"),field=c("A","B","C","A","D","C"))

I would like to know how many different fields each company has.

So I need a data frame like the one below:

company   fields
c1          2
c2          3
c3          1

Number (`length`) of fields or number of distinct (`length(unique(...))`) fields? The example is ambiguous. — alistaire, May 03 '17 at 07:11

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

We can aggregate 'field' by 'company' and take the length of the 'unique' elements in each 'company':

aggregate(field~company, df1, FUN = function(x) length(unique(x)))
#   company field
#1      c1     2
#2      c2     3
#3      c3     1
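For comparison, here is a minimal base R sketch of the same count using tapply instead of the formula interface. It applies the same length(unique(...)) function per company, but returns a named vector rather than a data frame. The name `counts` is only illustrative and not part of the original answer.

```r
# Same data as in the 'data' section of this answer
df1 <- data.frame(company = c("c1", "c2", "c3", "c2", "c1", "c2"),
                  field   = c("A", "B", "C", "A", "D", "C"))

# For each company, count the distinct values of 'field'
counts <- tapply(df1$field, df1$company, function(x) length(unique(x)))

# 'counts' is a named vector; look up one company by name
counts[["c2"]]
```

If a data frame is needed afterwards, the aggregate call above is likely the more direct route.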

Or, using data.table: convert the data frame to a 'data.table' with setDT(df1), group by 'company', and use the convenient wrapper uniqueN (i.e. the length of the unique values)

library(data.table)
setDT(df1)[, .(fields = uniqueN(field)), company]
#   company fields
#1:      c1      2
#2:      c2      3
#3:      c3      1

Or with dplyr, using n_distinct

library(dplyr)
df1 %>%
    group_by(company) %>%
    summarise(fields = n_distinct(field))

NOTE: In the example, the number of unique 'field' values per 'company' and the total number of rows per 'company' are the same. If the latter is what is wanted, use .N from data.table or n() from dplyr, i.e.

setDT(df1)[, .(fields = .N), company]
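The dplyr counterpart mentioned in the note could look roughly like the sketch below. Because the original data cannot tell the two counts apart, it uses a hypothetical variant, `df2`, in which company "c2" lists field "A" twice. Both `df2` and the column names `rows` and `distinct_fields` are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical data: like df1, plus one repeated (c2, A) row
df2 <- data.frame(company = c("c1", "c2", "c3", "c2", "c1", "c2", "c2"),
                  field   = c("A", "B", "C", "A", "D", "C", "A"))

res <- df2 %>%
    group_by(company) %>%
    summarise(rows = n(),                           # total rows per company
              distinct_fields = n_distinct(field))  # distinct fields per company
res
```

With the repeated row, "c2" should have 4 rows but only 3 distinct fields, which is the difference the comment under the question asks about.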

data

df1 <- data.frame(company=c("c1","c2","c3","c2","c1","c2"),
                       field=c("A","B","C","A","D","C"))
