I have the following dataset:

data.frame(company=c("c1","c2","c3","c2","c1","c2"),field=c("A","B","C","A","D","C"))

I would like to know how many different fields each company has.

So I need a data frame like the one below:

company   fields
c1          2
c2          3
c3          1
user5363938
    Number (`length`) of fields or number of distinct (`length(unique(...))`) fields? The example is ambiguous. – alistaire May 03 '17 at 07:11

1 Answer

We can use aggregate to group 'field' by 'company' and find the number of unique elements for each 'company':

aggregate(field~company, df1, FUN = function(x) length(unique(x)))
#   company field
#1      c1     2
#2      c2     3
#3      c3     1
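Note that aggregate keeps the original column name ('field'), while the requested output uses 'fields'. A minimal base R sketch of renaming the column afterwards; the object name `res` is just an illustrative choice, and the data are re-created here so the snippet runs on its own:

```r
# Same data as in the question, re-created so this snippet is self-contained
df1 <- data.frame(company = c("c1", "c2", "c3", "c2", "c1", "c2"),
                  field   = c("A",  "B",  "C",  "A",  "D",  "C"))

# Count distinct fields per company
res <- aggregate(field ~ company, df1, FUN = function(x) length(unique(x)))

# Rename the aggregated column to match the requested output
names(res)[names(res) == "field"] <- "fields"
res
```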

Or, using data.table: convert the data frame to a 'data.table' with setDT(df1), group by 'company', and use the convenience wrapper uniqueN (equivalent to length(unique(x))):

library(data.table)
setDT(df1)[, .(fields = uniqueN(field)), company]
#   company fields
#1:      c1      2
#2:      c2      3
#3:      c3      1

Or with dplyr, using n_distinct:

library(dplyr)
df1 %>%
    group_by(company) %>%
    summarise(fields = n_distinct(field))

NOTE: In the example, the number of unique 'field' values per 'company' and the total number of rows per 'company' are the same. If you want the latter, use .N from data.table or n() from dplyr, i.e.

setDT(df1)[, .(fields = .N), company]
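For the dplyr side, here is a rough sketch that puts n() and n_distinct() next to each other. Since the two counts coincide in the original data, it uses `df2`, a hypothetical variant (not from the question) with one repeated company/field pair, so the difference becomes visible:

```r
library(dplyr)

# df2 is a hypothetical variant of df1: the pair (c2, B) appears twice
df2 <- data.frame(company = c("c1", "c2", "c3", "c2", "c1", "c2", "c2"),
                  field   = c("A",  "B",  "C",  "A",  "D",  "C",  "B"))

counts <- df2 %>%
    group_by(company) %>%
    summarise(rows   = n(),                # total rows per company
              fields = n_distinct(field))  # distinct fields per company
counts
```

Here 'c2' has 4 rows but only 3 distinct fields, so which function to use depends on which of the two counts is wanted.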

data

df1 <- data.frame(company=c("c1","c2","c3","c2","c1","c2"),
                       field=c("A","B","C","A","D","C"))   
akrun