Counting categorical variables within group_by()

Question

I am examining conservation easement data from NCED. I have a data frame of parcels that have some repeated IDs and owners. I want to group the repeated IDs into a single row with a count of the distinct number of owners... but based on this question and answer I'm just returning a count of the number of rows of the ID.

uniqueID <- c(1:10)
parcelID <- c('a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c')
owner <- c('owner1', 'owner1', 'owner1', 'owner2', 'owner3',
           'owner2', 'owner2', 'owner2', 'owner3', 'owner1')
mydat1 <- data.frame(uniqueID, parcelID, owner)

numberOwners <- mydat1 %>% group_by(parcelID, owner) %>% tally()

My desired output would be:

  parcelID_grouped nOwners
1                a       3
2                b       1
3                c       2

both of those work... if you submit as answer I'll green check it — mikeLdub, Jan 08 '18 at 20:15

score 1 · Accepted Answer · answered Jan 08 '18 at 20:26

Using dplyr there a couple of ways to do this:

library(dplyr)

mydat1 %>% distinct(parcelID, owner) %>% count(parcelID) 

mydat1 %>% group_by(parcelID) %>% summarise(n = n_distinct(owner))

Both calls resulting in:

#   parcelID     n
# 1 a            3
# 2 b            1
# 3 c            2

score 0 · Answer 2 · answered Jan 08 '18 at 22:33

Using data.table:-

library(data.table)
setDT(mydat1)
mydat1[, uniqueID := NULL]
mydat1 <- unique(mydat1)
mydat1[, nOwners := .N, by = parcelID]
mydat1[, owner := NULL]
mydat1 <- unique(mydat1)
setnames(mydat1, "parcelID", "parcelID_grouped")

You'll get the desired output:-

     parcelID_grouped   nOwners
1:                a       3
2:                b       1
3:                c       2

Counting categorical variables within group_by()

2 Answers2