0

I have a data frame similar to this:

data <- data.frame(
  Location = rep(letters[1:10], each = 20),
  ID = rep(1:40, each = 5)
)

I want to return a table that contains each unique Location in one column and a count of the number of unique IDs in each Location in another column, so it will look like this:

Location   Count
   a         4
   b         4
   ...      ...

Note: in my actual data set there are different numbers of IDs in each Location, and there are other variables in other columns.

What is the best way to do this?

Jaap
  • 81,064
  • 34
  • 182
  • 193
Ryan
  • 1,048
  • 7
  • 14

2 Answers2

1

The table class of objects has a as.data.frame method:

as.data.frame(table(data$Location))
   Var1 Freq
1     a   20
2     b   20
3     c   20
4     d   20
5     e   20
6     f   20
7     g   20
8     h   20
9     i   20
10    j   20
IRTFM
  • 258,963
  • 21
  • 364
  • 487
0

We can use n_distinct on the 'ID' column after grouping by 'Location'. In the example, it is all 4

library(dplyr)
data %>% 
    group_by(Location) %>%
    summarise(Count = n_distinct(ID))

If we need to add a new column, use mutate instead of summarise


With data.table, this can be done with uniqueN

library(data.table)
setDT(data)[, .(Count = uniqueN(ID)), Location]
akrun
  • 874,273
  • 37
  • 540
  • 662