1

I want to take the number of unique values in a subgroup and create a column with the number of unique values filled down by the subgroup.

I've already tried this chunk of code, but it hasn't worked exactly the way I was hoping:

data$ones=1
data=transform(data,Count=ave(ones,UniqueID,FUN=sum))

This is what the data looks like:

Group UniqueID
Grp1  1-A
Grp1  1-B
Grp1  1-B
Grp1  1-C
Grp2  2-A
Grp2  2-A
Grp2  2-B
Grp3  3-A
Grp3  3-A
Grp3  3-A

This is what I want it to look like:

Group UniqueID Count
Grp1  1-A      3
Grp1  1-B      3
Grp1  1-B      3
Grp1  1-C      3
Grp2  2-A      2
Grp2  2-A      2
Grp2  2-B      2
Grp3  3-A      1
Grp3  3-A      1
Grp3  3-A      1

Any help would be appreciated.

user2813606
  • 797
  • 2
  • 13
  • 37
  • `data$count <- as.integer(with(data, ave(as.character(UniqueID), Group, FUN = function(x) length(unique(x)))))` – Ronak Shah Sep 06 '19 at 00:40

1 Answers1

1

We can use n_distinct from the dplyr package.

library(dplyr)

dat2 <- dat %>%
  group_by(Group) %>%
  mutate(Count = n_distinct(UniqueID)) %>%
  ungroup()
dat2
# # A tibble: 10 x 3
#    Group UniqueID Count
#    <chr> <chr>    <int>
#  1 Grp1  1-A          3
#  2 Grp1  1-B          3
#  3 Grp1  1-B          3
#  4 Grp1  1-C          3
#  5 Grp2  2-A          2
#  6 Grp2  2-A          2
#  7 Grp2  2-B          2
#  8 Grp3  3-A          1
#  9 Grp3  3-A          1
# 10 Grp3  3-A          1

Data

dat <- read.table(text = "Group UniqueID
Grp1  '1-A'
Grp1  '1-B'
Grp1  '1-B'
Grp1  '1-C'
Grp2  '2-A'
Grp2  '2-A'
Grp2  '2-B'
Grp3  '3-A'
Grp3  '3-A'
Grp3  '3-A'",
                  header = TRUE, stringsAsFactors = FALSE)
www
  • 38,575
  • 12
  • 48
  • 84