In R: Create a column with unique values for each cluster in another column by grouping variable

Question

I have a dataframe with the following data structure:

x <- c("A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "C", "C", "C", "C", "C", "C", "C", "C", "C")
y <- c("Y", "Y", "Y", "Y", "N", "N", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "N", "Y", "Y", "Y", "N", "Y", "Y")
df <- data.frame(x, y)

I want to create a new column with unique values for each chunk of Y's in column 'y' and a value for each N in 'y' using dplyr, grouping by 'x'. For example:

z <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2, 2, 3, 3)
df <- data.frame(x, y, z)

How would I do this?

I tried adapting the answers to the question "How to assign a unique ID number to each group of identical values in a column", to no avail.

score 0 · Answer 1 · answered Dec 14 '21 at 04:51

The following code reproduces your expected output for x == "C", but for x == "A" the result differs from your example. I suspect the example output may be inconsistent, since groups A and C seem to follow different rules. Please clarify which rule you are using.

library(dplyr)

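df2 <- df %>%
  group_by(x) %>%
  # cumsum() counts the N's seen so far within each group; lag() shifts that
  # count down one row so each N stays in the chunk it closes, and + 1 makes
  # the IDs start at 1 (so z is never 0 and needs no further adjustment).
  mutate(z = lag(cumsum(y %in% "N"), default = 0) + 1) %>%
  ungroup()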
df2
# # A tibble: 20 x 3
#    x     y         z
#    <chr> <chr> <dbl>
#  1 A     Y         1
#  2 A     Y         1
#  3 A     Y         1
#  4 A     Y         1
#  5 A     N         1
#  6 A     N         2
#  7 A     Y         3
#  8 A     Y         3
#  9 A     Y         3
# 10 A     Y         3
# 11 A     Y         3
# 12 C     Y         1
# 13 C     Y         1
# 14 C     N         1
# 15 C     Y         2
# 16 C     Y         2
# 17 C     Y         2
# 18 C     N         2
# 19 C     Y         3
# 20 C     Y         3
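
If it helps, here is one reading of the rule that does reproduce the example z for both groups. This is only my guess at the intent, not something the question states: an N closes the current chunk only when it directly follows a Y, so a second consecutive N joins the next chunk instead of getting its own ID. The names df3, expected_z and closes_chunk are mine, introduced for illustration.

```r
library(dplyr)

# Data from the question, including the asker's expected z.
x <- c(rep("A", 11), rep("C", 9))
y <- c("Y", "Y", "Y", "Y", "N", "N", "Y", "Y", "Y", "Y", "Y",
       "Y", "Y", "N", "Y", "Y", "Y", "N", "Y", "Y")
expected_z <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2, 2, 3, 3)
df <- data.frame(x, y)

df3 <- df %>%
  group_by(x) %>%
  mutate(
    # ASSUMED rule: only an N that directly follows a Y closes a chunk.
    # default = "Y" means an N in the first row of a group also closes one.
    closes_chunk = y == "N" & lag(y, default = "Y") == "Y",
    # Shift the running count down one row so the closing N keeps the ID
    # of the chunk it ends, then start the IDs at 1.
    z = lag(cumsum(closes_chunk), default = 0) + 1
  ) %>%
  ungroup() %>%
  select(-closes_chunk)

all(df3$z == expected_z)
# [1] TRUE
```

The only difference from the code above is the extra condition on the previous row. If every N should instead end its own chunk, the simpler version without that condition is the one to use.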
