0

I have a dataframe like so:

df <- data.frame(
      COL1 = rep("28", 10),
      COL2 = c(rep("33", 6), rep("45", 4)),
      COL3 = c("10", "10", "10", "10", "23", "23", "22", "22", "22", "29")
    )
> df
#    COL1 COL2 COL3
# 1    28   33   10
# 2    28   33   10
# 3    28   33   10
# 4    28   33   10
# 5    28   33   23
# 6    28   33   23
# 7    28   45   22
# 8    28   45   22
# 9    28   45   22
# 10   28   45   29

Now I want to populate COL4 auto-incrementally based on duplicate values for COL1 to COL3. The output should look like this:

> df
#    COL1 COL2 COL3 COL4
# 1    28   33   10    1
# 2    28   33   10    2
# 3    28   33   10    3
# 4    28   33   10    4
# 5    28   33   23    1
# 6    28   33   23    2
# 7    28   45   22    1
# 8    28   45   22    2
# 9    28   45   22    3
# 10   28   45   29    1

How would I go about doing this? Any help will be appreciated.

Note: columns are not necessarily sorted.

marl1
  • 313
  • 1
  • 5

1 Answers1

1

base R

With base R, use ave:

vec <- apply(df, 1, paste, collapse = "")
df$COL4 <- ave(vec, vec, FUN = seq_along)

dplyr

With dplyr, group_by all columns and use row_number. If they are more columns to group_by, you can use group_by(across(everything())).

library(dplyr)
df %>% 
  group_by(COL1, COL2, COL3) %>% 
  mutate(COL4 = row_number())

output

   COL1  COL2  COL3   COL4
   <chr> <chr> <chr> <int>
 1 28    33    10        1
 2 28    33    10        2
 3 28    33    10        3
 4 28    33    10        4
 5 28    33    23        1
 6 28    33    23        2
 7 28    45    22        1
 8 28    45    22        2
 9 28    45    22        3
10 28    45    29        1
Maël
  • 45,206
  • 3
  • 29
  • 67
  • Wow, thanks! I like coding in base R, but have never used `ave` before. Very powerful line of code to use `ave` with the `seq_along` function! Thanks for learning me new things :) – marl1 Mar 04 '22 at 05:21