Create id column of repeated rows

Question

EDITED:

I have a very simple question. I have a dataframe (already given) with repeated rows. I want to identify each unique row and add a column with an ID number.

The original table has thousands of rows, but I've simplified it here. A toy df can be created like this:

df <- data.frame(var1 = c('a', 'a', 'a', 'b', 'c', 'c', 'a'), 
                 var2 = c('d', 'd', 'd', 'e', 'f', 'f', 'c'))

For each unique row, I want a numeric ID:

  var1 var2  ID
1    a    d   1
2    a    d   1
3    a    d   1
4    b    e   2
5    c    f   3
6    c    f   3
7    a    c   4

/EDITED

What have you tried so far? It would be helpful to see how you're approaching this since you go from 3 rows to 6 but it's unclear how that happened — camille, Feb 10 '20 at 14:13

score 2 · Answer 1 · answered Feb 10 '20 at 14:24


Here is a base R solution using cumsum() and duplicated():

df$ID <- cumsum(!duplicated(df))

such that

> df
  var1 var2 ID
1    a    d  1
2    a    d  1
3    a    d  1
4    b    e  2
5    c    f  3
6    c    f  3
7    a    c  4
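To see why this works, it can help to look at the intermediate vectors. duplicated() on a data frame flags every row that is an exact copy of an earlier row, ! turns that into a "first appearance" flag, and cumsum() counts first appearances so far. The sketch below rebuilds the toy data from the question. It then appends one hypothetical extra row (not part of the question) to show an assumption of this approach: IDs are only consistent when identical rows are adjacent.

```r
df <- data.frame(var1 = c('a', 'a', 'a', 'b', 'c', 'c', 'a'),
                 var2 = c('d', 'd', 'd', 'e', 'f', 'f', 'c'))

# TRUE for each row that is an exact copy of an earlier row
dup <- duplicated(df)
# TRUE marks the first appearance of each distinct row
first <- !dup
# Running count of first appearances becomes the ID
ids <- cumsum(first)
print(ids)

# Hypothetical extra row: ("a", "d") reappears after other rows.
# duplicated() is TRUE for it, so cumsum() does not advance and the
# row inherits ID 4 (the "a c" row's ID) instead of ID 1.
df2 <- rbind(df, data.frame(var1 = 'a', var2 = 'd'))
ids2 <- cumsum(!duplicated(df2))
print(ids2)
```

If the real data may contain non-adjacent repeats, either sort it first so identical rows sit together, or use a lookup-based ID such as match() against the unique rows.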


ThomasIsCoding


Ronak Shah · Answer 2 · answered Feb 10 '20 at 14:56

EDIT

The question was substantially changed by the OP. For the updated question, we can do:

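# One key per row; the separator keeps e.g. "ab"+"c" distinct from "a"+"bc"
key <- paste(df$var1, df$var2, sep = "\r")
# Each row gets the position of its key among the unique keys (first-appearance order)
df$ID <- match(key, unique(key))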
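match(x, table) returns, for each element of x, the position of its first match in table. Because unique() keeps keys in order of first appearance, every row receives the index of its combination's first appearance. Unlike the cumsum() answer above, this does not require repeats to be adjacent. The sketch below uses the question's toy data plus one hypothetical eighth row that repeats ("a", "d"), and assumes "\r" never occurs inside the values.

```r
# Toy data from the question plus a hypothetical non-adjacent repeat of ("a", "d")
df <- data.frame(var1 = c('a', 'a', 'a', 'b', 'c', 'c', 'a', 'a'),
                 var2 = c('d', 'd', 'd', 'e', 'f', 'f', 'c', 'd'))

# One string key per row; assumes "\r" does not appear in var1 or var2
key <- paste(df$var1, df$var2, sep = "\r")
# Position of each key among the distinct keys, in first-appearance order
df$ID <- match(key, unique(key))
print(df)
```

The last row gets ID 1 again, the same as the first three rows.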

Original answer

One way would be to use uncount() from tidyr:

library(dplyr)
df %>% mutate(ID = row_number()) %>% tidyr::uncount(ID, .remove = FALSE)

#    var1 var2 ID
#1      a    d  1
#2      b    e  2
#2.1    b    e  2
#3      c    f  3
#3.1    c    f  3
#3.2    c    f  3

In base R, we can add a row-number column to the data frame and repeat each row that many times:

df$ID <- seq(nrow(df))
df[rep(df$ID, df$ID), ]
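The key step is rep(x, times): when times is a vector of the same length as x, element x[i] is repeated times[i] times. Using the row numbers as both arguments gives an index vector that is then used to select rows. A minimal sketch on the three-row data from the original version of the question, built here with character columns instead of the factors in the data section below:

```r
# Three-row data from the original question (character columns for simplicity)
df <- data.frame(var1 = c("a", "b", "c"), var2 = c("d", "e", "f"),
                 stringsAsFactors = FALSE)
df$ID <- seq(nrow(df))

# Row i is listed i times: 1 2 2 3 3 3
idx <- rep(df$ID, df$ID)
print(idx)

# Indexing by idx repeats the rows; row names become 1, 2, 2.1, 3, 3.1, 3.2
out <- df[idx, ]
print(out)
```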

data

df <- structure(list(var1 = structure(1:3, .Label = c("a", "b", "c"
), class = "factor"), var2 = structure(1:3, .Label = c("d", "e", 
"f"), class = "factor")), row.names = c(NA, -3L), class = "data.frame")
