EDITED:

I have a very simple question. I have a dataframe (already given) with repeated rows. I want to identify each unique row and add a column with an ID number.

The original table has thousands of rows, but I have simplified it here. A toy df can be created this way:

df <- data.frame(var1 = c('a', 'a', 'a', 'b', 'c', 'c', 'a'), 
                 var2 = c('d', 'd', 'd', 'e', 'f', 'f', 'c'))

For each unique row, I want a numeric ID:

  var1 var2  ID
1    a    d   1
2    a    d   1
3    a    d   1
4    b    e   2
5    c    f   3
6    c    f   3
7    a    c   4

/EDITED

Cœur
Marcos

2 Answers

Here is a base R solution using cumsum + duplicated, i.e.,

df$ID <- cumsum(!duplicated(df))

such that

> df
  var1 var2 ID
1    a    d  1
2    a    d  1
3    a    d  1
4    b    e  2
5    c    f  3
6    c    f  3
7    a    c  4
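
One caveat worth checking: cumsum(!duplicated(df)) only counts how many new rows have appeared so far, so it gives the intended IDs when identical rows sit next to each other, as they do in the toy data. The following is a minimal sketch for the non-contiguous case, assuming IDs should follow the order of first appearance; df2, key, id_cumsum and id_match are illustrative names, not part of the question.

```r
# Hypothetical data where the duplicate of row 1 is not adjacent to it
df2 <- data.frame(var1 = c("a", "b", "a"),
                  var2 = c("d", "e", "d"))

# cumsum() counts the new rows seen so far, so row 3 inherits
# the ID of row 2 rather than the ID of row 1
id_cumsum <- cumsum(!duplicated(df2))   # 1 2 2

# Build one key per row, then look up each key among the unique keys
key <- do.call(paste, c(df2, sep = "\r"))
id_match <- match(key, unique(key))     # 1 2 1
```

If the real table is sorted so that repeated rows are always adjacent, the shorter cumsum version is enough.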
ThomasIsCoding

EDIT

Well, the question was completely changed by OP. For the updated question we can do

df$ID <- match(paste0(df$var1, df$var2), unique(paste0(df$var1, df$var2)))
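
A possible pitfall with paste0 is that two different rows can produce the same string when no separator is used. The sketch below illustrates this on made-up data and shows one way around it, assuming the separator character ("\r" here) never occurs in the values; df3 and the key_* and id_* names are hypothetical.

```r
# Hypothetical rows whose concatenations collide without a separator
df3 <- data.frame(var1 = c("ab", "a"),
                  var2 = c("c", "bc"))

# Both rows collapse to "abc", so they wrongly share an ID
key_plain <- paste0(df3$var1, df3$var2)
id_plain <- match(key_plain, unique(key_plain))   # 1 1

# A separator that does not occur in the data keeps the rows distinct
key_sep <- paste(df3$var1, df3$var2, sep = "\r")
id_sep <- match(key_sep, unique(key_sep))         # 1 2
```

For the toy data in the question both variants give the same IDs; the separator only matters when values can run into each other.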

Original answer

One way would be to use uncount from tidyr

library(dplyr)
df %>% mutate(ID = row_number()) %>% tidyr::uncount(ID, .remove = FALSE)

#    var1 var2 ID
#1      a    d  1
#2      b    e  2
#2.1    b    e  2
#3      c    f  3
#3.1    c    f  3
#3.2    c    f  3

In base R we can create a row number column in the dataframe and repeat rows based on that.

df$ID <- seq_len(nrow(df))
df[rep(df$ID, df$ID), ]
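
To make the indexing step explicit, here is a small sketch of what rep produces and how it drives the row selection. It assumes the same three-row data as in this answer, rebuilt with plain character columns; idx and out are illustrative names.

```r
# Same three rows as the answer's data, with character columns
df <- data.frame(var1 = c("a", "b", "c"),
                 var2 = c("d", "e", "f"))
df$ID <- seq_len(nrow(df))

# With a vector of counts, rep() repeats the i-th element times[i] times
idx <- rep(df$ID, df$ID)   # 1 2 2 3 3 3

# Indexing by idx returns row 1 once, row 2 twice and row 3 three times
out <- df[idx, ]
```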

data

df <- structure(list(var1 = structure(1:3, .Label = c("a", "b", "c"
), class = "factor"), var2 = structure(1:3, .Label = c("d", "e", 
"f"), class = "factor")), row.names = c(NA, -3L), class = "data.frame")
Ronak Shah