I'd like to create a new data frame column that helps me quickly identify duplicate rows based on the value of the first column (index) in each row. My data frame (df) has almost 18,000 rows (observations), and the new column is called "unique". I have tried the following, without success:

df$unique = ifelse(df[row.names(df):1]==df[row.names(df)-1:1], "YES", "NO")

The rationale behind the code is that comparing a cell with the one in the previous row of the same column should flag an entry as unique whenever the two values do not match.

My dataframe

index num1 num2
1     12   12
1     12   12
2     14   14
2     14   14
2     14   14
3     18   18
4     19   19
civy

1 Answer

You can use the duplicated function. Be aware that the first occurrence of a non-unique row is not marked as a duplicate, hence we need to call it twice, searching once from the beginning and once from the end.

# Toy data, where the first two rows are identical, the third row is unique
df <- data.frame(a = c(1, 1, 1), b = c(1, 1, 2))

# Flag unique rows (rows that have no identical copy anywhere in df)
df$unique <- !(duplicated(df) | duplicated(df, fromLast = TRUE))

Output:

> df
  a b unique
1 1 1  FALSE
2 1 1  FALSE
3 1 2   TRUE
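
If the check should look only at the first column, as in the question, the same idea can be applied to that column alone. The following is a minimal sketch, assuming the data frame from the question with the column names index, num1 and num2; the helper vector dup is a name introduced here for illustration only.

```r
# Data from the question (column names assumed to be index, num1, num2)
df <- data.frame(
  index = c(1, 1, 2, 2, 2, 3, 4),
  num1  = c(12, 12, 14, 14, 14, 18, 19),
  num2  = c(12, 12, 14, 14, 14, 18, 19)
)

# TRUE for every row whose index value occurs more than once;
# searching from both ends also catches the first occurrence
dup <- duplicated(df$index) | duplicated(df$index, fromLast = TRUE)

# "YES" / "NO" labels as requested in the question
df$unique <- ifelse(dup, "NO", "YES")
```

With this data, the rows with index 1 and 2 get "NO" and the rows with index 3 and 4 get "YES". Unlike a comparison with the previous row, this does not require the data frame to be sorted by index.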
Patrick Roocks