Find unique rows in a data frame in R

Question

I'd like to create a new data frame column that helps me quickly identify duplicate rows based on the value in the first column (index) of each row. My data frame (df) has almost 18,000 rows (observations), and the new column is called "unique". I have tried the following, rather unsuccessfully:

df$unique = ifelse(df[row.names(df):1]==df[row.names(df)-1:1], "YES", "NO")

The rationale behind the code is that comparing each cell with the one in the previous row of the same column should flag an entry as unique whenever the two values do not match.

My data frame:

index num1 num2
1     12   12
1     12   12
2     14   14
2     14   14
2     14   14
3     18   18
4     19   19

Your question isn't very clear. Please provide a reproducible example and desired output. — David Arenburg, Jul 01 '16 at 10:27

score 4 · Answer 1 · answered Jul 01 '16 at 10:34

You can use the duplicated function. Be aware that the first occurrence of a non-unique row is not flagged as a duplicate, hence we need to call it twice: once searching from the beginning and once from the end.

# Toy data, where the first two rows are identical, the third row is unique
df <- data.frame(a = c(1, 1, 1), b = c(1, 1, 2))

# Flag rows that occur exactly once (TRUE = unique row)
df$unique <- !(duplicated(df) | duplicated(df, fromLast = TRUE))

Output:

> df
  a b unique
1 1 1  FALSE
2 1 1  FALSE
3 1 2   TRUE
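
The same idea can be applied to the data frame from the question. This is only a sketch under an assumption: since the desired output was never stated, it treats a row as "unique" when its index value appears exactly once in the first column, and it writes "YES"/"NO" labels as in the original attempt. The helper name dup_index is made up for illustration.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  index = c(1, 1, 2, 2, 2, 3, 4),
  num1  = c(12, 12, 14, 14, 14, 18, 19),
  num2  = c(12, 12, 14, 14, 14, 18, 19)
)

# TRUE for every row whose index value occurs more than once,
# scanning the column from both ends so first occurrences are caught too
# (dup_index is a hypothetical helper name)
dup_index <- duplicated(df$index) | duplicated(df$index, fromLast = TRUE)

# Assumption: "YES" means the index appears only once in the whole column
df$unique <- ifelse(dup_index, "NO", "YES")

print(df)
# df$unique is now: "NO" "NO" "NO" "NO" "NO" "YES" "YES"
```

If the goal is instead to mark the first row of each index value and treat only the later repeats as duplicates, !duplicated(df$index) on its own would give that. Because duplicated is vectorised, either variant should be fine for a data frame of roughly 18,000 rows.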
