41

Say I have a dataframe df with two or more columns, is there an easy way to use unique() or other R function to create a subset of unique combinations of two or more columns?

I know I can use sqldf() and write an easy "SELECT DISTINCT var1, var2, ... varN" query, but I am looking for an R way of doing this.

It occurred to me to try ftable coerced to a dataframe and use the field names, but I also get the cross tabulations of combinations that don't exist in the dataset:

uniques <- as.data.frame(ftable(df$var1, df$var2))
zx8754
  • 52,746
  • 12
  • 114
  • 209
wahalulu
  • 1,447
  • 2
  • 17
  • 23

4 Answers4

62

unique works on data.frame so unique(df[c("var1","var2")]) should be what you want.

Another option is distinct from dplyr package:

df %>% distinct(var1, var2) # or distinct(df, var1, var2)

Note:

For older versions of dplyr (< 0.5.0, 2016-06-24) distinct required additional step

df %>% select(var1, var2) %>% distinct

(or oldish way distinct(select(df, var1, var2))).

Marek
  • 49,472
  • 15
  • 99
  • 121
27

@Marek's answer is obviously correct, but may be outdated. The current dplyrversion (0.7.4) allows for an even simpler code:

Simply use:

df %>% distinct(var1, var2)

If you want to keep all columns, add

df %>% distinct(var1, var2, .keep_all = TRUE)
tjebo
  • 21,977
  • 7
  • 58
  • 94
5

To KEEP all other variables in df use this:

unique_rows <- !duplicated(df[c("var1","var2")])

unique.df <- df[unique_rows,]

Another less recommended method is using row.names() #(see David's comment below):

unique_rows <- row.names(unique(df[c("var1","var2")]))

unique.df <- df[unique_rows,]
sbaniwal
  • 337
  • 4
  • 6
2

In addition to answers above, the data.table version:

setDT(df)

unique_dt = unique(df, by = c('var1', 'var2'))
Zaki
  • 131
  • 3