
I have a data.table (called data below) with 10 columns (C1, ..., C10), and I want to delete duplicate rows.

I accidentally ran setkey(data, C1), so now unique(data) only returns rows that are unique in column C1, whereas I want to drop a row only if it is identical to another row in all of the columns C1, ..., C10.
Is there a way to undo the setkey() operation? I found this question, but it didn't help me solve my problem.

PS: I can work around the problem by making every column a key with setkeyv(data, paste0("C", 1:10)), but this is neither elegant nor practical.
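A minimal reproducible sketch of the situation described above, assuming a small made-up table (the values and the helper names m, by_key and by_all are hypothetical). It passes by explicitly because the default changed between releases: older data.table versions used the key as the default by of unique(), while 1.9.8 and later compare all columns by default.

```r
library(data.table)

# Hypothetical toy table: 3 rows, columns C1..C10, every value 1L.
m <- matrix(1L, nrow = 3, ncol = 10, dimnames = list(NULL, paste0("C", 1:10)))
data <- as.data.table(m)
data[3L, C10 := 2L]  # row 3 now differs from rows 1 and 2 only in C10

setkey(data, C1)     # the accidental key

# Rows compared on the key column only: all three share C1 == 1, so one row is kept.
by_key <- unique(data, by = key(data))

# Rows compared on every column: rows 1 and 2 collapse, row 3 is kept.
by_all <- unique(data, by = names(data))

nrow(by_key)  # 1
nrow(by_all)  # 2
```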

hellter

1 Answer


First, you can use setkey(data, NULL) to remove the key.
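A short sketch of removing the key, assuming the same kind of hypothetical toy table as in the question (the names m and deduped are illustrative only). Once no key is set, unique() falls back to comparing all columns.

```r
library(data.table)

# Hypothetical toy table with the accidental key on C1.
m <- matrix(1L, nrow = 3, ncol = 10, dimnames = list(NULL, paste0("C", 1:10)))
data <- as.data.table(m)
data[3L, C10 := 2L]  # rows 1 and 2 are exact duplicates; row 3 differs in C10
setkey(data, C1)
key(data)            # "C1"

setkey(data, NULL)   # removes the key attribute only; rows are not reordered
key(data)            # NULL

# With no key set, unique() compares all columns.
deduped <- unique(data)
nrow(deduped)        # 2
```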

Second, unique.data.table has a by argument that lets you specify on the fly which columns to compare, regardless of which key is currently set:

unique(data, by = paste0("C", 1:10))
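A sketch of the by argument in action on a hypothetical toy table that keeps its C1 key (m, cols, deduped and dup_flags are illustrative names). It also shows that duplicated() accepts the same by argument, which can be handy for inspecting which rows would be dropped.

```r
library(data.table)

# Hypothetical toy table, keyed on C1 as in the question.
m <- matrix(1L, nrow = 3, ncol = 10, dimnames = list(NULL, paste0("C", 1:10)))
data <- as.data.table(m)
data[3L, C10 := 2L]  # rows 1 and 2 are exact duplicates; row 3 differs in C10
setkey(data, C1)

cols <- paste0("C", 1:10)

# Compare on all ten columns, whatever the current key is.
deduped <- unique(data, by = cols)
nrow(deduped)  # 2
key(data)      # still "C1": unique() returns a new table and leaves data's key alone

# duplicated() takes the same by argument, e.g. to flag the rows unique() drops.
dup_flags <- duplicated(data, by = cols)
dup_flags      # FALSE TRUE FALSE
```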

Third, if you do need a key on many columns, use setkeyv, which takes a character vector, instead of listing every column in setkey:

setkeyv(data, paste0("C", 1:10))
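A sketch contrasting the two calls on a hypothetical toy table (m, cols, k_v and k_bare are illustrative names): setkeyv() accepts a computed character vector, whereas setkey() needs each column typed out unquoted, and both end up with the same key. setkeyv(data, NULL) clears the key just like setkey(data, NULL).

```r
library(data.table)

# Hypothetical toy table with columns C1..C10.
m <- matrix(1L, nrow = 3, ncol = 10, dimnames = list(NULL, paste0("C", 1:10)))
data <- as.data.table(m)
data[3L, C10 := 2L]

cols <- paste0("C", 1:10)

# setkeyv(): key columns given as a character vector, so they can be computed.
setkeyv(data, cols)
k_v <- key(data)

# setkey(): the same key, but every column name must be written out unquoted.
setkey(data, C1, C2, C3, C4, C5, C6, C7, C8, C9, C10)
k_bare <- key(data)

identical(k_v, k_bare)  # TRUE

# Passing NULL to setkeyv() removes the key again.
setkeyv(data, NULL)
```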

See ?setkey and ?unique.data.table for more details.

MichaelChirico