I'm migrating from data frames and matrices to data tables, but haven't found a solution for extracting the unique rows from a data table. I presume there's something I'm missing about the [,J]
notation, though I've not yet found an answer in the FAQ and intro vignettes. How can I extract the unique rows, without converting back to data frames?
Here is an example:
library(data.table)
set.seed(123)
a <- matrix(sample(2, 120, replace = TRUE), ncol = 3)
a <- as.data.frame(a)
b <- as.data.table(a)
# Confirm dimensionality
dim(a) # 40 3
dim(b) # 40 3
# Unique rows using all columns
dim(unique(a)) # 8 3
dim(unique(b)) # 34 3
# Unique rows using only a subset of columns
dim(unique(a[,c("V1","V2")])) # 4 2
dim(unique(b[,list(V1,V2)])) # 29 2
Related question: Is this behavior a result of the data being unsorted, as with the Unix uniq
function?