I have a large dataframe and I want to check whether the values of a set of (factor) variables uniquely identify each row of the data.
My current strategy is to aggregate by the variables that I think are the index variables and check that no combination occurs more than once:
dfAgg = aggregate(dfTemp$var1, by = list(dfTemp$var1, dfTemp$var2, dfTemp$var3), FUN = length)
stopifnot(sum(dfAgg$x > 1) == 0)
But this strategy is very slow on a large dataframe. A more efficient method would be appreciated.
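For reference, here is a minimal reproducible sketch of the check described above. The toy `dfTemp`, its `value` column, and the names `dfDup`, `dfAggDup` and `isUnique` are made up for illustration; my real data is much larger.

```r
# Toy data (hypothetical): var1..var3 are the candidate index variables.
dfTemp <- data.frame(
  var1  = factor(c("a", "a", "b", "b")),
  var2  = factor(c("x", "y", "x", "y")),
  var3  = factor(c("p", "p", "q", "q")),
  value = c(1.5, 2.0, 3.5, 4.0)
)

# Count the rows for each combination of the candidate index variables.
dfAgg <- aggregate(dfTemp$var1,
                   by = list(dfTemp$var1, dfTemp$var2, dfTemp$var3),
                   FUN = length)

# TRUE when no combination occurs more than once.
isUnique <- sum(dfAgg$x > 1) == 0
stopifnot(isUnique)

# Appending a copy of the first row breaks uniqueness, so the check fails.
dfDup <- rbind(dfTemp, dfTemp[1, ])
dfAggDup <- aggregate(dfDup$var1,
                      by = list(dfDup$var1, dfDup$var2, dfDup$var3),
                      FUN = length)
hasDuplicates <- sum(dfAggDup$x > 1) > 0
```

On the toy data the first check passes, and the second dataframe is flagged as having a repeated combination.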
Thanks.