I have a data.table
that contains several columns that I need to test for equality.
library(data.table)
dt <- data.table(a = 1:5,
b = c("cat1", "cat1", "cat2", "cat4", "cat1"),
c = c("cat2", "cat1", "cat2", "cat4", "cat3"),
d = c("cat3", "cat1", "cat4", "cat4", "cat5"))
> dt
a b c d
1: 1 cat1 cat2 cat3
2: 2 cat1 cat1 cat1
3: 3 cat2 cat2 cat4
4: 4 cat4 cat4 cat4
5: 5 cat1 cat3 cat5
I want to keep rows where columns b
, c
, and d
do not have the same values. (I want to examine rows where the cats
aren't the same across all columns for that row.) In my actual problem, I have many more columns but they have similar names. This seems like it should be quite easy, but I'm coming up short.
This doesn't work.
> dt[length(unique(c(b, c, d))) > 1]
a b c d
1: 1 cat1 cat2 cat3
2: 2 cat1 cat1 cat1
3: 3 cat2 cat2 cat4
4: 4 cat4 cat4 cat4
5: 5 cat1 cat3 cat5
I thought maybe I need to make a column first and then do the filter, but making that column counting the unique number of values in each row doesn't work. It does it based on the entire set of columns.
dt[, unique_cats := length(unique(c(b, c, d)))]
dt
a b c d unique_cats
1: 1 cat1 cat2 cat3 5
2: 2 cat1 cat1 cat1 5
3: 3 cat2 cat2 cat4 5
4: 4 cat4 cat4 cat4 5
5: 5 cat1 cat3 cat5 5
Using info from this question, I have also tried to include an .I
argument in a by
clause but that doesn't work either.
> dt[,
+ unique_cats := length(unique(c(b, c, d))),
+ by = .I]
> dt
a b c d unique_cats
1: 1 cat1 cat2 cat3 5
2: 2 cat1 cat1 cat1 5
3: 3 cat2 cat2 cat4 5
4: 4 cat4 cat4 cat4 5
5: 5 cat1 cat3 cat5 5
What am I doing wrong here?