Consider the following sample data frame.
> ww
  col1 col2
1    1    A
2    2    A
3    3    A
4    4    B
5    5    B
6    6    B
7    7    C
8    8    C
9    9    C
> dput(ww)
structure(list(col1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9), col2 = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor")), .Names = c("col1",
"col2"), row.names = c(NA, -9L), class = "data.frame")
I want to know whether each category of col2 has its own distinct values in col1. In the end, I want a single answer: TRUE if all categories of col2 have completely disjoint sets of values in col1, and FALSE if there exist at least 2 categories in col2 that have at least 1 value of col1 in common.
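Put differently, the answer is TRUE exactly when every distinct value of col1 occurs under only one category of col2. The sketch below encodes that definition directly in base R by counting, for each col1 value, how many distinct categories it appears in. It assumes the sample data shown above; the names `cats_per_value` and `all_disjoint` are illustrative only.

```r
# Rebuild the sample data from the question (same values as the dput output).
ww <- data.frame(col1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                 col2 = factor(rep(c("A", "B", "C"), each = 3)))

# For each distinct value of col1, count the distinct col2 categories it occurs in.
# (cats_per_value and all_disjoint are illustrative names.)
cats_per_value <- tapply(ww$col2, ww$col1, function(x) length(unique(x)))

# TRUE only if no col1 value is shared by two or more categories.
all_disjoint <- all(cats_per_value == 1)
print(all_disjoint)
```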
For the above example, the answer is TRUE since categories A, B and C have no value of col1 in common. The values of col1 are 1, 2, 3 for A; 4, 5, 6 for B; and 7, 8, 9 for C.
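The per-category sets listed above can be reproduced with `split()`, and calling `intersect()` on every pair of categories confirms that none of them overlap. This is a small sketch on the sample data only; `groups`, `pairs`, `overlap_sizes` and `answer` are illustrative names.

```r
# Rebuild the sample data from the question.
ww <- data.frame(col1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                 col2 = factor(rep(c("A", "B", "C"), each = 3)))

# One vector of col1 values per col2 category: A -> 1 2 3, B -> 4 5 6, C -> 7 8 9.
groups <- split(ww$col1, ww$col2)

# Size of the intersection for every pair of categories (A-B, A-C, B-C).
pairs <- combn(names(groups), 2, simplify = FALSE)
overlap_sizes <- vapply(pairs,
                        function(p) length(intersect(groups[[p[1]]], groups[[p[2]]])),
                        integer(1))

# All pairwise intersections are empty, so the answer for this example is TRUE.
answer <- all(overlap_sizes == 0L)
print(answer)
```

Note that the number of pairs grows quadratically with the number of categories, which is one reason this route gets slow on large data.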
I could split the data frame by col2, save the values of col1 for each category, and then check for common values using intersect, but that is a lengthy and inefficient process for a large data frame. Can somebody provide an efficient solution? A data.table solution would also do.
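One possible way to avoid comparing every pair of categories is a single de-duplication pass: keep each (col1, col2) combination once, and then any col1 value that still appears more than once must belong to at least two categories. Below is a minimal base R sketch of that idea; the helper `groups_disjoint()` and its arguments are hypothetical, not from any package, and the second data frame is a made-up counterexample.

```r
# Hypothetical helper: TRUE if no value of `value_col` is shared by two or more
# groups of `group_col`, FALSE otherwise.
groups_disjoint <- function(df, value_col, group_col) {
  # Keep each (value, group) combination once, so repeats inside a group don't count.
  pairs <- unique(df[, c(value_col, group_col)])
  # After de-duplication, a value that still appears twice must belong to two groups.
  anyDuplicated(pairs[[value_col]]) == 0L
}

# Sample data from the question: no shared values, so this prints TRUE.
ww <- data.frame(col1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                 col2 = factor(rep(c("A", "B", "C"), each = 3)))
print(groups_disjoint(ww, "col1", "col2"))

# Made-up counterexample: the value 3 now occurs under both A and B, so this prints FALSE.
ww2 <- ww
ww2$col1[4] <- 3
print(groups_disjoint(ww2, "col1", "col2"))
```

The same idea should carry over to data.table, for example by counting `uniqueN(col2)` per `col1` and checking that every count is 1, but that variant is not shown or tested here.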