Counts of values dependent on two factors

Question

I am trying to find out, within each group of V1, which letters in column V3 occur in every level of V2. Some data should make clear what I mean:

df <- structure(list(a = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
3L, 3L, 3L, 3L, 3L), b = c(4L, 5L, 5L, 6L, 6L, 5L, 6L, 6L, 6L, 
6L, 4L, 4L, 5L, 5L, 5L), d = structure(c(3L, 3L, 3L, 2L, 3L, 
2L, 1L, 4L, 2L, 3L, 4L, 1L, 1L, 4L, 3L), .Label = c("a", "b", 
"c", "d"), class = "factor")), .Names = c("V1", "V2", "V3"), row.names = c(NA, 
-15L), class = "data.frame")

df
   V1 V2 V3
1  1 4 c
2  1 5 c
3  1 5 c
4  1 6 b
5  1 6 c
6  2 5 b
7  2 6 a
8  2 6 d
9  2 6 b
10 2 6 c
11 3 4 d
12 3 4 a
13 3 5 a
14 3 5 d
15 3 5 c

For the first group, V1 == 1, V2 has three levels, c(4, 5, 6), and each of them contains a "c" in V3. My expected output would therefore be something like the table below: every "c" in that group is set to TRUE, and the "b" in row 4 is set to FALSE because it does not occur in all levels of V2. For V1 == 2, V2 has two levels, c(5, 6), and this time the letter "b" occurs in both. Thus "b" is TRUE here and all others (c("a", "d", "c")) are FALSE.

   a b d     e
1  1 4 c  TRUE
2  1 5 c  TRUE
3  1 5 c  TRUE
4  1 6 b FALSE
5  1 6 c  TRUE
6  2 5 b  TRUE
7  2 6 a FALSE
8  2 6 d FALSE
9  2 6 b  TRUE
10 2 6 c FALSE
11 3 4 d  TRUE
12 3 4 a  TRUE
13 3 5 a  TRUE
14 3 5 d  TRUE
15 3 5 c FALSE

Using split() and table(), I am able to find the letters occurring in all levels of V2 within each level of V1.

a1 <- lapply(split(df, df$V1), function(x) names(which(apply(table(x$V3, x$V2) != 0, 1, all))))
a1
$`1`
[1] "c"

$`2`
[1] "b"

$`3`
[1] "a" "d"
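
To make the one-liner above easier to follow, here is a sketch of the same steps spelled out for the single group V1 == 1. The data frame is rebuilt in a shorter form, and the intermediate names (x, tab, present, in_all) are introduced here only for illustration.

```r
# Rebuild the example data (same values as the structure() call above)
df <- data.frame(
  V1 = rep(1:3, each = 5),
  V2 = c(4L, 5L, 5L, 6L, 6L, 5L, 6L, 6L, 6L, 6L, 4L, 4L, 5L, 5L, 5L),
  V3 = factor(c("c", "c", "c", "b", "c", "b", "a", "d", "b", "c",
                "d", "a", "a", "d", "c"))
)

# Take one V1 group, as split(df, df$V1) would hand it to the function
x <- df[df$V1 == 1, ]

# Contingency table: one row per letter in V3, one column per level of V2
tab <- table(x$V3, x$V2)

# TRUE where a letter occurs at least once in a given V2 level
present <- tab != 0

# A letter qualifies only if it is present in every column (every V2 level)
in_all <- apply(present, 1, all)

names(which(in_all))
```

Because V3 is a factor, table() keeps a row for every letter, including those that never appear in the group. Such rows contain only zeros and therefore fail the all() check.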

Now I could split the data frame again, look up those letters in each piece, and create the logical vector with something like this:

unlist(Map(function(x, y) x$V3 %in% y, split(df, df$V1), a1))
 11    12    13    14    15    21    22    23    24    25    31    32    33    34    35 
 TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE

But this is inconvenient and far from elegant. Hence the question, which IMO is not a duplicate.
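
For comparison, one possibly more compact formulation in base R is sketched below. It is not a confirmed answer from this thread. It rests on the assumption that "occurs in every level" can be restated as: the number of distinct V2 values seen for a letter within a V1 group equals the number of distinct V2 values in that group overall. The helper count_unique and the variables n_in_group and n_for_letter are hypothetical names chosen for this sketch.

```r
# Rebuild the example data (same values as the structure() call above)
df <- data.frame(
  V1 = rep(1:3, each = 5),
  V2 = c(4L, 5L, 5L, 6L, 6L, 5L, 6L, 6L, 6L, 6L, 4L, 4L, 5L, 5L, 5L),
  V3 = factor(c("c", "c", "c", "b", "c", "b", "a", "d", "b", "c",
                "d", "a", "a", "d", "c"))
)

# Hypothetical helper: number of distinct values in a vector
count_unique <- function(v) length(unique(v))

# For every row: how many distinct V2 levels exist in its V1 group
n_in_group <- ave(df$V2, df$V1, FUN = count_unique)

# For every row: how many distinct V2 levels its letter covers in that group
n_for_letter <- ave(df$V2, df$V1, df$V3, FUN = count_unique)

# A letter occurs in all V2 levels of its group when the two counts agree
df$e <- n_for_letter == n_in_group
df
```

ave() returns a vector aligned with the original rows, so no second split() or Map() pass is needed to map the per-group result back onto the data frame.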

The expected output is not that clear. Why do you have FALSE for row 4 while row 6 is TRUE? — akrun, Oct 11 '16 at 15:08
I think it's a bad idea to reuse letters both as values and as column names in an example unless your intention is to confuse. — Frank, Oct 11 '16 at 15:10
I think OP wants: `library(dplyr); df %>% group_by(a, d) %>% mutate(e = n() > 1)` - I'm not sure but, at least it gives the desired output. — Steven Beaupré, Oct 11 '16 at 15:11
