have is a 7,000-obs data frame with character vars A and B. There are 400 unique values for A in total, and 6,500 unique values for B.
obs A B
1 TJ.D KING.B
2 GRETCHEN.W TJ.D
3 GUS.G GRETCHEN.W
4 MIKEY.B GUS.G
...
Values of A appear in B sometimes but not always. I need to flag the values of A that do not appear in B. The result for the subset shown would look like this:
obs A B flg_A_only
1 TJ.D KING.B 0
2 GRETCHEN.W TJ.D 0
3 GUS.G GRETCHEN.W 0
4 MIKEY.B GUS.G 1
MIKEY.B is flagged because it appears only in A. My current approach is very inefficient:
library(dplyr)  # if_else() comes from dplyr

have_As <- as.data.frame(unique(have$A), stringsAsFactors = FALSE)
colnames(have_As) <- c("A")
have_Bs <- as.data.frame(unique(have$B), stringsAsFactors = FALSE)
colnames(have_Bs) <- c("B")
have_As$flg_A_only <- 99
for (i in 1:nrow(have_As)) {
  for (j in 1:nrow(have_Bs)) {
    have_As$flg_A_only[i] <- if_else(have_As$A[i] == have_Bs$B[j], 1, 0)
  }
}
...then I merge the have_As and have_Bs data frames. This seems very roundabout, and the for loop takes a long time to run. What's a simpler solution?
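For what it's worth, one vectorized way to get this flag is base R's %in% operator, which checks every value of A against B in a single call, so no loop or merge is needed. The snippet below is only a sketch: it rebuilds just the four rows shown above (the real have has 7,000 rows), and it assumes the flag should be an integer 0/1.

```r
# Rebuild only the four sample rows shown in the question.
have <- data.frame(
  obs = 1:4,
  A = c("TJ.D", "GRETCHEN.W", "GUS.G", "MIKEY.B"),
  B = c("KING.B", "TJ.D", "GRETCHEN.W", "GUS.G"),
  stringsAsFactors = FALSE
)

# have$A %in% have$B is TRUE where a value of A occurs anywhere in B.
# Negating it marks the values of A that never occur in B,
# and as.integer() turns TRUE/FALSE into 1/0.
have$flg_A_only <- as.integer(!(have$A %in% have$B))

print(have)
```

On the sample rows this flags only MIKEY.B, matching the desired output. The flag lands directly on have, so the intermediate have_As and have_Bs data frames would not be needed with this approach.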