have is a data frame with 7,000 observations and two character variables, A and B. A has 400 unique values and B has 6,500.

obs A           B
  1 TJ.D        KING.B
  2 GRETCHEN.W  TJ.D
  3 GUS.G       GRETCHEN.W
  4 MIKEY.B     GUS.G
  ...

Values of A appear in B sometimes but not always. I need to flag the values of A that do not appear in B. The result for the subset shown would look like this:

obs A           B           flg_A_only
  1 TJ.D        KING.B               0
  2 GRETCHEN.W  TJ.D                 0
  3 GUS.G       GRETCHEN.W           0
  4 MIKEY.B     GUS.G                1

MIKEY.B is flagged because it appears only in A. My current approach is very inefficient:

# One row per unique value of A and of B
have_As <- data.frame(A = unique(have$A), stringsAsFactors = FALSE)
have_Bs <- data.frame(B = unique(have$B), stringsAsFactors = FALSE)

# Assume every A value is A-only until a match is found in B
have_As$flg_A_only <- 1

for (i in seq_len(nrow(have_As))) {
  for (j in seq_len(nrow(have_Bs))) {
    if (have_As$A[i] == have_Bs$B[j]) {
      have_As$flg_A_only[i] <- 0  # found in B, so not A-only
      break
    }
  }
}

...then I merge the have_As and have_Bs data frames. This seems very roundabout, and the for loop takes a long time to run. Is there a simpler solution?

J.Q
  • How about just `have$flg_A_only = !(have$A %in% have$B)`? No need for a double loop here. – MrFlick May 03 '19 at 16:11
  • @MrFlick this works perfectly and takes no time. Thanks! Happy to accept the answer if you add it below – J.Q May 03 '19 at 16:14
  • I think it was similar enough to the other question not to bother with adding another answer here. But glad it helped. – MrFlick May 03 '19 at 16:21
  • See this related question: https://stackoverflow.com/q/30098269/5325862 – camille May 03 '19 at 16:24
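
For anyone landing here later, the `%in%` suggestion from the comments can be tried on the four rows shown in the question. This is only a minimal sketch: the toy data frame is rebuilt by hand from the displayed subset, and the `as.integer()` wrapper is my own addition so the flag comes out as 0/1 like the desired output rather than TRUE/FALSE.

```r
# Toy data: only the four rows displayed in the question,
# not the full 7,000-row data frame.
have <- data.frame(
  A = c("TJ.D", "GRETCHEN.W", "GUS.G", "MIKEY.B"),
  B = c("KING.B", "TJ.D", "GRETCHEN.W", "GUS.G"),
  stringsAsFactors = FALSE
)

# %in% is vectorised: for each element of A it tests membership in B.
# Negating it gives TRUE where the A value never appears in B;
# as.integer() (an addition here, not part of the comment) maps that to 0/1.
have$flg_A_only <- as.integer(!(have$A %in% have$B))

print(have)
```

Since `%in%` is a thin wrapper around `match()`, the whole column is computed in one vectorised call, with no need to build the separate have_As and have_Bs frames or merge anything back.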

0 Answers