library(data.table)
a1 <- data.table(id=c(1,1,1,1,2,2,2,3,3),
                 var=c("6402","1","6302","3","6406","6406","2","1","1"))
b1 <- data.table(var=c("6402","6406","6302"),
                 txt=c("A","B","A"))
mm <- b1[a1,on=.(var)]
dcast(mm,id~txt,function(x) any(!is.na(x)),fill=NA)

desired_output <- data.table(id=c(1,2,3),
                             A=c(TRUE,FALSE,FALSE),
                             B=c(FALSE,TRUE,FALSE))

How can I get the desired_output? Somehow the aggregating function seems to be playing games with me...

Misha

1 Answer


Make `id` a factor and call `dcast` with `drop = FALSE` after dropping the `NA` rows.

library(data.table)
mm$id <- factor(mm$id)
dcast(na.omit(mm),id~txt,function(x) any(!is.na(x)), drop = FALSE)

#   id     A     B
#1:  1  TRUE FALSE
#2:  2 FALSE  TRUE
#3:  3 FALSE FALSE
Ronak Shah
  • Any idea why id must be a factor? – Misha Sep 09 '20 at 13:00
  • So that we don't lose any `id` whose rows are all `NA`. Check the output of `na.omit(mm)`: it has no rows with `id = 3`. Because `id` is a factor, the level `3` is kept, so with `drop = FALSE` it still appears in the final output. – Ronak Shah Sep 09 '20 at 13:04
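
The point about factor levels can be checked directly. The following is a minimal sketch built on the data from the question; the names `kept` and `res` are introduced here for illustration only, and `value.var = "var"` is an explicit choice made in this sketch (the answer above lets `dcast` guess the value column).

```r
library(data.table)

a1 <- data.table(id = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
                 var = c("6402", "1", "6302", "3", "6406", "6406", "2", "1", "1"))
b1 <- data.table(var = c("6402", "6406", "6302"),
                 txt = c("A", "B", "A"))
mm <- b1[a1, on = .(var)]
mm[, id := factor(id)]            # same effect as mm$id <- factor(mm$id)

kept <- na.omit(mm)               # drops every row where txt is NA

# No row with id 3 survives, but the factor still carries the level "3"
unique(as.character(kept$id))
levels(kept$id)

# drop = FALSE asks dcast to keep all factor levels, including the empty one;
# the aggregating function is then evaluated on a zero-length vector for
# empty cells, and any() of nothing is FALSE
res <- dcast(kept, id ~ txt, value.var = "var",
             fun.aggregate = function(x) any(!is.na(x)),
             drop = FALSE)
res
```

If `id` were left numeric, `na.omit(mm)` would carry no trace of `id = 3`, so `dcast` would have nothing from which to build that row.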