2

I have the following in an R data.table:

  id | status
=============
   1 |      A
   1 |      B
   2 |      A
   2 |      B
   3 |      A
   4 |      A
   5 |      B

I want to show only the rows grouped by id that have an A, but not a B. So a result would be something like this:

  id | status
=============
   3 |      A
   4 |      A

So far I have this:

dt[, sum(status == "A") > 0 && sum(status == "B") == 0, by = id]

which gets me:

  id | status
=============
   1 |  FALSE
   2 |  FALSE
   3 |   TRUE
   4 |   TRUE
   5 |  FALSE

which I think is on the right track, but I don't know how to just get the rows that I want. Am I on the right track, or am I completely thinking about it the wrong way?

user3685285
  • 6,066
  • 13
  • 54
  • 95

1 Answers1

3

In this (I assume simplified) example you can filter the rows that have a B, then anti-join those back onto the original data

dt[ !dt[ status == "B" ], on = "id" ]

#    id status
# 1:  3      A
# 2:  4      A

This works because

dt[ status == "B"]

gives the 'id's that have a B. These are the ones you want to exclude from your result set, which can be achieved through the anti-join (the dt[ !dt[ ] on = "" ] notation)


If your data is more complex, you can add some extra filters into this logic to first filter the values you want, e.g.

dt[status == "A"][ !dt[ status == "B" ], on = "id" ]

will ensure your returned result will include the IDs that have an A and not a B

SymbolixAU
  • 25,502
  • 4
  • 67
  • 139
  • For the last point, to get ids with an A, might need a self semi join https://stackoverflow.com/questions/18969420/perform-a-semi-join-with-data-table like `dt[dt[status=="A", .(id)], on="id"]` or something. – Frank Dec 12 '17 at 17:00
  • @Frank that won't exclude those `A` ids that also have a `B` value ? – SymbolixAU Dec 12 '17 at 21:17
  • Yes, I meant just the first part of the `dt[...][...]` chain. So with `DT = data.table(id = 1, status = LETTERS[c(1,3)])`, I'd want to see all rows for the `id`, not just the first one, so `DT[.(DT[status == "A", unique(id)]), on="id"][!.(DT[status == "B", unique(id)]), on="id"]` not `DT[status == "A"][!.(DT[status == "B", unique(id)]), on="id"]` – Frank Dec 12 '17 at 21:28