Filtering a data.table so that every subset is per selected block of data rather than row

Question

If I have the following data.table

matchID characterID info
1111    4           abc
1111    12          def
1111    1           ghi
2222    8           jkl
2222    7           mno
2222    3           pwr
3333    9           abc
3333    2           ghi
33333   4           jkl

I want to subset it to look for specific characterIDs, but return every matchID associated with the characterID. For instance, if I query for characterID = 12 I should get this data set:

matchID    characterID  info
1111       4            abc
1111       12           def
1111       1            ghi

What would that data.table subset look like? I am specifically looking for something in the form of datatable[characterID = 12, 1:3, Info].

`df[df$matchID %in% df$matchID[df$characterID == 12]]` but I think this is not the `data.table` way of doing it. — Ronak Shah, Oct 05 '17 at 06:34

akrun · Accepted Answer · 2017-10-05T06:39:57.213

3

We create a function to get the subset of dataset that match with the 'characterID'

library(dplyr)
f1 <- function(dat, charIDs) {

       dat %>%
           group_by(matchID) %>%
            filter(all(charIDs %in% characterID))
  }

We can either pass as single 'ID' or multiple IDs to filter the rows

f1(df1, 12)
# A tibble: 3 x 3
# Groups:   matchID [1]
#  matchID characterID  info
#    <int>       <int> <chr>
#1    1111           4   abc
#2    1111          12   def
#3    1111           1   ghi

f1(df1, c(7, 3))
# A tibble: 3 x 3
# Groups:   matchID [1]
#   matchID characterID  info
#    <int>       <int> <chr>
#1    2222           8   jkl
#2    2222           7   mno
#3    2222           3   pwr

We can also use data.table option

library(data.table)
setDT(df1)[ , if(all(12 %in% characterID)) .SD,  matchID]

Or

setDT(df1)[ , .SD[all(12 %in% characterID)],  matchID]

Or

setDT(df1)[df1[ , .I[all(12 %in% characterID)],  matchID]$V1]

edited Oct 05 '17 at 06:39

answered Oct 05 '17 at 06:34

akrun

874,273
37
540
662

1

The last three are what I am looking for, thank you. Is there any difference in computational complexity between the three? – Alex Oct 05 '17 at 07:14
@Alex The `.I` would be faster – akrun Oct 05 '17 at 07:35
A selfjoin seems to do the job too merge(Y[characterID ==12,"matchID"],Y, by="matchID") – ashleych Oct 05 '17 at 08:58
@akrun If I wanted to make the query slightly more complex where I required a characterID of 4 & info = ABC and also characterID of 12 & info = DEF, both in the same set of matchID. Could I do this in a single subset? Or would I have to subset the data.table once and then subset again with new parameters? – Alex Oct 05 '17 at 19:19
@Alex Could you please post as a new question as it is not clear from the comments – akrun Oct 06 '17 at 04:57

Filtering a data.table so that every subset is per selected block of data rather than row

1 Answers1