3

If I have the following data.table

matchID characterID info
1111    4           abc
1111    12          def
1111    1           ghi
2222    8           jkl
2222    7           mno
2222    3           pwr
3333    9           abc
3333    2           ghi
33333   4           jkl

I want to subset it to look for specific characterIDs, but return every matchID associated with the characterID. For instance, if I query for characterID = 12 I should get this data set:

matchID    characterID  info
1111       4            abc
1111       12           def
1111       1            ghi

What would that data.table subset look like? I am specifically looking for something in the form of datatable[characterID = 12, 1:3, Info].

Alex
  • 83
  • 1
  • 5
  • 1
    `df[df$matchID %in% df$matchID[df$characterID == 12]]` but I think this is not the `data.table` way of doing it. – Ronak Shah Oct 05 '17 at 06:34

1 Answers1

3

We create a function to get the subset of dataset that match with the 'characterID'

library(dplyr)
f1 <- function(dat, charIDs) {

       dat %>%
           group_by(matchID) %>%
            filter(all(charIDs %in% characterID))
  }

We can either pass as single 'ID' or multiple IDs to filter the rows

f1(df1, 12)
# A tibble: 3 x 3
# Groups:   matchID [1]
#  matchID characterID  info
#    <int>       <int> <chr>
#1    1111           4   abc
#2    1111          12   def
#3    1111           1   ghi

f1(df1, c(7, 3))
# A tibble: 3 x 3
# Groups:   matchID [1]
#   matchID characterID  info
#    <int>       <int> <chr>
#1    2222           8   jkl
#2    2222           7   mno
#3    2222           3   pwr

We can also use data.table option

library(data.table)
setDT(df1)[ , if(all(12 %in% characterID)) .SD,  matchID]

Or

setDT(df1)[ , .SD[all(12 %in% characterID)],  matchID]

Or

setDT(df1)[df1[ , .I[all(12 %in% characterID)],  matchID]$V1]
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    The last three are what I am looking for, thank you. Is there any difference in computational complexity between the three? – Alex Oct 05 '17 at 07:14
  • @Alex The `.I` would be faster – akrun Oct 05 '17 at 07:35
  • A selfjoin seems to do the job too merge(Y[characterID ==12,"matchID"],Y, by="matchID") – ashleych Oct 05 '17 at 08:58
  • @akrun If I wanted to make the query slightly more complex where I required a characterID of 4 & info = ABC and also characterID of 12 & info = DEF, both in the same set of matchID. Could I do this in a single subset? Or would I have to subset the data.table once and then subset again with new parameters? – Alex Oct 05 '17 at 19:19
  • @Alex Could you please post as a new question as it is not clear from the comments – akrun Oct 06 '17 at 04:57