2

I couldn't find any answer to this, but I think it's easy to do.

I have this data.table:

DT = expand.grid(Season = c("Winter","Spring","Summer","Fall"),
                 Station = c("A","B","C"),
                 Group = c("1","2","3","4"))
DT$Value = seq(1,length(DT[,1]),1)
DT = data.table(DT)

I want to obtain a subset of DT according to this other data.table:

indexTable = data.table(Season = c("Winter","Spring","Spring"),
                        Station = c("B","B","A"),
                        Group = c("1","2","3"))

Basically I want only the rows of DT that are contained in indexTable. The expected result is this table:

expectedTable = data.table(Season = c("Winter","Spring","Spring"),
                           Station = c("B","B","A"),
                           Group = c("1","2","3"),
                           Value = c(5,18,26))

I am trying to obtain that with this code:

tryTable = DT[DT$Station %in% indexTable$Station &
              DT$Season %in% indexTable$Season &
              DT$Group %in% indexTable$Group,]

which gives me not only the 3 rows I want, but also other rows of DT.

What am I doing wrong? Is there an easy way to obtain expectedTable using data.table indexing notation (for instance using setkey?)

mt1022
  • 16,834
  • 5
  • 48
  • 71

1 Answers1

4

You're after an INNER JOIN of the two tables.

DT[
    indexTable
    , on = c("Season", "Station", "Group")
    , nomatch = 0
]

   Season Station Group Value
1: Winter       B     1     5
2: Spring       B     2    18
3: Spring       A     3    26

Reference

SymbolixAU
  • 25,502
  • 4
  • 67
  • 139