I have the following data.table:
dt = data.table(year=c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
quar=c(1, 1, 1, 2, 2, 3, 4, 4, 4, 1, 1, 1),
item=c(1, 2, 3, 1, 2, 1, 1, 2, 3, 1, 2, 3))
Some periods (defined by year and quarter) have three items:
- Y1Q1, Y1Q4, Y2Q1
The other periods don't:
- Y1Q2 has 2 items
- Y1Q3 has 1 item
How can I only get the rows which have all three items?
What I've managed so far is
dt[dt[, "i" := nrow(.SD) == 3, .(year, quar)]$i]
> year | quar | item | i
> -----+------+------+-----
> 1 | 1 | 1 | TRUE
> 1 | 1 | 2 | TRUE
> 1 | 1 | 3 | TRUE
> 1 | 4 | 1 | TRUE
> 1 | 4 | 2 | TRUE
> 1 | 4 | 3 | TRUE
> 2 | 1 | 1 | TRUE
> 2 | 1 | 2 | TRUE
> 2 | 1 | 3 | TRUE
Which groups by year
and quar
and then sets a column i
which states whether that group is valid. All rows in the group get the calculated value of i
.
Which works just fine. However, it has the side-effect of adding a real i
column to the table.
I tried using a temporary column declared with .(i =...)
, but then the i
column has the length of the shorter, grouped table and we get
dt[dt[, .(i = nrow(.SD) == 3), .(year, quar)]$i]
> Error in `[.data.table`(dt, dt[, .(i = nrow(.SD) == 3), .(year, quar)]$i) :
> i evaluates to a logical vector length 5 but there are 12 rows. [...]
So, is there a more elegant way of solving this? Or should I just use this and then drop i
?