How to filter/subset a data table based on condition on other columns in R?

Question

I have a data set, similar to the example:

Data <- data.table(Date=c("2020-01-02", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-04", "2020-01-04", "2020-01-04"), ID=c(1,2,1,1,1,1,2), Value=c(150,120,111,189,654,333,213))

Output:

         Date ID Value
1: 2020-01-02  1   150
2: 2020-01-02  2   120
3: 2020-01-03  1   111
4: 2020-01-04  1   189
5: 2020-01-04  1   654
6: 2020-01-04  1   333
7: 2020-01-04  2   213

I would like to filter or subset Data by Date and ID. For example, If in a day, I have less than 2 observation of ID, then remove the ID and Value in that day. 2020-01-02 I have only one ID 1, so, this row will be removed.

I would appreciate if you could give me some advice.

Many thanks

SamR · Accepted Answer · 2022-04-27T12:08:12.710

1

Just group by Date and ID, count observations and filter when there are greater than one:

Data[, n:=.N, by = .(Date, ID)][n>1]
#          Date ID Value n
# 1: 2020-01-04  1   189 3
# 2: 2020-01-04  1   654 3
# 3: 2020-01-04  1   333 3

edited Apr 27 '22 at 12:08

answered Apr 27 '22 at 11:46

SamR

8,826
3
11
33

score 1 · Answer 2 · answered Apr 27 '22 at 11:47

1

Using dplyr, you can try:

library(dplyr)
Data %>% group_by(ID, Date) %>% filter(n() >= 2) %>% ungroup()

Output:

# Date          ID Value
# <chr>      <dbl> <dbl>
# 1 2020-01-04     1   189
# 2 2020-01-04     1   654
# 3 2020-01-04     1   333

answered Apr 27 '22 at 11:47

jpsmith

11,023
5
15
36

How to filter/subset a data table based on condition on other columns in R?

2 Answers2