1

I have a data set, similar to the example:

Data <- data.table(Date=c("2020-01-02", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-04", "2020-01-04", "2020-01-04"), ID=c(1,2,1,1,1,1,2), Value=c(150,120,111,189,654,333,213))

Output:

         Date ID Value
1: 2020-01-02  1   150
2: 2020-01-02  2   120
3: 2020-01-03  1   111
4: 2020-01-04  1   189
5: 2020-01-04  1   654
6: 2020-01-04  1   333
7: 2020-01-04  2   213

I would like to filter or subset Data by Date and ID. For example, If in a day, I have less than 2 observation of ID, then remove the ID and Value in that day. 2020-01-02 I have only one ID 1, so, this row will be removed.

I would appreciate if you could give me some advice.

Many thanks

Mee
  • 85
  • 7

2 Answers2

1

Just group by Date and ID, count observations and filter when there are greater than one:

Data[, n:=.N, by = .(Date, ID)][n>1]
#          Date ID Value n
# 1: 2020-01-04  1   189 3
# 2: 2020-01-04  1   654 3
# 3: 2020-01-04  1   333 3
SamR
  • 8,826
  • 3
  • 11
  • 33
1

Using dplyr, you can try:

library(dplyr)
Data %>% group_by(ID, Date) %>% filter(n() >= 2) %>% ungroup()

Output:

# Date          ID Value
# <chr>      <dbl> <dbl>
# 1 2020-01-04     1   189
# 2 2020-01-04     1   654
# 3 2020-01-04     1   333
jpsmith
  • 11,023
  • 5
  • 15
  • 36