1

I want to select only the data containing a continuous number of ID from 1-8 of each DATE. Note, I do not want to select data that do not contain a continuous number in ID from 1-8 (eg. Data on this DATE 1/2/2020). Can you help me to write a code for this in R, the example dataset attached below:

    Date    CO2 CH4 ID
1/1/2020    23  13  1
1/1/2020    22  11  2
1/1/2020    12  12  3
1/1/2020    24  9   4
1/1/2020    26  8   5
1/1/2020    19  9   6
1/1/2020    34  7   7
1/1/2020    15  10  8
1/1/2020    17  11  9
1/1/2020    19  9   10
1/1/2020    23  6   11
1/1/2020    33  11  12
1/2/2020    23  9   1
1/2/2020    34  8   2
1/2/2020    25  9   3
1/2/2020    23  10  4
1/3/2020    25  11  1
1/3/2020    24  12  2
1/3/2020    34  9   3
1/3/2020    33  8   4
1/3/2020    31  10  5
1/3/2020    32  4   6
1/3/2020    24  6   7
1/4/2020    22  9   1
1/4/2020    21  8   2
1/4/2020    23  9   3
1/4/2020    22  10  4
1/4/2020    22  11  5
1/4/2020    21  14  6
1/4/2020    23  12  7
1/4/2020    23  13  8
1/4/2020    21  11  9
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213

1 Answers1

2

Select the date where all the ID values from 1-8 are present.

library(dplyr)

df1 <- df %>%
  group_by(Date) %>%
  filter(all(1:8 %in% ID)) %>%
  ungroup

#   Date       CO2   CH4    ID
#   <chr>    <int> <int> <int>
# 1 1/1/2020    23    13     1
# 2 1/1/2020    22    11     2
# 3 1/1/2020    12    12     3
# 4 1/1/2020    24     9     4
# 5 1/1/2020    26     8     5
# 6 1/1/2020    19     9     6
# 7 1/1/2020    34     7     7
# 8 1/1/2020    15    10     8
# 9 1/1/2020    17    11     9
#10 1/1/2020    19     9    10
# … with 11 more rows

Base R -

df1 <- subset(df, as.logical(ave(ID, Date, FUN = function(x) all(1:8 %in% x))))

data.table -

library(data.table)
setDT(df)[, .SD[all(1:8 %in% ID)], Date]

data

df <- structure(list(Date = c("1/1/2020", "1/1/2020", "1/1/2020", "1/1/2020", 
"1/1/2020", "1/1/2020", "1/1/2020", "1/1/2020", "1/1/2020", "1/1/2020", 
"1/1/2020", "1/1/2020", "1/2/2020", "1/2/2020", "1/2/2020", "1/2/2020", 
"1/3/2020", "1/3/2020", "1/3/2020", "1/3/2020", "1/3/2020", "1/3/2020", 
"1/3/2020", "1/4/2020", "1/4/2020", "1/4/2020", "1/4/2020", "1/4/2020", 
"1/4/2020", "1/4/2020", "1/4/2020", "1/4/2020"), CO2 = c(23L, 
22L, 12L, 24L, 26L, 19L, 34L, 15L, 17L, 19L, 23L, 33L, 23L, 34L, 
25L, 23L, 25L, 24L, 34L, 33L, 31L, 32L, 24L, 22L, 21L, 23L, 22L, 
22L, 21L, 23L, 23L, 21L), CH4 = c(13L, 11L, 12L, 9L, 8L, 9L, 
7L, 10L, 11L, 9L, 6L, 11L, 9L, 8L, 9L, 10L, 11L, 12L, 9L, 8L, 
10L, 4L, 6L, 9L, 8L, 9L, 10L, 11L, 14L, 12L, 13L, 11L), ID = c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 
1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L
)), class = "data.frame", row.names = c(NA, -32L))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Thank you for the reply. It worked in example data set but I am having difficulty in real data. In my real data, the ID has read from 7 to 25. – Rajan Dhakal Jun 21 '21 at 10:45
  • and which ones do you want to select? In the above example dataset we are selecting Dates that has ID's from 1 to 8. – Ronak Shah Jun 21 '21 at 10:52