
I have a data set with the columns name, Date and earliest_date, in which some names have an earliest_date. For each name, I want to remove all rows whose Date falls after that name's earliest_date. Rows with NA in earliest_date should be left untouched. Since different names have different earliest_date values, I am pretty sure I can't use filter() with a fixed date. Any help will be much appreciated.

Part of the data is below:

dput(mydata[1:10,])
structure(list(name = c("a", "b", "c", 
"d", "e", "f", "g", 
"a", "h", "i"), Date = structure(c(13214, 
17634, 15290, 18046, 16326, 18068, 10234, 12647, 15485, 15182
), class = "Date"), earliest_date = structure(c(12647, NA, NA, 
NA, NA, NA, NA, 12647, NA, 15552), class = "Date")), row.names = c(NA, 
10L), class = "data.frame")


Desired output: the first row is removed because its Date falls after earliest_date.

dput(mydata[2:10,])
structure(list(name = c("b", "c", 
"d", "e", "f", "g", 
"a", "h", "i"), Date = structure(c(17634, 15290, 
18046, 16326, 18068, 10234, 12647, 15485, 15182), class = "Date"), 
    earliest_date = structure(c(NA, NA, NA, NA, NA, NA, 12647, 
    NA, 15552), class = "Date")), row.names = 2:10, class = "data.frame")

OMG C
  • `mydata %>% filter(Date < earliest_date | is.na(earliest_date))` *might* give you what you want, but without your desired output (and, perhaps, more complex input data), it's impossible to be sure. PS: It's bad practice to name a variable with the name of an R function, class, or other object. – Limey Sep 29 '21 at 06:45
  • I have added the desired output for reference. I think all I need is to remove all the rows where `Date` is after `earliest_date`, based on `name`, but I haven't figured out the code yet. – OMG C Sep 29 '21 at 06:47

2 Answers


This may help:

library(dplyr)
# Keep rows with no earliest_date, or with Date on or before it
mydata %>%
  filter(is.na(earliest_date) | Date <= earliest_date)

  name       Date earliest_date
1    b 2018-04-13          <NA>
2    c 2011-11-12          <NA>
3    d 2019-05-30          <NA>
4    e 2014-09-13          <NA>
5    f 2019-06-21          <NA>
6    g 1998-01-08          <NA>
7    a 2004-08-17    2004-08-17
8    h 2012-05-25          <NA>
9    i 2011-07-27    2012-07-31
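
For reference, the same row-wise condition can be sketched in base R without any packages. This is a minimal sketch that rebuilds the sample data from the question's `dput()` output; the names `keep` and `result` are illustrative and not part of the original post.

```r
# Sample data reconstructed from the dput() output in the question
mydata <- data.frame(
  name = c("a", "b", "c", "d", "e", "f", "g", "a", "h", "i"),
  Date = as.Date(c(13214, 17634, 15290, 18046, 16326, 18068, 10234,
                   12647, 15485, 15182), origin = "1970-01-01"),
  earliest_date = as.Date(c(12647, NA, NA, NA, NA, NA, NA,
                            12647, NA, 15552), origin = "1970-01-01")
)

# TRUE where earliest_date is missing, or Date is on or before it.
# is.na() comes first so that TRUE | NA evaluates to TRUE.
keep <- is.na(mydata$earliest_date) | mydata$Date <= mydata$earliest_date

# Logical row subsetting; the original row names (2 to 10) are preserved
result <- mydata[keep, ]
print(result)
```

Because `<=` is used rather than `<`, a row whose `Date` equals its `earliest_date` (the second "a" row) is kept, which matches the desired output in the question.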
Park

Or try:

library(data.table)
# Column names are resolved inside [ ], so the mydata$ prefix is unnecessary
setDT(mydata)[is.na(earliest_date) | Date <= earliest_date]
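
Both answers compare `Date` with the `earliest_date` stored on the same row. If `earliest_date` were filled in on only some rows of a given name, a grouped variant might be needed. The sketch below assumes that situation, which the sample data in the question does not show; the data frame `df` and the helper column `cutoff` are hypothetical.

```r
library(dplyr)

# Hypothetical data: name "a" has earliest_date recorded on only one row
df <- data.frame(
  name = c("a", "a", "a", "b"),
  Date = as.Date(c("2004-08-17", "2006-03-07", "2001-01-01", "2018-04-13")),
  earliest_date = as.Date(c("2004-08-17", NA, NA, NA))
)

result <- df %>%
  group_by(name) %>%
  # Spread each name's earliest_date to all of its rows; names with no
  # recorded earliest_date get an NA cutoff
  mutate(cutoff = if (all(is.na(earliest_date))) as.Date(NA)
                  else min(earliest_date, na.rm = TRUE)) %>%
  ungroup() %>%
  # Same condition as above, applied to the per-name cutoff
  filter(is.na(cutoff) | Date <= cutoff) %>%
  select(-cutoff)

print(result)
```

Here the 2006-03-07 row for "a" is dropped even though its own `earliest_date` is NA, because the cutoff is taken per name rather than per row.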
user438383
U13-Forward