0

My dataset:

> as_tibble(wq4)
# A tibble: 58,538 x 4
   Date       Site  Analyte                    Value2
   <date>     <fct> <fct>                       <dbl>
 1 2014-01-10  N2   Ammonia NH3-N               0.01 
 2 2014-01-10  N2   Chlorophyll - a             1.5  
 3 2014-01-10  N2   Filtered Total Phosphorus   0.005
 4 2014-01-10  N2   Oxidised Nitrogen           0.1  
 5 2014-01-10  N2   Total Nitrogen              0.3  
 6 2014-01-10  N2   Total Phosphorus            0.008
 7 2014-01-10  N2   Ammonia NH3-N               0.02 
 8 2014-01-10  N2   Chlorophyll - a             1.4  
 9 2014-01-10  N2   Conductivity                191    
10 2014-01-10  N2   Enterococci                 19    
# … with 58,528 more rows

I want to filter out a certain set of values based on multiple conditions using dplyr. What I've got so far is:

filter(wq4, Site != "N1" & !Date %in% c("2019-04-17", "2019-04-18", "2019-04-19"))

I essentially want to remove any data from the 17th to 19th Apr 2019, only at Site N1 (not any of my other sites).

I don't think this line of code is working for me. It is either the "&" or perhaps dplyr is struggling with the date format?

Any suggestions? Thanks.

r2evans
  • 141,215
  • 6
  • 77
  • 149
tm95
  • 77
  • 5

2 Answers2

1

Your sample data does not include "N1", but here's a guess:

filter(wq4, Site != "N1" | !between(Date, as.Date("2019-04-17"), as.Date("2019-04-19")))

will return sites that are not "N1" (any date), and data for site "N1" that are not between those two dates.

You can still use your %in% logic if you prefer, I offered !between as an alternative for two reasons:

  1. Date objects are not necessarily integral, diff(c(Sys.Date(), Sys.Date() + 0.1)) returns 0.1 days, showing that it is a floating-point. If your dates are all clearly integral and nothing has possibly nudged them off of the perfect-day, then your %in% should work just fine, but along the lines of Why are these numbers not equal?, floating-point equality is not assured.

    As an example:

    Sys.Date()
    # [1] "2020-09-19"
    Sys.Date() %in% as.Date("2020-09-19")
    # [1] TRUE
    (Sys.Date() + 0.1)
    # [1] "2020-09-19"                             # still looks integral
    (Sys.Date() + 0.1) %in% as.Date("2020-09-19")
    # [1] FALSE
    
  2. In case you want to span more than a few days, it is more efficient to deal with the start/end dates instead of every ... possible ... date.

r2evans
  • 141,215
  • 6
  • 77
  • 149
1

Try with :

library(dplyr)
wq4 %>%
   filter(!(Site == "N1" & 
            Date %in% as.Date(c("2019-04-17", "2019-04-18", "2019-04-19"))))

and the same expression in subset :

subset(wq4, !(Site == "N1" & 
              Date %in% as.Date(c("2019-04-17", "2019-04-18", "2019-04-19"))))

Site == "N1" & Date %in% as.Date(c("2019-04-17", "2019-04-18", "2019-04-19") are the rows which you want to remove. So we add ! sign before it.

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213