I have a large complex dataset I need to pare down carefully. In some cases that means filtering a single record based on unique criteria. Suppose I have the following data:

       locname mo dy   yr nest.stat daynight
1 CARACO CREEK  3  9 1994         U        D
2 CARACO CREEK  4  4 1994      <NA>        D
3 CARACO CREEK  4 14 1994      <NA>        N
4 CARACO CREEK  5  5 1994      <NA>        D
5 CARACO CREEK  5 17 1994      <NA>        N
6 CARACO CREEK  6 29 1994      <NA>        N
7 CARACO CREEK  8  2 1994         F        D

I need to remove the seventh record, which is unique in the dataset by the combination of locname, yr and nest.stat. (I can't just use df[-7,] because the row position can change in new iterations of the data.)

I tried

df[!(df$locname=="CARACO CREEK" & df$nest.stat=="F" & df$yr==1994),]

but that returns

          locname mo dy   yr nest.stat daynight
1    CARACO CREEK  3  9 1994         U        D
NA           <NA> NA NA   NA      <NA>     <NA>
NA.1         <NA> NA NA   NA      <NA>     <NA>
NA.2         <NA> NA NA   NA      <NA>     <NA>
NA.3         <NA> NA NA   NA      <NA>     <NA>
NA.4         <NA> NA NA   NA      <NA>     <NA>

If I only filter on two columns (e.g. locname and yr) it works fine. That's how I created this smaller set from the larger, showing all 1994 records. Adding the third column throws it off though. As an additional note, this exact approach worked in a different dataset on different columns.

Here is the sample set for simplicity:

df <- structure(list(locname = c("CARACO CREEK", "CARACO CREEK", "CARACO CREEK", 
"CARACO CREEK", "CARACO CREEK", "CARACO CREEK", "CARACO CREEK"
), mo = c(3, 4, 4, 5, 5, 6, 8), dy = c(9, 4, 14, 5, 17, 29, 2
), yr = c(1994, 1994, 1994, 1994, 1994, 1994, 1994), nest.stat = c("U", 
NA, NA, NA, NA, NA, "F"), daynight = c("D", "D", "N", "D", "N", 
"N", "D")), class = "data.frame", row.names = c(NA, 7L))
A.Birdman
  • Possible duplicate of [Subsetting R data frame results in mysterious NA rows](https://stackoverflow.com/questions/14261619/subsetting-r-data-frame-results-in-mysterious-na-rows) – Khaynes Jan 11 '19 at 22:51
  • It does drop the nest.stat==F record, but also seems to drop the other records shown as NA in my output above, leaving only the intact nest.stat==U record. – A.Birdman Jan 11 '19 at 23:00

1 Answer

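Your check on nest.stat fails because df$nest.stat == "F" returns NA, not FALSE, wherever nest.stat is NA. That NA survives the & and ! operators, and indexing a data frame with an NA logical value produces an all-NA row, which is what you are seeing for rows 2 through 6.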
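To make the NA propagation concrete, here is a minimal sketch on a three-element vector. The names `cmp`, `keep` and `d` are made up for illustration; the values mirror the "U", NA and "F" cases in the question.

```r
# Three representative nest.stat values from the question's data.
nest.stat <- c("U", NA, "F")

# Comparing NA with "F" gives NA, not FALSE.
cmp <- nest.stat == "F"

# TRUE & NA is NA, and !NA is still NA, so the NA reaches the row index.
keep <- !(TRUE & cmp)

# Indexing with an NA logical yields a row filled with NA.
d <- data.frame(x = 1:3)
res <- d[keep, , drop = FALSE]
print(cmp)
print(keep)
print(res)
```

The first row is kept, the "F" row is dropped, and the NA row comes back as an all-NA placeholder rather than being kept or dropped.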

Here's a messy, base-R way of doing this:

df[!(df$locname == "CARACO CREEK" &
     ifelse(!is.na(df$nest.stat), df$nest.stat == "F", FALSE) &
     df$yr == 1994), ]

Output:

       locname mo dy   yr nest.stat daynight
1 CARACO CREEK  3  9 1994         U        D
2 CARACO CREEK  4  4 1994      <NA>        D
3 CARACO CREEK  4 14 1994      <NA>        N
4 CARACO CREEK  5  5 1994      <NA>        D
5 CARACO CREEK  5 17 1994      <NA>        N
6 CARACO CREEK  6 29 1994      <NA>        N
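If the ifelse version feels heavy, two other base-R idioms give the same result on this sample. This is only a sketch: `res_in`, `res_which` and `hit` are illustrative names, and it assumes locname and yr contain no NA values (if they might, the same treatment would be needed for those columns).

```r
# Sample data copied from the question.
df <- data.frame(
  locname   = rep("CARACO CREEK", 7),
  mo        = c(3, 4, 4, 5, 5, 6, 8),
  dy        = c(9, 4, 14, 5, 17, 29, 2),
  yr        = rep(1994, 7),
  nest.stat = c("U", NA, NA, NA, NA, NA, "F"),
  daynight  = c("D", "D", "N", "D", "N", "N", "D"),
  stringsAsFactors = FALSE
)

# Option 1: %in% returns FALSE (never NA) when the left-hand value is NA.
res_in <- df[!(df$locname == "CARACO CREEK" &
               df$nest.stat %in% "F" &
               df$yr == 1994), ]

# Option 2: which() keeps only the positions where the condition is TRUE,
# silently discarding NA. Guard against an empty match, because
# df[-integer(0), ] would return zero rows instead of the full data frame.
hit <- which(df$locname == "CARACO CREEK" &
             df$nest.stat == "F" &
             df$yr == 1994)
res_which <- if (length(hit) > 0) df[-hit, ] else df

print(res_in)
```

Both keep the six rows where nest.stat is "U" or NA and drop only the "F" record.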
shwan
  • Thanks! I don't quite understand the logic in the ifelse call, but it works and I'll figure it out later! – A.Birdman Jan 11 '19 at 23:04
  • The syntax for ifelse is: ifelse(test, yes, no), evaluated element by element. So in this case, where nest.stat is not NA the result is `nest.stat == "F"`, and where it is NA the result is just FALSE – shwan Jan 11 '19 at 23:06
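As a small sketch of that logic, the ifelse call can be run on a short vector and compared with an equivalent form that relies on FALSE & NA being FALSE. The names `is_f` and `is_f_alt` are illustrative only.

```r
# Three representative nest.stat values from the question's data.
nest.stat <- c("U", NA, "F")

# Element-wise: where the value is not NA, use the comparison; otherwise FALSE.
is_f <- ifelse(!is.na(nest.stat), nest.stat == "F", FALSE)

# Equivalent without ifelse: FALSE & NA evaluates to FALSE, so NA never leaks out.
is_f_alt <- !is.na(nest.stat) & nest.stat == "F"

print(is_f)
print(is_f_alt)
```

Either vector contains no NA, so negating it and using it as a row index can no longer produce all-NA rows.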