
I feel like I'm missing something obvious here, but I just can't see what's going wrong...

All I'm doing is subsetting a larger data frame (all) into a smaller one (all.score3). There are no all-NA rows in the larger data frame.

> class(all)
[1] "data.frame"
> table(all$Scoring, useNA = "always")

   1    2    3 <NA> 
 774  772  768    0 
> table(all$Resolution_Desc, useNA = "always")

No Response    Resolved        <NA> 
        293         962        1059 
> class(all$Resolution_Desc)
[1] "character"
> class(all$Scoring)
[1] "numeric"
> all.score3 <- all[all$Scoring == 3 & all$Resolution_Desc == "Resolved", ]
> dim(all.score3)
[1] 677  11
> tail(all.score3)
             ID1      ID2   Decile Scoring GroupNo  Treat Result1 Result2 flag50 Resolution_Desc nout_2way
NA.362      <NA>      NA     NA      NA   <NA>  <NA>       NA       NA     NA            <NA>        NA
NA.363      <NA>      NA     NA      NA   <NA>  <NA>       NA       NA     NA            <NA>        NA
NA.364      <NA>      NA     NA      NA   <NA>  <NA>       NA       NA     NA            <NA>        NA
NA.365      <NA>      NA     NA      NA   <NA>  <NA>       NA       NA     NA            <NA>        NA
NA.366      <NA>      NA     NA      NA   <NA>  <NA>       NA       NA     NA            <NA>        NA
NA.367      <NA>      NA     NA      NA   <NA>  <NA>       NA       NA     NA            <NA>        NA
> cat("What ????")
What ????

It must have something to do with the all$Resolution_Desc == "Resolved" condition, because that condition on its own also produces rows of NA, whereas the all$Scoring == 3 condition on its own does not:

all.score3 <- all[all$Resolution_Desc == "Resolved", ]

Why is this operation producing rows of NA that aren't present in the larger dataframe and which should not be present in the resulting dataframe anyway based on the conditions in the row filter?

Note -- I can work around this, such as with sqldf (or probably with subset), but I'd still like to understand what's happening here. I checked to make sure I'm using & the correct way, as opposed to using && or something, and the resources I found seem to indicate that this should be correct...
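For reference, a minimal sketch of the kind of workaround mentioned above, run on a small made-up data frame (`toy` is hypothetical and only stands in for `all`). It relies on `subset()` treating an `NA` condition as `FALSE`, and on `%in%` never returning `NA`:

```r
# Hypothetical toy data standing in for `all`
toy <- data.frame(
  Scoring = c(3, 3, 2),
  Resolution_Desc = c("Resolved", NA, "Resolved"),
  stringsAsFactors = FALSE
)

# subset() drops rows where the condition evaluates to NA
s1 <- subset(toy, Scoring == 3 & Resolution_Desc == "Resolved")

# %in% returns FALSE (not NA) for missing values, so plain [ ] indexing is safe
s2 <- toy[toy$Scoring == 3 & toy$Resolution_Desc %in% "Resolved", ]

nrow(s1)
nrow(s2)
```

Both calls keep only the single row that actually satisfies both conditions.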

Hack-R
  • Please clarify what the actual question is – talat Mar 25 '15 at 15:04
  • Why is this operation producing rows of `NA` that aren't present in the larger dataframe and which should not be present in the resulting dataframe anyway based on the conditions in the row filter? – Hack-R Mar 25 '15 at 15:06
  • Thanks for clarification. How about posting a dput of "all"? Does the output remain the same if you do `all[all$Scoring == 3 & all$Resolution_Desc == "Resolved" & !is.na(all$Resolution_Desc), ]`? – talat Mar 25 '15 at 15:09
  • This is kind of how R behaves when it is being subsetted by boolean expression that returns `NA`, it just keeps it there. Try `df <- data.frame(A = c("a", NA, "a")) ; df[df$A == "a", , drop = FALSE]`. Basically `df$A == "a"` returns `[1] TRUE NA TRUE` and R doesn't know what to do with the `NA` part so it keeps it. – David Arenburg Mar 25 '15 at 15:10
  • @DavidArenburg Oooooo, right! I think you've nailed it. Can you please make that into an answer so that I can mark this as solved? – Hack-R Mar 25 '15 at 15:16
  • @docendodiscimus Sure. I would follow up on your request, but I think that David's comment just answered my question. – Hack-R Mar 25 '15 at 15:17
  • I think @JoshO'Brien once wrote an excellent answer on this but I can't seem to find it. – David Arenburg Mar 25 '15 at 15:18
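The behavior described in David Arenburg's comment can be reproduced with a short sketch. The data frame `df` and the vector `keep` below are made up for illustration and are not the asker's data. The sketch shows that `TRUE & NA` is `NA` while `FALSE & NA` is `FALSE`, that a logical `NA` in a row index yields an all-`NA` row, and that wrapping the condition in `which()` is one way to drop those rows:

```r
# Hypothetical toy data, not the asker's `all`
df <- data.frame(
  Scoring = c(3, 3, 3, 1),
  Resolution_Desc = c("Resolved", NA, "No Response", NA),
  stringsAsFactors = FALSE
)

# Comparing NA with a string gives NA; TRUE & NA stays NA, FALSE & NA is FALSE
keep <- df$Scoring == 3 & df$Resolution_Desc == "Resolved"
keep

# A logical NA in the row index produces a row filled with NA
bad <- df[keep, ]

# which() returns only the positions that are TRUE, so NA is dropped
good <- df[which(keep), ]

nrow(bad)
nrow(good)
```

Because `FALSE & NA` is `FALSE`, only the rows with `Scoring == 3` and a missing `Resolution_Desc` turn into `NA` rows, which is consistent with the combined filter returning fewer `NA` rows than the column contains.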
