I feel like I'm missing something obvious here, but I just can't see what's going wrong...
All I'm doing is creating a data frame (all.score3) as a subset of a larger data frame (all). The larger data frame contains no rows that are entirely NA.
> class(all)
[1] "data.frame"
> table(all$Scoring, useNA = "always")
1 2 3 <NA>
774 772 768 0
> table(all$Resolution_Desc, useNA = "always")
No Response    Resolved        <NA>
        293         962        1059
> class(all$Resolution_Desc)
[1] "character"
> class(all$Scoring)
[1] "numeric"
> all.score3 <- all[all$Scoring == 3 & all$Resolution_Desc == "Resolved", ]
> dim(all.score3)
[1] 677 11
> tail(all.score3)
ID1 ID2 Decile Scoring GroupNo Treat Result1 Result2 flag50 Resolution_Desc nout_2way
NA.362 <NA> NA NA NA <NA> <NA> NA NA NA <NA> NA
NA.363 <NA> NA NA NA <NA> <NA> NA NA NA <NA> NA
NA.364 <NA> NA NA NA <NA> <NA> NA NA NA <NA> NA
NA.365 <NA> NA NA NA <NA> <NA> NA NA NA <NA> NA
NA.366 <NA> NA NA NA <NA> <NA> NA NA NA <NA> NA
NA.367 <NA> NA NA NA <NA> <NA> NA NA NA <NA> NA
> cat("What ????")
What ????
It must have something to do with the all$Resolution_Desc == "Resolved" filter, because using that filter on its own also produces rows of NA, whereas the all$Scoring == 3 filter on its own does not:
all.score3 <- all[all$Resolution_Desc == "Resolved", ]
Why is this operation producing rows of NA that aren't present in the larger data frame, and that shouldn't appear in the result anyway given the conditions in the row filter?
Note: I can work around this, for example with sqldf (or probably with subset), but I'd still like to understand what's happening here. I checked that I'm using & correctly, as opposed to && or something else, and the resources I found seem to indicate that this should be correct...
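One likely explanation, given that the table() output above reports 1059 NA values in Resolution_Desc: comparing NA with == yields NA rather than FALSE, and indexing a data frame's rows with a logical vector that contains NA returns an all-NA row for each NA. The sketch below reproduces this on a toy data frame. The name df and its values are hypothetical, not the data from the question.

```r
# Toy data frame (hypothetical values, not the data from the question)
df <- data.frame(
  Scoring = c(3, 3, 1, 3),
  Resolution_Desc = c("Resolved", NA, "Resolved", "No Response"),
  stringsAsFactors = FALSE
)

# NA == "Resolved" is NA, and TRUE & NA is NA (but FALSE & NA is FALSE)
keep <- df$Scoring == 3 & df$Resolution_Desc == "Resolved"
keep
# [1]  TRUE    NA FALSE FALSE

# An NA in a logical row index produces a row filled with NA
with_na <- df[keep, ]
nrow(with_na)
# [1] 2

# which() keeps only the positions that are TRUE, dropping NA
no_na <- df[which(keep), ]
nrow(no_na)
# [1] 1

# subset() treats NA in the condition as FALSE
sub <- subset(df, Scoring == 3 & Resolution_Desc == "Resolved")
nrow(sub)
# [1] 1
```

Because FALSE & NA evaluates to FALSE, only rows where Scoring is 3 and Resolution_Desc is NA would turn into NA rows under the combined filter. That would be consistent with the combined filter producing fewer NA rows than the Resolution_Desc filter alone.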