As @MrFlick suggests above, NA values are handled in slightly (subtly?) different ways depending on how you index.
Test data:
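## stringsAsFactors=TRUE is needed in R >= 4.0 to get the factor summaries shown below
dd <- data.frame(cluster=c("oklahoma","texas",NA), stringsAsFactors=TRUE)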
- logical indexing: a TRUE value in the index vector selects the corresponding value, FALSE drops it, and NA results in NA.
dd$cluster=="oklahoma"
## [1] TRUE FALSE NA
summary(dd[dd$cluster=="oklahoma",])
## oklahoma    texas     NA's
##        1        0        1
In principle you could use dd$cluster=="oklahoma" & !is.na(dd$cluster) as your criterion - since FALSE & NA is FALSE - but that's rather awkward. (Since we have specified a single-column data frame, without saying drop=FALSE, the result gets simplified to a vector before being summarized.)
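For concreteness, here is a minimal sketch of that NA-safe criterion, combined with drop=FALSE so the result stays a data frame. The names keep and res are just illustrative, and the test data is rebuilt so the snippet runs on its own:

```r
dd <- data.frame(cluster=c("oklahoma","texas",NA), stringsAsFactors=TRUE)
## !is.na() is FALSE for the missing value, and NA & FALSE is FALSE
keep <- dd$cluster=="oklahoma" & !is.na(dd$cluster)
keep
## [1]  TRUE FALSE FALSE
## drop=FALSE keeps the result as a one-column data frame
res <- dd[keep, , drop=FALSE]
nrow(res)
## [1] 1
```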
- subset: although it is sometimes discouraged for non-interactive use, subset has the convenient property that it drops values where the criterion evaluates to NA. (Also, subset always returns a data frame even if the result is only one column wide.)
summary(subset(dd,cluster=="oklahoma"))
## cluster
## oklahoma:1
## texas :0
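A quick way to check both properties of subset, as a sketch (res_sub is an illustrative name; the test data is rebuilt so the snippet is self-contained):

```r
dd <- data.frame(cluster=c("oklahoma","texas",NA), stringsAsFactors=TRUE)
res_sub <- subset(dd, cluster=="oklahoma")
## still a data frame, even though it is only one column wide
is.data.frame(res_sub)
## [1] TRUE
## the row where the criterion evaluated to NA has been dropped
anyNA(res_sub$cluster)
## [1] FALSE
```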
- which: which() only returns indices for TRUE values, not for NA values:
which(dd$cluster=="oklahoma")
## [1] 1
summary(dd[which(dd$cluster=="oklahoma"),])
## oklahoma    texas
##        1        0
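To see the three approaches side by side, one rough check is to count the rows each one keeps. This is only a sketch: the variable names are made up for illustration, and drop=FALSE is used so that nrow() works on every result.

```r
dd <- data.frame(cluster=c("oklahoma","texas",NA), stringsAsFactors=TRUE)
crit <- dd$cluster=="oklahoma"
## plain logical indexing: the NA in the index produces an extra all-NA row
n_logical <- nrow(dd[crit, , drop=FALSE])
n_logical
## [1] 2
## subset() and which() both ignore the NA
n_subset <- nrow(subset(dd, cluster=="oklahoma"))
n_subset
## [1] 1
n_which <- nrow(dd[which(crit), , drop=FALSE])
n_which
## [1] 1
```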