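First of all, you probably don't want to cbind() first: cbind() on plain vectors builds a matrix, and a matrix can hold only one type, so all of your variables will be coerced to character. Build the data frame directly instead: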
emp_salary <- data.frame(employee,salary)
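To see the coercion for yourself, here is a minimal sketch. The employee and salary values are made up for illustration (they are not from the question); only cbind(), data.frame() and class() are used.

```r
# Hypothetical sample data, invented for this illustration
employee <- c("Ann", "Bob", "Cyd")
salary <- c(120000, NA, 90000)

# cbind() on vectors returns a matrix; mixing character and numeric
# forces every cell to character
m <- cbind(employee, salary)
class(m[, "salary"])        # "character"

# data.frame() keeps each column's own type
emp_salary <- data.frame(employee, salary)
class(emp_salary$salary)    # "numeric"
```

Once the salaries are strings, a comparison such as salary>1e5 is no longer a numeric comparison, which is why it pays to fix this first.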
Two possible solutions:
- subset automatically excludes cases where the criterion is NA:
  nrow(subset(emp_salary,salary>1e5))
- count the results directly and use na.rm=TRUE:
  sum(salary>1e5,na.rm=TRUE)
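As a quick check that the two approaches agree, here is a small sketch on invented data (the names and salary figures are assumptions, not the asker's data):

```r
# Hypothetical sample data with one missing salary
employee <- c("Ann", "Bob", "Cyd", "Dee")
salary <- c(120000, NA, 90000, 150000)
emp_salary <- data.frame(employee, salary)

# subset() silently drops the row whose criterion evaluates to NA
n_subset <- nrow(subset(emp_salary, salary > 1e5))

# sum() over the logical vector, ignoring the NA
n_sum <- sum(salary > 1e5, na.rm = TRUE)

n_subset   # 2
n_sum      # 2
```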
As for the logic behind the bogus rows:
- bigsal <- salary>1e5 is a logical vector which contains NAs, as it must (because there is no way to know whether an NA value satisfies the criterion or not).
- when indexing the rows of a data frame with a logical vector containing NAs, this is probably the most salient bit of documentation (from help("[")):
When extracting, a numerical, logical or character ‘NA’ index picks an unknown element and so returns ‘NA’ in the corresponding element of a logical, integer, numeric, complex or character result, and ‘NULL’ for a list.
(I searched help("[.data.frame") and couldn't see anything more useful.)
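The quoted rule is easy to reproduce on a plain vector and on a list. This is only a sketch with made-up values; x and lst are hypothetical names introduced for the illustration.

```r
# Hypothetical salaries, one of them missing
salary <- c(120000, NA, 90000)
bigsal <- salary > 1e5
bigsal                 # TRUE NA FALSE

# An NA index into an atomic vector yields NA in the result
x <- c("a", "b", "c")
x[bigsal]              # "a" NA

# An NA index into a list yields NULL in the result
lst <- list(1, 2, 3)
picked <- lst[bigsal]
is.null(picked[[2]])   # TRUE
```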
The thing to remember is that once the indexing is being done, R no longer has any knowledge that the logical vector was created from the salary column, so there's no way for it to do what you might want, which is to retain the values in the other columns. Here's one way to think about the seemingly strange behaviour of filling in all the columns in the NA row with NAs: if R leaves the row out entirely, that would correspond to the criterion being FALSE. If it retains it (and remember that it can't retain just a few columns and drop the others), then that would correspond to the criterion being TRUE. If the criterion is neither FALSE nor TRUE, then it's hard to see what other behaviour makes sense ...
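For completeness, a small sketch of the bogus row itself, again on invented data. Wrapping the logical vector in which() is one further way (not mentioned above, so treat it as a suggestion) to drop the NA cases, because which() returns only the positions that are TRUE.

```r
# Hypothetical sample data with one missing salary
employee <- c("Ann", "Bob", "Cyd")
salary <- c(120000, NA, 90000)
emp_salary <- data.frame(employee, salary)

bigsal <- emp_salary$salary > 1e5   # TRUE NA FALSE

# The NA index produces a row in which every column is NA
res <- emp_salary[bigsal, ]
nrow(res)               # 2
all(is.na(res[2, ]))    # TRUE

# which() keeps only the TRUE positions, so the NA case is dropped
kept <- emp_salary[which(bigsal), ]
nrow(kept)              # 1
```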