
I am trying to remove the rows in my dataset where the variable income is either 0 or NA. Running the two lines of code below, I found that 1039 observations match. In particular, even though I only ask for income equal to 0, R automatically includes the NA values as well.

length(allregions$income[allregions$emp == 1 & allregions$income == 0])
allregions$income[allregions$emp == 1 & allregions$income == 0]

However, when I try to remove those rows, R only deletes the rows with income equal to 0 and keeps those with NA. Even if I add NA to the condition, those values still remain in my dataset.

allregions <- allregions[!(allregions$income == 0 & allregions$emp == 1),]

How can I drop the rows with NAs in a particular column? Also, how is it possible that even though I apply the same condition, in one case R takes into account NAs too and in another it doesn't?

Thank you in advance for your help!

  • Can you include a [reproducible example](http://stackoverflow.com/questions/5963269) along with the expected output? – Ronak Shah Mar 21 '20 at 12:15
  • Use `which(...)` around your filtering statement. `which` returns the indices which are TRUE. You can also look into `na.omit()` – Cole Mar 21 '20 at 12:24
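
To illustrate the `which()` and `na.omit()` suggestions, here is a minimal sketch. The data frame `toy` is hypothetical and only stands in for `allregions`; the column names follow the question. Note that `which()` on its own only stops NA from leaking into the index; it does not remove the rows where income is NA.

```r
# Hypothetical toy data standing in for `allregions`
toy <- data.frame(
  emp    = c(1, 1, 1, 0, 1),
  income = c(0, NA, 500, 0, 1200)
)

# The comparison is NA wherever income is NA and emp == 1
cond <- toy$emp == 1 & toy$income == 0

# which() returns only the positions that are TRUE, so NA is ignored
idx <- which(cond)

# Guard against an empty index: toy[-integer(0), ] would select zero rows
kept <- if (length(idx) > 0) toy[-idx, ] else toy

# na.omit() drops every row that has an NA in any column
complete <- na.omit(toy)
```

In this sketch `kept` still contains the row with a missing income, while `complete` does not, so the two suggestions solve different halves of the problem.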

2 Answers


You can use %in% like this:

result <- allregions[!(allregions$income %in% c(0, NA)), ]

Or use is.na() to test for NA

result <- allregions[allregions$income != 0 & !is.na(allregions$income), ]
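
As a quick check, the two approaches can be compared on a small made-up data frame (`toy` is hypothetical, not the asker's data). The last step is an assumption about the intent of the question: it restricts the removal to rows with emp == 1, which the two lines above do not do.

```r
# Hypothetical toy data standing in for `allregions`
toy <- data.frame(
  emp    = c(1, 1, 1, 0, 0),
  income = c(0, NA, 500, 0, NA)
)

# Approach 1: %in% treats NA as an ordinary value to match
a <- toy[!(toy$income %in% c(0, NA)), ]

# Approach 2: explicit NA test; NA & FALSE is FALSE, so NA rows are dropped
b <- toy[toy$income != 0 & !is.na(toy$income), ]

same <- isTRUE(all.equal(a, b))

# Assumed variant: only drop 0/NA incomes among rows with emp == 1.
# is.na() comes first so the condition is never NA for a missing income.
drop_row <- toy$emp == 1 & (is.na(toy$income) | toy$income == 0)
c_rows <- toy[!drop_row, ]
```

Here `a` and `b` keep the same rows, and `c_rows` additionally keeps the rows with emp equal to 0.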

To understand why R behaves like it does, I'd suggest these FAQs: "Logical operators (AND, OR) with NA, TRUE and FALSE" and "Dealing with TRUE, FALSE, NA and NaN".
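
In short, a comparison with NA yields NA, and an NA in a logical index behaves differently from FALSE. The following sketch, using made-up values, shows both effects: the vector subset picks up an NA element (which is why the count in the question includes the NAs), and the data frame subset returns a row filled with NA rather than dropping it.

```r
# Comparisons and logical operators with NA
r1 <- NA == 0      # NA: the result is unknown
r2 <- NA & FALSE   # FALSE: the answer is FALSE whatever the NA stands for
r3 <- NA & TRUE    # NA
r4 <- !NA          # NA: negating an unknown value is still unknown

# Vector subsetting: an NA in the logical index yields an NA element
x <- c(0, NA, 500)
picked <- x[x == 0]

# Data frame row subsetting: an NA in the index yields a row of NAs,
# so the missing values appear to "survive" the filter
df <- data.frame(income = x)
out <- df[!(df$income == 0), , drop = FALSE]
```

Wrapping the condition in `!is.na(...)` or `%in%`, as above, turns every NA in the index into a definite TRUE or FALSE, which is what makes the row removal work.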

Gregor Thomas

The tidyverse (specifically dplyr's filter()) works well for this sort of task. Either of the following does the job:

library(tidyverse)

# Option 1: test for NA explicitly
result <- allregions %>%
  filter(!is.na(income) & income != 0)

# Option 2: match 0 and NA in one step with %in%
result <- allregions %>%
  filter(!income %in% c(0, NA))
Caitlin