0

I have a dataframe that I would like to delete rows from, based on a value in a specific column. As an example, the dataframe appears something like this:

    a  b  c  d 
1   1  2  3  0
2   4 NA  1 NA
3   6  4  0  1
4  NA  5  0  0

I would like to remove all rows with a value greater than 0 in column d. I have been trying to use the following code to do this:

df <- df[!df$d > 0, ]

but this is appearing to have the effect of deleting all the value is rows with an NA value in column d. I was assuming that a na.rm = TRUEargument was needed but I wasn't sure where to fit it in the function above.

Cheers, Ant

Sotos
  • 51,121
  • 6
  • 32
  • 66
Ant
  • 313
  • 5
  • 19
  • 1
    Try with `df[!(df$d > 0 & !is.na(df$d)), ]` or use `filter` from `dplyr` i.e. `df %>% filter(!d > 0)` – akrun Jan 17 '19 at 14:34
  • @akrun I've tried both methods. In each, the code seems to remove both all the rows with a value >0 in column d and all the rows with an NA in column d. There a way to keep the values where col d = NA (rows 1,2 and 4 in the example above). – Ant Jan 17 '19 at 14:43
  • Okay, I think I thought your condition is different. You need the reverse condition in `is.na` – akrun Jan 17 '19 at 14:46
  • @RonakShah the code looks to have removed all the '>0' rows successfully but also seems to have converted all other values in the dataframe where d = NA to NA – Ant Jan 17 '19 at 14:52
  • @Ant seems to work on the sample data shared without any information changed. You can check my answer below. – Ronak Shah Jan 17 '19 at 14:54

3 Answers3

2

We need to select the rows where d is not greater than 0 OR there is NA in d

df[with(df, !d > 0 | is.na(d)), ]

#   a  b c  d
#1  1  2 3  0
#2  4 NA 1 NA
#4 NA  5 0  0

Or we can also use subset

subset(df, !d > 0 | is.na(d))

or dplyr filter

library(dplyr)
df %>% filter(!d > 0 | is.na(d))

The !d > 0 part can also be reversed to

subset(df, d < 1 | is.na(d))

to get the same result.

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
1

We can construct the logical vector with complete.cases

subset(df, !d > 0 | complete.cases(d))
#   a b c d
#1  1 2 3 0
#3  6 4 0 1
#4 NA 5 0 0

Or use subset with replace

subset(df,  !replace(d, is.na(d), 0) > 0)

Or with tidyverse

library(tidyverse)
df %>%
   filter(!replace_na(d, 0) >0)

which is slightly different from the method mentioned here or here

data

df <- structure(list(a = c(1L, 4L, 6L, NA), b = c(2L, NA, 4L, 5L), 
c = c(3L, 1L, 0L, 0L), d = c(0L, NA, 1L, 0L)), class = "data.frame",
 row.names = c("1", "2", "3", "4"))
Community
  • 1
  • 1
akrun
  • 874,273
  • 37
  • 540
  • 662
0

If u add a |all rows that has a NA will match. The condition !df$d > 0 will get executed for those in d that are not a NA. So I think you were looking for:

df[is.na(df$d) | !df$d > 0,  ]

Wheras, the below wont include the rows that has a NA in column d and that does not match the condition !df$d > 0

df[!is.na(df$d) & !df$d > 0,  ]
JineshEP
  • 738
  • 4
  • 7