1

I have a table (737:35) which has different values for missing observations, e.g. NA, 0, n/a or --

So, the table looks like this:

┌────┬────┬──────┬──────┬──────┬──────┬──────┐
│ X  │ Y  │ 2010 │ 2011 │ 2012 │ 2013 │ 2014 │
├────┼────┼──────┼──────┼──────┼──────┼──────┤
│ A  │ XY │ 0    │ 0    │ 0    │ 10   │ 15   │
│ A  │ XZ │ 11   │ 13   │ n/a  │ 12   │ 14   │
│ B  │ XY │ 0    │ --   │ 0    │ 7    │ --   │
│ B  │ XZ │ 15   │ 16   │ 12   │ 13   │ 14   │
│ C  │ XY │ NA   │ NA   │ NA   │ NA   │ NA   │
│ C  │ XZ │ 8    │ 12   │ 11   │ n/a  │n/a   │
└────┴────┴──────┴──────┴──────┴──────┴──────┘

I want to delete all rows which have a value of either NA, 0, n/a or --, so I end up having only row 5 in this example

I could get rid of the rows with NA using is.na()and deleted the rows with a 0 in the column 2010 using grep(); could someone tell me how I can delete all rows missing an observation in an easy way?

Lila
  • 59
  • 6
  • 1
    Please provide this in reproducible form so we can copy and paste this into R. Run dput(X) and show the output in the question (where X is your input). Also clarify exactly what is to be regarded as missing. Are both NA and "NA" missing? What about 0 vs. "0"? I have made some assumptions in my answer but you need to be precise in the question. Do not answer in the comments. Update the question so it is complete. – G. Grothendieck May 05 '19 at 17:19
  • [This post](https://stackoverflow.com/q/4862178/5325862) handles dropping rows with `NA` values and has 16 answers already – camille May 05 '19 at 18:03

2 Answers2

4

An option would be to use %in% to check whether values '0', 'n/a', or '--') are present in the column along with is.na using |, Reduce it to a single logical vector with | to check whether any value in the row have these characters, negate (! - meaning none of the elements in the row have that character) and use that o subset the rows

i1 <- !Reduce(`|`, lapply(df1, function(x) is.na(x)| x %in% c(0, "n/a", "--")))
df2 <- df1[i1, , drop = FALSE]
#   X  Y 2010 2011 2012 2013 2014
#4 B XZ   15   16   12   13   14

Or using filter_all

library(dplyr)
df1 %>%
     filter_all(all_vars(!(is.na(.) | . %in% c(0, "n/a", "--"))))
#   X  Y 2010 2011 2012 2013 2014
#1 B XZ   15   16   12   13   14

data

df1 <- structure(list(X = c("A", "A", "B", "B", "C", "C"), Y = c("XY", 
"XZ", "XY", "XZ", "XY", "XZ"), `2010` = c(0L, 11L, 0L, 15L, NA, 
8L), `2011` = c("0", "13", "--", "16", NA, "12"), `2012` = c("0", 
"n/a", "0", "12", NA, "11"), `2013` = c("10", "12", "7", "13", 
NA, "n/a"), `2014` = c("15", "14", "--", "14", NA, "n/a")),
class = "data.frame", row.names = c(NA, 
-6L))
akrun
  • 874,273
  • 37
  • 540
  • 662
3

Use apply to go row by row checking whether any element of that row is in the character vector na.strings. This gives a logical vector of one element per row which where TRUE means the corresponding row has an element found in na.strings. Negate that logical vector and subscript df1 by it.

na.strings <- c(0, NA, "NA", "--", "n/a")
df1[ !apply(df1, 1, function(x) any(x %in% na.strings)), ]

This does not distinguish between 0 and "0" or between NA and "NA" but I suspect you want all those to be regarded as missing. If you did want to distinguish between NA and "NA" and only regard NA but not "NA" as missing then omit "NA" from na.strings.

Note that if you are reading this in from a text file then you could specify na.strings in your read.table command:

na.omit(read.table("myfile", na.strings = c("NA", "0", "n/a", "--"), ...whatever...))
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341