
I have a simple dataframe in which each row contains a varying number of NAs. Given a target count n, I want to keep all rows with either exactly n NAs or at least n (>= n) NAs in a new dataframe.

Right now, I first count the NAs in each row, then split the dataframe by that count:

df <- structure(list(`2015` = c(33L, 61L, 31L, 35L, 24L, 38L), `2014` = c(39L, 
NA, NA, 33L, 55L, 34L), `2013` = c(NA, NA, NA, 32L, NA, NA), 
    `2012` = c(NA, NA, NA, 40L, NA, NA), `2011` = c(NA, NA, NA, 
    40L, NA, NA), `2010` = c(NA, NA, NA, 33L, NA, NA), `2009` = c(NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
    )), .Names = c("2015", "2014", "2013", "2012", "2011", "2010", 
"2009"), row.names = c(NA, 6L), class = "data.frame")

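# Count the NAs in each row (is.na, not !is.na, which would count the non-NA values)
df$NAsum <- apply(df, 1, function(x) sum(is.na(x)))
# Split into one dataframe per NA count and assign each to the global environment
list2env(split(df, df$NAsum), envir = .GlobalEnv)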

From there, I rbind the dataframes that have the target number of NAs, but I guess there must be a smarter way to do it.

nouse

1 Answer

n <- 3  # number of NAs
newDf <- df[rowSums(is.na(df)) >= n, ]
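
If you also need the rows with exactly n NAs, it may help to compute the per-row count once and reuse it for both filters. A minimal sketch on a small made-up dataframe (the columns a, b, c and the names na_count, exactly_n and at_least_n are illustrative assumptions, not taken from the question):

```r
# Small made-up dataframe for illustration (not the data from the question)
df <- data.frame(
  a = c(1L, NA, NA, 4L),
  b = c(NA, NA, 3L, 4L),
  c = c(NA, NA, NA, 4L)
)

# is.na() on a dataframe returns a logical matrix; rowSums() counts the TRUEs per row
na_count <- rowSums(is.na(df))

n <- 2  # target number of NAs

# drop = FALSE keeps the result a dataframe even if only one column remains
exactly_n  <- df[na_count == n, , drop = FALSE]  # rows with exactly n NAs
at_least_n <- df[na_count >= n, , drop = FALSE]  # rows with n or more NAs
```

Because the count is kept in a separate vector rather than in a helper column, nothing has to be removed from the dataframe afterwards.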
Rorschach