100

I have a dataset with empty rows. I would like to remove them:

myData<-myData[-which(apply(myData,1,function(x)all(is.na(x)))),]

It works OK. But now I would like to add a column in my data and initialize the first value:

myData$newCol[1] <- -999

Error in `$<-.data.frame`(`*tmp*`, "newCol", value = -999) : 
  replacement has 1 rows, data has 0

Unfortunately it doesn't work and I don't really understand why and I can't solve this. It worked when I removed one line at a time using:

TgData = TgData[2:nrow(TgData),]

Or anything similar.

It also works when I used only the first 13.000 rows.

But it doesn't work with my actual data, with 32.000 rows.

What did I do wrong? It seems to make no sense to me.

Antonin
  • 1,748
  • 7
  • 19
  • 24
  • possible duplicate of [remove rows with NAs in data.frame](http://stackoverflow.com/questions/4862178/remove-rows-with-nas-in-data-frame) – Waldir Leoncio Dec 01 '14 at 12:55

7 Answers7

130

I assume you want to remove rows that are all NAs. Then, you can do the following :

data <- rbind(c(1,2,3), c(1, NA, 4), c(4,6,7), c(NA, NA, NA), c(4, 8, NA)) # sample data
data
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1   NA    4
[3,]    4    6    7
[4,]   NA   NA   NA
[5,]    4    8   NA

data[rowSums(is.na(data)) != ncol(data),]
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1   NA    4
[3,]    4    6    7
[4,]    4    8   NA

If you want to remove rows that have at least one NA, just change the condition :

data[rowSums(is.na(data)) == 0,]
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    6    7
Sam Firke
  • 21,571
  • 9
  • 87
  • 105
Wookai
  • 20,883
  • 16
  • 73
  • 86
  • 36
    The second case can also be handled via: `data[complete.cases(data),]`. – Joshua Ulrich Jun 22 '11 at 14:11
  • @JoshuaUlrich Thx for your helping answer! Just for the understanding? Why do you let a `,` in the end of `data[complete.cases(data),]` your code? – Carol.Kar Aug 13 '14 at 15:05
  • 1
    @mrquad, that means you're subsetting by rows; see http://stackoverflow.com/a/17052459/2152245. – Matt Sep 28 '15 at 14:31
  • This solution works great, but I didn't understand it at first ... for anyone who's struggling like I did, what this code is doing is simply counting the fields in each row that have NA in them, and then comparing that count to the total count of columns in the dataframe. If the two are equal, that's the row you want deleted. – DanM Mar 13 '23 at 13:39
74

If you have empty rows, not NAs, you can do:

data[!apply(data == "", 1, all),]

To remove both (NAs and empty):

data <- data[!apply(is.na(data) | data == "", 1, all),]
Aleksandr Ianevski
  • 1,894
  • 1
  • 18
  • 22
25

Here are some dplyr options:

# sample data
df <- data.frame(a = c('1', NA, '3', NA), b = c('a', 'b', 'c', NA), c = c('e', 'f', 'g', NA))

library(dplyr)

# remove rows where all values are NA:
df %>% filter_all(any_vars(!is.na(.)))
df %>% filter_all(any_vars(complete.cases(.)))  


# remove rows where only some values are NA:
df %>% filter_all(all_vars(!is.na(.)))
df %>% filter_all(all_vars(complete.cases(.)))  

# or more succinctly:
df %>% filter(complete.cases(.))  
df %>% na.omit

# dplyr and tidyr:
library(tidyr)
df %>% drop_na
sbha
  • 9,802
  • 2
  • 74
  • 62
19

Alternative solution for rows of NAs using janitor package

myData %>% remove_empty("rows")
radek
  • 7,240
  • 8
  • 58
  • 83
  • 2
    This was the simplest solution and it worked for me -- thank you! –  Sep 09 '19 at 05:34
3

Using dplyr's if_all/if_any

Drop rows with any NA OR Select rows with no NA value.

df %>% filter(!if_any(a:c, is.na))

#  a b c
#1 1 a e
#2 3 c g

#Also
df %>% filter(if_all(a:c, Negate(is.na)))

Drop rows with all NA values or select rows with at least one non-NA value.

df %>% filter(!if_all(a:c, is.na))

#     a b c
#1    1 a e
#2 <NA> b f
#3    3 c g

#Also
df %>% filter(if_any(a:c, Negate(is.na)))

data

Using data from @sbha -

df <- data.frame(a = c('1', NA, '3', NA), 
                 b = c('a', 'b', 'c', NA), 
                 c = c('e', 'f', 'g', NA))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
2

This is similar to some of the above answers, but with this, you can specify if you want to remove rows with a percentage of missing values greater-than or equal-to a given percent (with the argument pct)

drop_rows_all_na <- function(x, pct=1) x[!rowSums(is.na(x)) >= ncol(x)*pct,]

Where x is a dataframe and pct is the threshold of NA-filled data you want to get rid of.

pct = 1 means remove rows that have 100% of its values NA. pct = .5 means remome rows that have at least half its values NA

0

Here's yet another answer if you just want a handy function wrapper. Also, many of the above solutions remove a row with ANY NAs, whereas this one only removes rows that are ALL NAs.

data <- rbind(c(1,2,3), c(1, NA, 4), c(4,6,7), c(NA, NA, NA), c(4, 8, NA)) # sample data
data

rmNArows<-function(d){
goodRows<-apply(d,1,function(x) sum(is.na(x))!=ncol(d))
d[goodRows,]
}

rmNArows(data)
mattador
  • 421
  • 4
  • 12