Removing empty rows of a data file in R

Question

I have a dataset with empty rows. I would like to remove them:

myData<-myData[-which(apply(myData,1,function(x)all(is.na(x)))),]

It works OK. But now I would like to add a column in my data and initialize the first value:

myData$newCol[1] <- -999

Error in `$<-.data.frame`(`*tmp*`, "newCol", value = -999) : 
  replacement has 1 rows, data has 0

Unfortunately it doesn't work and I don't really understand why and I can't solve this. It worked when I removed one line at a time using:

TgData = TgData[2:nrow(TgData),]

Or anything similar.

It also works when I used only the first 13.000 rows.

But it doesn't work with my actual data, with 32.000 rows.

What did I do wrong? It seems to make no sense to me.

possible duplicate of [remove rows with NAs in data.frame](http://stackoverflow.com/questions/4862178/remove-rows-with-nas-in-data-frame) — Waldir Leoncio, Dec 01 '14 at 12:55

score 130 · Accepted Answer · edited Jan 27 '17 at 13:58

130

I assume you want to remove rows that are all NAs. Then, you can do the following :

data <- rbind(c(1,2,3), c(1, NA, 4), c(4,6,7), c(NA, NA, NA), c(4, 8, NA)) # sample data
data
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1   NA    4
[3,]    4    6    7
[4,]   NA   NA   NA
[5,]    4    8   NA

data[rowSums(is.na(data)) != ncol(data),]
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1   NA    4
[3,]    4    6    7
[4,]    4    8   NA

If you want to remove rows that have at least one NA, just change the condition :

data[rowSums(is.na(data)) == 0,]
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    6    7

edited Jan 27 '17 at 13:58

Sam Firke

21,571
9
87
105

answered Jun 22 '11 at 09:33

Wookai

20,883
16
73
86

36

The second case can also be handled via: `data[complete.cases(data),]`. – Joshua Ulrich Jun 22 '11 at 14:11
@JoshuaUlrich Thx for your helping answer! Just for the understanding? Why do you let a `,` in the end of `data[complete.cases(data),]` your code? – Carol.Kar Aug 13 '14 at 15:05
1

@mrquad, that means you're subsetting by rows; see http://stackoverflow.com/a/17052459/2152245. – Matt Sep 28 '15 at 14:31
This solution works great, but I didn't understand it at first ... for anyone who's struggling like I did, what this code is doing is simply counting the fields in each row that have NA in them, and then comparing that count to the total count of columns in the dataframe. If the two are equal, that's the row you want deleted. – DanM Mar 13 '23 at 13:39

score 74 · Answer 2 · answered May 19 '16 at 11:58

74

If you have empty rows, not NAs, you can do:

data[!apply(data == "", 1, all),]

To remove both (NAs and empty):

data <- data[!apply(is.na(data) | data == "", 1, all),]

answered May 19 '16 at 11:58

Aleksandr Ianevski

1,894
1
18
22

1

This is so elegant. It worked for me when I needed the most. Thanks! – Alejandro Carrera Oct 17 '22 at 16:57

score 25 · Answer 3 · answered Mar 23 '19 at 01:44

Here are some dplyr options:

# sample data
df <- data.frame(a = c('1', NA, '3', NA), b = c('a', 'b', 'c', NA), c = c('e', 'f', 'g', NA))

library(dplyr)

# remove rows where all values are NA:
df %>% filter_all(any_vars(!is.na(.)))
df %>% filter_all(any_vars(complete.cases(.)))  


# remove rows where only some values are NA:
df %>% filter_all(all_vars(!is.na(.)))
df %>% filter_all(all_vars(complete.cases(.)))  

# or more succinctly:
df %>% filter(complete.cases(.))  
df %>% na.omit

# dplyr and tidyr:
library(tidyr)
df %>% drop_na

Neither na.omit() nor drop_na() return non-NA rows. – Sep 09 '19 at 05:38 — , Sep 09 '19 at 05:38

score 19 · Answer 4 · answered Jan 11 '19 at 01:32

19

Alternative solution for rows of NAs using janitor package

myData %>% remove_empty("rows")

answered Jan 11 '19 at 01:32

radek

7,240
8
58
83

2

This was the simplest solution and it worked for me -- thank you! – Sep 09 '19 at 05:34

score 3 · Answer 5 · answered Oct 18 '21 at 03:46

Using dplyr's if_all/if_any

Drop rows with any NA OR Select rows with no NA value.

df %>% filter(!if_any(a:c, is.na))

#  a b c
#1 1 a e
#2 3 c g

#Also
df %>% filter(if_all(a:c, Negate(is.na)))

Drop rows with all NA values or select rows with at least one non-NA value.

df %>% filter(!if_all(a:c, is.na))

#     a b c
#1    1 a e
#2 <NA> b f
#3    3 c g

#Also
df %>% filter(if_any(a:c, Negate(is.na)))

data

Using data from @sbha -

df <- data.frame(a = c('1', NA, '3', NA), 
                 b = c('a', 'b', 'c', NA), 
                 c = c('e', 'f', 'g', NA))

score 2 · Answer 6 · answered Jan 10 '19 at 21:58

This is similar to some of the above answers, but with this, you can specify if you want to remove rows with a percentage of missing values greater-than or equal-to a given percent (with the argument pct)

drop_rows_all_na <- function(x, pct=1) x[!rowSums(is.na(x)) >= ncol(x)*pct,]

Where x is a dataframe and pct is the threshold of NA-filled data you want to get rid of.

pct = 1 means remove rows that have 100% of its values NA. pct = .5 means remome rows that have at least half its values NA

How to use this FUN on df? – Kshitij Manvelikar Dec 23 '19 at 10:52 — Kshitij Manvelikar, Dec 23 '19 at 10:52

mattador · Answer 7 · 2021-05-11T18:01:13.320

0

Here's yet another answer if you just want a handy function wrapper. Also, many of the above solutions remove a row with ANY NAs, whereas this one only removes rows that are ALL NAs.

data <- rbind(c(1,2,3), c(1, NA, 4), c(4,6,7), c(NA, NA, NA), c(4, 8, NA)) # sample data
data

rmNArows<-function(d){
goodRows<-apply(d,1,function(x) sum(is.na(x))!=ncol(d))
d[goodRows,]
}

rmNArows(data)

edited May 11 '21 at 18:01

answered May 11 '21 at 17:52

mattador

421
4
12

Removing empty rows of a data file in R

7 Answers7

Linked

Related