Delete rows with blank values in one particular column

Question

I am working on a large dataset, with some rows with NAs and others with blanks:

df <- data.frame(ID = c(1:7),                                   
         home_pc = c("","CB4 2DT", "NE5 7TH", "BY5 8IB", "DH4 6PB","MP9 7GH","KN4 5GH"),               
         start_pc = c(NA,"Home", "FC5 7YH","Home", "CB3 5TH", "BV6 5PB",NA),               
         end_pc = c(NA,"CB5 4FG","Home","","Home","",NA))

How do I remove the NAs and blanks in one go (in the start_pc and end_pc columns)? I have in the past used:

df<- df[-which(is.na(df$start_pc)), ]

... to remove the NAs - is there a similar command to remove the blanks?

score 110 · Accepted Answer · answered Feb 03 '12 at 10:18

110

 df[!(is.na(df$start_pc) | df$start_pc==""), ]

answered Feb 03 '12 at 10:18

sgibb

25,396
3
68
74

6

`|` is an or-operator and `!` inverts. Hence, the command displays all rows, which are *not* b) NA or b) equal to "". – MERose Apr 22 '15 at 16:46
1

Wouldn't this code remove entire rows, as opposed to just consolidating them by removing empty values? – NBK Jun 27 '18 at 08:57
2

This is what I found works as well. I had a dataset where I wanted to remove the rows where I was missing data from the column. Executing this with my own data frame and assign the value to the new data frame did what I expected. – muninn Apr 27 '19 at 16:57

score 34 · Answer 2 · edited Feb 03 '12 at 11:03

34

It is the same construct - simply test for empty strings rather than NA:

Try this:

df <- df[-which(df$start_pc == ""), ]

In fact, looking at your code, you don't need the which, but use the negation instead, so you can simplify it to:

df <- df[!(df$start_pc == ""), ]
df <- df[!is.na(df$start_pc), ]

And, of course, you can combine these two statements as follows:

df <- df[!(df$start_pc == "" | is.na(df$start_pc)), ]

And simplify it even further with with:

df <- with(df, df[!(start_pc == "" | is.na(start_pc)), ])

You can also test for non-zero string length using nzchar.

df <- with(df, df[!(nzchar(start_pc) | is.na(start_pc)), ])

Disclaimer: I didn't test any of this code. Please let me know if there are syntax errors anywhere

edited Feb 03 '12 at 11:03

Richie Cotton

118,240
47
247
360

answered Feb 03 '12 at 10:19

Andrie

176,377
47
447
496

1

@richiecotton Thanks for the edit. `nzchar` is a neat way of testing for non-zero length, and I didnt' know about it. – Andrie Feb 03 '12 at 12:00
no of these solutions work in my case! `df<-structure(list(vars = structure(1:5, .Label = c("a", "b", "v", "d", "e", "s", "ds"), class = "factor"), '1' = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), '2' = c(NA, 0.9, NA, NA, NA), '3' = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_)), .Names = c("vars", "1", "2", "3"), row.names = c(NA, 5L), class = "data.frame")` – Estatistics Jun 26 '17 at 07:55

score 23 · Answer 3 · answered Aug 13 '19 at 09:57

23

An elegant solution with dplyr would be:

df %>%
  # recode empty strings "" by NAs
  na_if("") %>%
  # remove NAs
  na.omit

answered Aug 13 '19 at 09:57

Agile Bean

6,437
1
45
53

3

Very elegant solution and simple :) – SteveS Aug 18 '19 at 11:06
1

Indeed, esp. for someone who is only occasionally confronted with R code, clear and simple is a welcome sight. Wish more posts were like that! – mths Sep 16 '22 at 08:39

score 8 · Answer 4 · edited Jan 12 '20 at 07:37

8

An easy approach would be making all the blank cells NA and only keeping complete cases. You might also look for na.omit examples. It is a widely discussed topic.

df[df==""]<-NA
df<-df[complete.cases(df),]

edited Jan 12 '20 at 07:37

Ronak Shah

377,200
20
156
213

answered Mar 16 '16 at 22:18

user6074085

81
1
1

score 8 · Answer 5 · edited Aug 18 '19 at 16:42

8

Alternative solution can be to remove the rows with blanks in one variable:

df <- subset(df, VAR != "")

edited Aug 18 '19 at 16:42

SteveS

3,789
5
30
64

answered Apr 06 '16 at 00:27

user6164045

81
1
1

1

Welcome to Stack Overflow! Whilst this may theoretically answer the question, [it would be preferable](//meta.stackoverflow.com/q/8259) to include the essential parts of the answer here, and provide the link for reference. – Enamul Hassan Apr 06 '16 at 00:41

Delete rows with blank values in one particular column

5 Answers5

Linked

Related