Remove NA from a dataset in R

Question

I have used this code to remove rows where Age is blank:

data <- data[data$Age != "",]

in this dataset

     Initial  Age   Type
1    S        21    Customer
2    D              Enquirer
3    T        35    Customer
4    D        36    Customer

However if I run the above code, I get this:

     Initial  Age   Type
1    S        21    Customer
N/A  N/A      N/A   N/A
3    T        35    Customer
4    D        36    Customer

When all I want is:

     Initial  Age   Type
1    S        21    Customer
3    T        35    Customer
4    D        36    Customer

I just want the dataset without any NAs, and I want to remove any rows where Age is blank, so ideally all NAs and any values that are just "".

I have tried the na.omit function but this deletes everything from my dataset.

This is an example dataset, but my real dataset has over 1000 columns and I would like to remove all rows that are NA in one particular named column.

This is my first post, I apologise if this isn't the right way to write up my code, plus I am very new to R.

Also my row number has converted to NA when I don't want it there, it's messing up my calculation.

Thank you for taking the time to read and comment on this post.

Does this answer your question? [Omit rows containing specific column of NA](https://stackoverflow.com/questions/11254524/omit-rows-containing-specific-column-of-na) — maydin, Sep 16 '21 at 12:10
I've tried `data[!is.na(data$Age),]` before and it doesn't work. I get a list of columns (there are more than 1000 columns) and no change to the dataset: no rows are removed, and there are no error or warning messages. Do I need to update R? I installed it in July this year. — Sunny, Sep 16 '21 at 12:46
It will help to understand exactly what those empty values are. Can you share a snippet of your data with `dput(head(data, c(10, 10)))` provided that there's an example of one of these missing values in that chunk? — Dan Adams, Sep 16 '21 at 13:02
It does sound like these might not be actual `NA` values but something else. If they're blank strings, then those columns won't be numeric and all the tricks to remove `NA` won't work. — Dan Adams, Sep 16 '21 at 13:19

score 3 · Accepted Answer · answered Sep 16 '21 at 13:27

As pointed out in the comments, it would be good to know what the exact values in the "empty" Age cells are. When I recreate the above data snippet using:

data <- data.frame(Initial = c("S", "D", "T", "D"),
               Age = c(21, "", 35, 36),
               Type = c("Customer", "Enquirer", "Customer", "Customer"))

We can see that "Age" becomes a column of type "character", because c() coerces the numbers to strings once an empty string "" is mixed in. Using the following code we can effectively remove those "empty" Age rows:

data <- subset(data, is.finite(as.numeric(Age)))

This takes the subset of the dataframe "data" where a numeric version of the Age variable is a finite number, thus eliminating the rows with missing Age values.
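To make the mechanism more concrete, here is a minimal sketch that breaks the one-liner into its steps on the same example data. The intermediate names `age_num`, `keep` and `cleaned` are introduced here purely for illustration, and `stringsAsFactors = FALSE` is added on the assumption that the code might be run on an R version older than 4.0.

```r
# Recreate the example; mixing numbers and "" in c() yields character values.
data <- data.frame(Initial = c("S", "D", "T", "D"),
                   Age = c(21, "", 35, 36),
                   Type = c("Customer", "Enquirer", "Customer", "Customer"),
                   stringsAsFactors = FALSE)

class(data$Age)                  # the column is character, not numeric

age_num <- as.numeric(data$Age)  # "" cannot be parsed as a number and becomes NA
keep <- is.finite(age_num)       # FALSE for NA (and for NaN, Inf, -Inf)

cleaned <- data[keep, ]          # same rows that subset() keeps
```

Because `is.finite()` never returns NA, the logical index contains only TRUE and FALSE, so no all-NA row can slip into the result.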

Hope this solves your problem!

score 0 · Answer 2 · edited Sep 16 '21 at 15:02


Thank you @M.P.Maurits

This formula worked!

data <- subset(data, is.finite(as.numeric(Age)))

The column was actually an integer, but once it was converted to numeric the code removed all rows that were imported as blank and shown as NA. I didn't think integer versus numeric would make a difference.
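As a possible explanation of the all-NA row in the question, here is a small sketch that assumes the blank cell was imported as a true NA in an integer column. That is a reconstruction for illustration, not the original data, and the names `cond`, `with_na_row` and `cleaned` are hypothetical.

```r
# Hypothetical reconstruction: Age is integer and the blank cell was read as NA.
data <- data.frame(Initial = c("S", "D", "T", "D"),
                   Age = c(21L, NA, 35L, 36L),
                   Type = c("Customer", "Enquirer", "Customer", "Customer"),
                   stringsAsFactors = FALSE)

cond <- data$Age != ""        # comparing NA with "" gives NA, not FALSE
with_na_row <- data[cond, ]   # an NA in the row index produces a row of NAs

# Guarding with is.na() keeps the index strictly TRUE/FALSE.
cleaned <- data[!is.na(data$Age) & data$Age != "", ]
```

Under this assumption the original filter keeps four rows, one of them entirely NA, while the guarded version keeps three. Separately, `na.omit()` drops every row that has an NA in any column, so with more than 1000 columns it is plausible that no row survives, which would match what the question describes.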

Thank you to everyone else who also commented, much appreciated :)

edited Sep 16 '21 at 15:02 by ThomasIsCoding · answered Sep 16 '21 at 14:57 by Sunny

score 0 · Answer 3 · answered Sep 16 '21 at 15:06


A simple solution based on dplyr's function filter:

library(dplyr)
data %>%
  filter(Age != "")
#   Initial Age     Type
# 1       S  21 Customer
# 2       T  35 Customer
# 3       D  36 Customer
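If the column contains true NA values as well as empty strings, the same dplyr approach still applies, because `filter()` keeps only rows where the condition evaluates to TRUE. The sketch below uses hypothetical data with one of each kind of missing value. The names `cleaned` and `explicit` are illustrative, and the dplyr package is assumed to be installed.

```r
library(dplyr)

# Hypothetical data: Age holds both an empty string and a true NA.
data <- data.frame(Initial = c("S", "D", "T", "D"),
                   Age = c("21", "", NA, "36"),
                   Type = c("Customer", "Enquirer", "Customer", "Customer"),
                   stringsAsFactors = FALSE)

# filter() drops rows where the condition is FALSE or NA,
# so the NA row goes away together with the blank one.
cleaned <- data %>% filter(Age != "")

# Spelling out the NA check selects the same rows and states the intent.
explicit <- data %>% filter(!is.na(Age), Age != "")
```

This differs from base R indexing with `[`, where an NA in the logical index yields a row of NAs rather than dropping the row.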

answered Sep 16 '21 at 15:06 by Chris Ruehlemann
