
I have used this line to remove rows where Age is blank:

data <- data[data$Age != "",]

in this dataset

     Initial  Age   Type
1    S        21    Customer
2    D              Enquirer
3    T        35    Customer
4    D        36    Customer

However, if I run the above code, I get this:

     Initial  Age   Type
1    S        21    Customer
N/A  N/A      N/A   N/A
3    T        35    Customer
4    D        36    Customer

When all I want is:

     Initial  Age   Type
1    S        21    Customer
3    T        35    Customer
4    D        36    Customer

I just want the dataset without any NAs, so I want to remove every row where Age is blank: ideally all NAs and any values that are just "".

I have tried the na.omit function but this deletes everything from my dataset.

This is an example dataset, but my real dataset has over 1,000 columns, and I would like to remove all rows that are NA in one particular named column.

This is my first post, so I apologise if this isn't the right way to write up my code. I am also very new to R.

Also, the row number of the blank row has been converted to NA, which I don't want, and it's messing up my calculation.

Thank you for taking the time to read and comment on this post.

maydin
Sunny
  • Are you sure that age on record 2 is empty spaces? – Nicolas2 Sep 16 '21 at 12:07
  • Does this answer your question? [Omit rows containing specific column of NA](https://stackoverflow.com/questions/11254524/omit-rows-containing-specific-column-of-na) – maydin Sep 16 '21 at 12:10
  • `data[!is.na(data$Age),]` – maydin Sep 16 '21 at 12:10
  • I've tried `data[!is.na(data$Age),]` before and it doesn't work. I get a list of columns (there are more than 1,000 columns) and no change to the dataset: no rows are removed, and there are no error or warning messages. Do I need to update R? I installed it in July this year. – Sunny Sep 16 '21 at 12:46
  • It will help to understand exactly what those empty values are. Can you share a snippet of your data with `dput(head(data, c(10, 10)))` provided that there's an example of one of these missing values in that chunk? – Dan Adams Sep 16 '21 at 13:02
  • It does sound like these might not be actual `NA` values but something else. If they're blank strings, then those columns won't be numeric and all the tricks to remove `NA` won't work. – Dan Adams Sep 16 '21 at 13:19
  • Try with `na.omit`. – Eyayaw Sep 16 '21 at 14:13
  • `data[data$Age != "",]` works perfectly with me – Chris Ruehlemann Sep 16 '21 at 15:08

3 Answers

3

As pointed out in the comments, it would be good to know what the exact values in the "empty" Age cells are. When I recreate the above data snippet using:

data <- data.frame(Initial = c("S", "D", "T", "D"),
                   Age = c(21, "", 35, 36),
                   Type = c("Customer", "Enquirer", "Customer", "Customer"))

we can see that "Age" is coerced into a column of type "character", because the vector mixes numbers with an empty string. Using the following code we can effectively remove those "empty" Age rows:

data <- subset(data, is.finite(as.numeric(Age)))

This takes the subset of the dataframe "data" where a numeric version of the Age variable is a finite number, thus eliminating the rows with missing Age values.
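To illustrate where the all-NA row in the question may come from, here is a minimal sketch. It assumes the blank Age was imported as a real `NA` rather than as an empty string; the question does not show the imported data, so that is an assumption. Comparing `NA` with `""` gives `NA`, and indexing rows with an `NA` selector returns a row of `NA`s instead of dropping the row.

```r
# Assumed reconstruction of the question's data, with the blank Age imported as NA
data <- data.frame(Initial = c("S", "D", "T", "D"),
                   Age = c(21L, NA, 35L, 36L),
                   Type = c("Customer", "Enquirer", "Customer", "Customer"))

# NA != "" evaluates to NA, so the row selector itself contains an NA
keep <- data$Age != ""
print(keep)

# Indexing with an NA selector yields an all-NA row instead of dropping row 2
with_na_row <- data[keep, ]
print(with_na_row)

# Treating NA explicitly as "do not keep" removes the row
cleaned <- data[!is.na(data$Age) & data$Age != "", ]
print(cleaned)
```

Note that the result is assigned to a new name; printing a subset without assigning it leaves the original data frame unchanged.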

Hope this solves your problem!

0

Thank you @M.P.Maurits

This formula worked!

data <- subset(data, is.finite(as.numeric(Age)))

The column was actually an integer, but converting it to numeric removed all the rows that were imported as blank and shown as NAs. I didn't think integer versus numeric would make a difference.

Thank you to everyone else who also commented, much appreciated :)
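Since the real dataset has over 1,000 columns and the filter targets one column by name, the same idea can be wrapped in a small helper. This is only a sketch: `drop_missing` is a hypothetical name, not a base R function, and treating whitespace-only strings as blank is an assumption prompted by the comment asking whether the cell might contain spaces.

```r
# Hypothetical helper: drop rows where the named column is NA, "" or whitespace only
drop_missing <- function(df, col) {
  values <- df[[col]]
  # as.character() lets one test cover integer, numeric and character columns
  is_blank <- is.na(values) | trimws(as.character(values)) == ""
  df[!is_blank, , drop = FALSE]
}

# Illustrative data (not the asker's real file)
data <- data.frame(Initial = c("S", "D", "T", "D"),
                   Age = c("21", " ", NA, "36"),
                   Type = c("Customer", "Enquirer", "Customer", "Customer"),
                   stringsAsFactors = FALSE)

# The result must be assigned back, otherwise `data` is left unchanged
data <- drop_missing(data, "Age")

# Optional: renumber the rows 1..n after filtering
rownames(data) <- NULL
print(data)

# The same helper on an integer column
ints <- drop_missing(data.frame(Age = c(1L, NA, 3L)), "Age")
```

Passing the column as a string and using `[[` avoids hard-coding `$Age`, which may be convenient when the column name is chosen from a long list.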

ThomasIsCoding
Sunny
0

A simple solution based on dplyr's `filter()` function:

library(dplyr)
data %>%
  filter(Age != "")

Output:

  Initial Age     Type
1       S  21 Customer
2       T  35 Customer
3       D  36 Customer
Chris Ruehlemann