check empty or na values in columns in R

Question

I have a dataframe "d1" from a big MySQL table. I need find there an unused columns (which contains only NA or empty strings). (see question Find columns with all missing values ).

This seems to work fine:

allmisscols <- apply(d1,2, function(x)all(is.na(x)));
colswithallmiss <-names(allmisscols[allmisscols>0]);
cat( colswithallmiss,sep="\n");

...

allmisscols <- apply(d1,2, function(x)all(x==''));
colswithallmiss <-names(allmisscols[allmisscols>0]);
cat( colswithallmiss,sep="\n");

...

although the second one gives also "NA" among the column names; i don't understand why.

But when I try to combine them:

allmisscols <- apply(d1,2, function(x)all(is.na(x)||x=='') );  
colswithallmiss <-names(allmisscols[allmisscols>0]);    
print("the columns with all values missing");    
print(colswithallmiss);

I see a column in result that actually contain a value in my table!

The same gives following:

library(stringr); sapply(d1, function(x)all(any(is.na(x)||(str_trim(x)==""))))

So my questions are:

Why I've got such unexpected results?
How can I get the list of column which contains only empty OR N/A values?

score 2 · Answer 1 · answered Feb 27 '18 at 10:03

2

Try this:

allmisscols <- sapply(dt, function(x) all(is.na(x) | x == '' ))

Note: You've used OR as double '||' trying making it a single one. Read this SO post: Boolean operators && and ||

answered Feb 27 '18 at 10:03

YOLO

20,181
5
20
40

check empty or na values in columns in R

1 Answers1