1

I have a dataframe "d1" from a big MySQL table. I need find there an unused columns (which contains only NA or empty strings). (see question Find columns with all missing values ).

This seems to work fine:

allmisscols <- apply(d1,2, function(x)all(is.na(x)));
colswithallmiss <-names(allmisscols[allmisscols>0]);
cat( colswithallmiss,sep="\n");

...

allmisscols <- apply(d1,2, function(x)all(x==''));
colswithallmiss <-names(allmisscols[allmisscols>0]);
cat( colswithallmiss,sep="\n");

...

although the second one gives also "NA" among the column names; i don't understand why.

But when I try to combine them:

allmisscols <- apply(d1,2, function(x)all(is.na(x)||x=='') );  
colswithallmiss <-names(allmisscols[allmisscols>0]);    
print("the columns with all values missing");    
print(colswithallmiss);

I see a column in result that actually contain a value in my table!

The same gives following:

library(stringr); sapply(d1, function(x)all(any(is.na(x)||(str_trim(x)==""))))

So my questions are:

  1. Why I've got such unexpected results?
  2. How can I get the list of column which contains only empty OR N/A values?
YOLO
  • 20,181
  • 5
  • 20
  • 40

1 Answers1

2

Try this:

allmisscols <- sapply(dt, function(x) all(is.na(x) | x == '' ))

Note: You've used OR as double '||' trying making it a single one. Read this SO post: Boolean operators && and ||

YOLO
  • 20,181
  • 5
  • 20
  • 40