I have a dataframe "d1" from a big MySQL table. I need find there an unused columns (which contains only NA or empty strings). (see question Find columns with all missing values ).
This seems to work fine:
allmisscols <- apply(d1,2, function(x)all(is.na(x)));
colswithallmiss <-names(allmisscols[allmisscols>0]);
cat( colswithallmiss,sep="\n");
...
allmisscols <- apply(d1,2, function(x)all(x==''));
colswithallmiss <-names(allmisscols[allmisscols>0]);
cat( colswithallmiss,sep="\n");
...
although the second one gives also "NA" among the column names; i don't understand why.
But when I try to combine them:
allmisscols <- apply(d1,2, function(x)all(is.na(x)||x=='') );
colswithallmiss <-names(allmisscols[allmisscols>0]);
print("the columns with all values missing");
print(colswithallmiss);
I see a column in result that actually contain a value in my table!
The same gives following:
library(stringr); sapply(d1, function(x)all(any(is.na(x)||(str_trim(x)==""))))
So my questions are:
- Why I've got such unexpected results?
- How can I get the list of column which contains only empty OR N/A values?