
I have a data frame in R that contains a column of factors. The column is completely blank, although it does not contain any NA values. When I try to access one of the column's elements, it does show some factor levels.

I would like to identify this type of missing data and remove it. How do I do that?

Here's a copy of my data frame. The second column illustrates my problem:

> dput(effective_data)
structure(list(`Not this column` = structure(c(2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L), .Label = c("", "unavailable"), class = "factor"), `This column` = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L), .Label = c("", "EVENT_VALUE_INTRA_ENB (1)", 
"EVENT_VALUE_X2 (2)"), class = "factor")), .Names = c("Not this column", 
"This column"), class = "data.frame", row.names = c(NA, -134L
))
Z.Lin
Kots
    I've added your example dataset to the question. When you want to share data in a reproducible format with others on SO, use the command `dput()`. Anyone can take the result of that & get exactly the same data that gave you problems. You can read more about best practices here: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Z.Lin Sep 06 '17 at 15:20

1 Answer


You can try

effective_data[, !apply(effective_data, 2, function(x) any(unique(x) == ""))]

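Explanation

Iterate over the columns with apply(..., 2, ...) and flag every column that contains at least one empty string, i.e. where any(unique(x) == "") is TRUE. For this data, that is the all-blank column:

apply(effective_data, 2, function(x) any(unique(x) == ""))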
# Not this column     This column 
#           FALSE            TRUE
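
Note that any() also flags a column that is only partly blank. If the goal is to catch only columns that are blank in every row, a stricter test may be safer. The following is a minimal sketch, not part of the original answer: `demo` is a hypothetical stand-in with the same structure as effective_data, and it assumes the column has no NA values, as stated in the question.

```r
# Hypothetical stand-in for effective_data (same structure, 4 rows)
demo <- data.frame(
  `Not this column` = factor(rep("unavailable", 4),
                             levels = c("", "unavailable")),
  `This column` = factor(rep("", 4),
                         levels = c("", "EVENT_VALUE_INTRA_ENB (1)",
                                    "EVENT_VALUE_X2 (2)")),
  check.names = FALSE  # keep the spaces in the column names
)

# TRUE only when every value in the column is the empty string.
# vapply() works column by column and always returns a logical vector.
is_blank <- vapply(demo, function(x) all(as.character(x) == ""), logical(1))
is_blank
```

Comparing the values rather than the levels matters here: the blank column still carries unused levels such as "EVENT_VALUE_X2 (2)", which is why they show up when a single element is printed.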

Then subset the data frame's columns with the negated logical vector

effective_data[ , <logical>]
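
One caveat with the subsetting step: when only a single column survives, `[` simplifies the result to a plain vector unless drop = FALSE is passed. The sketch below is an illustration under assumptions rather than the original answer's code. `demo` is a hypothetical stand-in for effective_data, and the droplevels() call is an optional extra that discards unused factor levels in the columns that remain.

```r
# Hypothetical stand-in for effective_data (same structure, 4 rows)
demo <- data.frame(
  `Not this column` = factor(rep("unavailable", 4),
                             levels = c("", "unavailable")),
  `This column` = factor(rep("", 4),
                         levels = c("", "EVENT_VALUE_INTRA_ENB (1)",
                                    "EVENT_VALUE_X2 (2)")),
  check.names = FALSE
)

# Flag columns whose values are all empty strings
is_blank <- vapply(demo, function(x) all(as.character(x) == ""), logical(1))

# drop = FALSE keeps the result a data frame even with one column left
cleaned <- demo[, !is_blank, drop = FALSE]

# Optional: remove factor levels that no longer occur (here, "")
cleaned <- droplevels(cleaned)
str(cleaned)
```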
CPak