I am a beginner in R. I have a small task to do.

I am trying to find the columns of a data frame that have fewer than 2 NA values.

The data frame I am working on is shown below:

df=
      a       b     c
1.    NA     NA     NA
2.    NA     NA     10
3.    NA     NA     23
4.    NA     60     54
5.    NA     60     67

Ideally, I want column (c) from the above data frame as the output.

The code I have attempted is:

na_count <- sapply(df, function(y) sum(length(which(is.na(y)))))
na_count <- data.frame(na_count)
newdf <- na_count[na_count$na_count < 2,]

Using the above code I get this output:

[1]    1

The output gives the count of NAs in column (c), not the column itself.

I understand why I am getting the above output, but I can't find a way to correct it.

Any help would be appreciated.

Sam
  • There are similar questions on here somewhere; generally `colSums` is the way to go: `names(df)[colSums(is.na(df)) < 2]` – alistaire Nov 17 '16 at 07:44
  • Try `sapply(df, function(x) sum(is.na(x)))` to get the count of NAs, and then use `which` to identify the indices where the NA count is below a specified level. – Chirayu Chamoli Nov 17 '16 at 07:45
  • Possible duplicate of [Remove NA columns in a list of dataframes](http://stackoverflow.com/questions/33419200/remove-na-columns-in-a-list-of-dataframes) – alistaire Nov 17 '16 at 07:50
  • @alistaire, the solution worked. Thanks – Sam Nov 17 '16 at 07:50
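
For anyone landing here later, the two suggestions in the comments can be put together into a minimal sketch like the one below. It rebuilds the example data frame from the question; the names `keep`, `idx`, and `newdf2` are only illustrative and do not appear in the original post. The threshold of 2 is the one from the question.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  a = c(NA, NA, NA, NA, NA),
  b = c(NA, NA, NA, 60, 60),
  c = c(NA, 10, 23, 54, 67)
)

# Count the NA values in each column (is.na() on a data frame gives a logical matrix)
na_count <- colSums(is.na(df))

# Names of the columns with fewer than 2 NA values
keep <- names(df)[na_count < 2]

# Subset the data frame; drop = FALSE keeps a data frame even when one column is selected
newdf <- df[, keep, drop = FALSE]

# The same selection via sapply() and which(), as suggested in the second comment
idx <- which(sapply(df, function(x) sum(is.na(x))) < 2)
newdf2 <- df[, idx, drop = FALSE]
```

The attempt in the question filters the table of counts (`na_count`) rather than `df` itself, which is why it returns the count instead of the column. Indexing `df` with the names (or positions) that pass the test should return column (c).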
