11

I am trying to find a simple way of counting the non-missing cases in a column of a data frame. I have used the function:

foo <- function(x) { sum(!is.na(x)) }

and then applied it to a data frame via sapply()

stats$count <- sapply(OldExaminee, foo, simplify=TRUE)

Although this is working fine, I am just in disbelief that there isn't a simpler way of counting, i.e. something in the base set of functions.

Any ideas?

zx8754
SprengMeister
  • Welcome to SO. `sapply` is in `base`, and `sapply( yourdata , function( x ) sum( !is.na( x ) ) )` is pretty quick. :) – Anthony Damico Mar 29 '13 at 14:20
  • @AnthonyDamico thanks, I have used this page many times and finally decided to join. I am a loyal R user but I can't believe that there isn't a truly simple function for count. – SprengMeister Mar 29 '13 at 14:46
  • @SprengMeister, since your intended use is on a `data.frame`, you can use `colSums` with `is.na`. Check my answer. – Arun Mar 29 '13 at 15:17
  • @Arun yours is the most simple one yet. Thanks! – SprengMeister Mar 29 '13 at 15:55

3 Answers

17

For a data.frame you can get it using colSums and is.na:

set.seed(45)
df <- data.frame(matrix(sample(c(NA,1:5), 50, replace=TRUE), ncol=5))
#    X1 X2 X3 X4 X5
# 1   3  2 NA  2 NA
# 2   1  5  1  1  4
# 3   1  1  3  2  3
# 4   2  2  3  5  3
# 5   2  2  5  2  2
# 6   1  2 NA  3  3
# 7   1  5  5  5  2
# 8   3 NA  4  1  5
# 9   1  2  3 NA  1
# 10 NA  1  1  2  2

colSums(!is.na(df))
# X1 X2 X3 X4 X5 
#  9  9  8  9  9 
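
As a quick check that this matches the sapply() approach from the question, here is a minimal sketch on a small made-up data frame (`toy` and its columns are hypothetical, not taken from the post). It relies on is.na() returning a logical matrix for a data frame, which colSums() then totals per column:

```r
# Hypothetical toy data with mixed column types
toy <- data.frame(a = c(1, NA, 3),
                  b = c(NA, NA, "x"),
                  c = 1:3)

# Vectorised: logical matrix of non-missing cells, summed by column
by_colsums <- colSums(!is.na(toy))

# Column-by-column version from the question
by_sapply <- sapply(toy, function(x) sum(!is.na(x)))

print(by_colsums)
print(by_sapply)
```

Both give the same counts per column; colSums() returns them as doubles while the sapply() version returns integers.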
Arun
9

You could use na.omit:

length(na.omit(x))

along with apply, as the post by caelorus indicates.
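
A minimal sketch of how that might look per column, assuming a small made-up data frame (`toy` is hypothetical). Note that na.omit() has to be called on each column separately: called on the whole data frame, it drops every row that contains any NA, which answers a different question.

```r
# Hypothetical toy data
toy <- data.frame(a = c(1, NA, 3, 4),
                  b = c(NA, 2, 3, 4))

# Non-missing count per column
counts <- sapply(toy, function(x) length(na.omit(x)))
print(counts)

# Not the same thing: number of rows with no NA in any column
complete_rows <- nrow(na.omit(toy))
print(complete_rows)
```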

Aditya Sihag
  • This is indeed simpler, but as I said to caerolus, not a "built-in" function. Thanks for sharing. – SprengMeister Mar 29 '13 at 14:34
  • @SprengMeister the `table` function excludes `NA`, so you might say `sum(table(x))` is a little closer to 'built in'. But I suspect @AdityaSihag's answer is faster for large data. – Matthew Plourde Mar 29 '13 at 15:34
1

You can use which and length:

length(which(!is.na(x$col)))

which returns the indexes of the matching elements (in this case, the non-NAs), and length tells you how many of those indexes there are.

For all columns at once:

apply(OldExaminee, 2, function(x){ length(which(!is.na(x))) })
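
For illustration, a small sketch on made-up numeric data (`toy` is hypothetical) comparing this with the sum(!is.na(x)) form from the question. One thing to keep in mind is that apply() converts the data frame to a matrix first, whereas sapply() walks the columns as they are:

```r
# Hypothetical all-numeric toy data
toy <- data.frame(x = c(1, NA, 3),
                  y = c(NA, NA, 6))

# which() gives the positions of the non-NA values; length() counts them
via_apply <- apply(toy, 2, function(x) length(which(!is.na(x))))

# Same count without the intermediate index vector
via_sapply <- sapply(toy, function(x) sum(!is.na(x)))

print(via_apply)
print(via_sapply)
```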
Julián Urbano