
I have a data frame with 118 variables whose values are 0, 1, 99, and NA. For each variable I need to count how many 99s, NAs, 1s, and 0s there are (99 means "does not apply", 0 means "no", 1 means "yes", and NA means "no answer"). I tried the table function, but it works on vectors. How can I do this for the whole set of variables?

Here is a small reproducible example of the data frame:

forest<-c(1,1,1,1,0,0,0,1,1,1,0,NA,0,NA,0,99,99,1,0,NA)
water<-c(1,NA,NA,NA,NA,99,99,0,0,0,1,1,1,0,0,NA,NA,99,1,0)
rain<-c(1,NA,1,0,1,99,99,0,1,0,1,0,1,0,0,NA,99,99,1,1)
fire<-c(1,0,0,0,1,99,99,NA,NA,NA,1,0,1,0,0,NA,99,99,1,1)

df<-data.frame(forest,water,rain,fire)

I need the result as a data frame with one column per variable, like this:

    forest    water    rain    fire
1    8         5        8       6
0    7         6        6       6
99   2         3        4       4
NA   3         6        2       4
Daniel

5 Answers

rbind(sapply(df,table),"NA"=sapply(df, function(y) sum(is.na(y))))
   forest water rain fire
0       7     6    6    6
1       8     5    8    6
99      2     3    4    4
NA      3     6    2    4
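
One caveat, offered as a sketch rather than as part of the answer above: sapply(df, table) only simplifies to a matrix when every column contains the same set of values. With 118 variables, some column may well lack a 99 or a 0. Converting each column to a factor with explicit levels keeps a zero count for absent codes. The names codes and count_codes are hypothetical, chosen here for illustration.

```r
# Example data from the question
forest <- c(1,1,1,1,0,0,0,1,1,1,0,NA,0,NA,0,99,99,1,0,NA)
water  <- c(1,NA,NA,NA,NA,99,99,0,0,0,1,1,1,0,0,NA,NA,99,1,0)
rain   <- c(1,NA,1,0,1,99,99,0,1,0,1,0,1,0,0,NA,99,99,1,1)
fire   <- c(1,0,0,0,1,99,99,NA,NA,NA,1,0,1,0,0,NA,99,99,1,1)
df <- data.frame(forest, water, rain, fire)

# Assumption: these are the only valid codes, listed in the desired row order
codes <- c(1, 0, 99)

# Hypothetical helper: counts each code plus the missing values in one vector
count_codes <- function(x) {
  # factor() with explicit levels reports 0 for codes that never occur
  c(table(factor(x, levels = codes)), "NA" = sum(is.na(x)))
}

# sapply() returns a matrix here because every result has length 4
result <- as.data.frame(sapply(df, count_codes))
result
```

The result is a data frame with one row per code and one column per variable, which is the layout the question asks for.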
Haboryme

Can't find a good dupe, so here's my comment as an answer:

A data frame is really a list of columns. lapply will apply a function to every item in the input (every column, in the case of a data frame) and return a list with each result:

lapply(df, table)
# $forest
# 
#  0  1 99 
#  7  8  2 
# 
# $water
# 
#  0  1 99 
#  6  5  3 
# 
# $rain
# 
#  0  1 99 
#  6  8  4 
# 
# $fire
# 
#  0  1 99 
#  6  6  4 

sapply is like lapply, but it will attempt to simplify the result instead of always returning a list. In both cases, you can pass along additional arguments to the function being applied, like useNA = "always" to table to have NA included in the output:

sapply(df, table, useNA = "always")
#      forest water rain fire
# 0         7     6    6    6
# 1         8     5    8    6
# 99        2     3    4    4
# <NA>      3     6    2    4
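
To make the difference in return types concrete, here is a small sketch using the question's data. The variable names as_list and as_matrix are made up for this illustration.

```r
# Example data from the question
forest <- c(1,1,1,1,0,0,0,1,1,1,0,NA,0,NA,0,99,99,1,0,NA)
water  <- c(1,NA,NA,NA,NA,99,99,0,0,0,1,1,1,0,0,NA,NA,99,1,0)
rain   <- c(1,NA,1,0,1,99,99,0,1,0,1,0,1,0,0,NA,99,99,1,1)
fire   <- c(1,0,0,0,1,99,99,NA,NA,NA,1,0,1,0,0,NA,99,99,1,1)
df <- data.frame(forest, water, rain, fire)

as_list   <- lapply(df, table)  # always a list, one table per column
as_matrix <- sapply(df, table)  # simplified to a matrix, since all tables have the same length

is.list(as_list)      # TRUE
is.matrix(as_matrix)  # TRUE

# The same count, looked up in each structure
as_list$water[["99"]]      # 3
as_matrix["99", "water"]   # 3
```

The matrix form is usually more convenient for a summary table, while the list form still works when the columns do not share the same set of values.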

For lots more info, check out R Grouping functions: sapply vs. lapply vs. apply. vs. tapply vs. by vs. aggregate


To compare with some other answers: apply is similar to lapply and sapply, but it is intended for use with matrices or higher-dimensional arrays. The only time you should use apply on a data.frame is when you need to apply a function to each row. For functions on data frame columns, prefer lapply or sapply. The reason is that apply will coerce the data frame to a matrix first, which can have unintended consequences if you have columns of different classes.
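
A minimal sketch of that coercion, using a made-up two-column data frame (mixed, id, and answer are hypothetical names, not from the question):

```r
# Hypothetical data frame with one character column and one numeric column
mixed <- data.frame(
  id     = c("a", "b", "c"),
  answer = c(1, 0, 99),
  stringsAsFactors = FALSE
)

# apply() converts the data frame to a character matrix first,
# so the numeric column is seen as character
by_apply <- apply(mixed, 2, class)

# sapply() works on each column as it is stored
by_sapply <- sapply(mixed, class)

by_apply   # both columns report "character"
by_sapply  # id is "character", answer is "numeric"
```

With the all-numeric data in the question, apply happens to give the same counts, but the column-wise functions avoid the surprise if a non-numeric column is ever added.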

Gregor Thomas

This should do it:

tables <- apply(df, 2, FUN = table)
Zach

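There's probably a way to do it in one fell swoop, but two steps work: the first call tabulates the non-missing values in each column, and the second counts the NAs.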

apply(df, 2, table)

apply(df, 2, function(x){ sum(is.na(x)) })


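Since the variables are categorical, first convert them to factors (assigning into df[] keeps the result a data frame rather than a plain list):

df[] <- lapply(df, as.factor)

Then summarize the data frame:

sapply(df, summary)

The factor method of summary() counts the observations at each level and reports missing values in a separate NA's entry.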

Tomás Barcellos