How can I apply a function to all dataframe variables?

Question

I have a dataframe with about 90 variables and over 1 million observations. I want to calculate the percentage of NA values in each variable. I have the following code: `sum(is.na(dataframe$variable) / nrow(dataframe) * 100)`. My question is: how can I apply this function to all 90 variables without having to type every variable name in the code?

Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610). This will make it much easier for others to help you. — Jaap, Nov 05 '15 at 16:13

score 3 · Answer 1 · answered Nov 05 '15 at 16:14

Use lapply() with your method:

lapply(df, function(x) sum(is.na(x))/nrow(df)*100)

maccruiskeen

or this: `lapply(df, function(x) mean(is.na(x)))` – davechilders Nov 05 '15 at 16:44
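Note that `mean(is.na(x))` gives a proportion between 0 and 1, not a percentage. A minimal sketch of both variants follows, using `sapply()` so the result is a named vector instead of a list. The sample `df` is only an assumption that mirrors the small example data posted in this thread, and the names `na_pct` and `na_pct_alt` are made up for illustration.

```r
# Hypothetical sample data, mirroring the example data in this thread
df <- data.frame(x = 1:10, y = 1:10, z = 1:10)
df$x[c(2, 5, 7)] <- NA
df$y[c(4, 5)] <- NA

# mean() of a logical vector is the share of TRUE values,
# so multiplying by 100 gives the percentage of NA per column
na_pct <- sapply(df, function(x) mean(is.na(x)) * 100)

# Equivalent vectorised form: is.na() on a data frame returns a
# logical matrix, and colMeans() averages each column
na_pct_alt <- colMeans(is.na(df)) * 100

print(na_pct)
```

With 90 columns, the named vector is usually easier to scan or sort (for example with `sort(na_pct, decreasing = TRUE)`) than the list returned by `lapply()`.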

score 3 · Answer 2 · answered Nov 05 '15 at 16:22

If you want to return a data.frame rather than a list (via lapply()) or a vector (via sapply()), you can use summarise_each from the dplyr package:

library(dplyr)

df %>%
  summarise_each(funs(sum(is.na(.)) / length(.)))

or, even more concisely:

df %>% summarise_each(funs(mean(is.na(.))))
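In current dplyr releases, `summarise_each()` and `funs()` are deprecated. The same result can be sketched with `summarise()` and `across()`, assuming dplyr 1.0.0 or later is installed. The sample `df` mirrors the data in this answer, and the name `na_share` is made up for illustration.

```r
library(dplyr)  # assumption: dplyr >= 1.0.0, which provides across()

# Sample data, mirroring the data in this answer
df <- data.frame(x = 1:10, y = 1:10, z = 1:10)
df$x[c(2, 5, 7)] <- NA
df$y[c(4, 5)] <- NA

# Apply the same summary to every column; the result is a
# one-row data frame holding the share of NA values per column
na_share <- df %>%
  summarise(across(everything(), ~ mean(is.na(.x))))

print(na_share)
```

As with the `summarise_each()` version, this returns proportions. Use `~ mean(is.na(.x)) * 100` inside `across()` to get percentages instead.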

data

df <- data.frame(
  x = 1:10,
  y = 1:10,
  z = 1:10
)

df$x[c(2, 5, 7)] <- NA
df$y[c(4, 5)] <- NA
