I want to find the number of missing values in the factor and numerical variables in R. How do I do that?

Question

Here is how I found out the column names that are numerical and categorical.

split(names(my.data), sapply(my.data, function(x) paste(class(x), collapse=" ")))$factor

split(names(my.data), sapply(my.data, function(x) paste(class(x), collapse=" ")))$numeric

From the above code I got a list of 30 categorical variables and 70 numerical variables. I am trying to find the number of missing values in each of them.

The output I am looking for, for each factor variable: Variable1 has xyz NA's

And for each numerical variable: Variable1 has xyz NA's

It would be easier with dplyr `my.data %>% summarise_if(~is.numeric(.)|is.factor(.), funs(sum(is.na(.))))` — akrun, Feb 07 '18 at 17:08
When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. — MrFlick, Feb 07 '18 at 17:13
If there are many variables, then you can convert the above output to a two column dataset `iris %>% summarise_if(~is.numeric(.)|is.factor(.), funs(sum(is.na(.)))) %>% unlist %>% enframe` — akrun, Feb 07 '18 at 17:25
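For readers on a recent dplyr, here is a minimal sketch of the same idea with `across()` and `where()`, since `funs()` is deprecated and `summarise_if()` is superseded. It assumes dplyr 1.0.0 or later. The toy data frame and its column names are hypothetical stand-ins for `my.data`.

```r
library(dplyr)

# Hypothetical toy data standing in for my.data
my.data <- data.frame(
  num_a = c(1, NA, 3, NA),
  fac_a = factor(c("x", NA, "y", "x")),
  chr_a = c("p", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Keep only numeric or factor columns, then count the NAs in each one
na_counts <- my.data %>%
  summarise(across(where(~ is.numeric(.x) || is.factor(.x)),
                   ~ sum(is.na(.x))))

# Flatten the one-row result into a named vector: one count per variable
na_vec <- unlist(na_counts)
print(na_vec)
```

The character column `chr_a` is skipped because it is neither numeric nor a factor.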

score 0 · Answer 1 · answered Feb 07 '18 at 17:45

In base R you could do:

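# Use sapply() rather than apply(): apply() coerces the data frame to a matrix,
# which discards the column classes, so is.numeric()/is.factor() would not work.
var_idxs <- sapply(my_data, function(x) is.numeric(x) || is.factor(x))
vars <- names(my_data)[var_idxs]
# Count the NAs in each selected column
sapply(my_data[vars], function(x) sum(is.na(x)))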

Although I agree with @akrun that the dplyr way is more elegant :)
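As a rough worked example of the base R route, the sketch below counts NAs per column and then reports factor and numeric variables separately, which is close to the output the question asks for. The toy data frame and all variable names are hypothetical. The type checks use `vapply()` over columns so that each column keeps its class.

```r
# Hypothetical toy data standing in for my_data
my_data <- data.frame(
  num_a = c(1, NA, 3, NA),
  num_b = c(NA, 2, 3, 4),
  fac_a = factor(c("x", NA, "y", "x")),
  chr_a = c("p", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Count NAs once for every column (integer counts, named by column)
na_per_column <- vapply(my_data, function(x) sum(is.na(x)), integer(1))

# Column-wise type masks; iterating over columns preserves their classes
is_fac <- vapply(my_data, is.factor, logical(1))
is_num <- vapply(my_data, is.numeric, logical(1))

factor_na  <- na_per_column[is_fac]
numeric_na <- na_per_column[is_num]

# Report each group in the "Variable has n NA's" form
cat("Factor variables:\n")
cat(sprintf("%s has %d NA's\n", names(factor_na), factor_na), sep = "")
cat("Numeric variables:\n")
cat(sprintf("%s has %d NA's\n", names(numeric_na), numeric_na), sep = "")
```

Computing the counts once and subsetting by type avoids scanning the data twice, which may matter with many columns.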

Felipe Gerard
