Here is how I found which columns are numeric and which are categorical (factors).

split(names(my.data), sapply(my.data, function(x) paste(class(x), collapse=" ")))$factor

split(names(my.data), sapply(my.data, function(x) paste(class(x), collapse=" ")))$numeric

From the above code I got a list of 30 categorical variables and 70 numerical variables. I am trying to find the number of missing values (NAs) in each of them.

The output I am looking for:

In the list of factor variables: Variable1 has xyz NAs

In the list of numerical variables: Variable1 has xyz NAs

NMB
  • It would be easier with dplyr `my.data %>% summarise_if(~is.numeric(.)|is.factor(.), funs(sum(is.na(.))))` – akrun Feb 07 '18 at 17:08
  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Feb 07 '18 at 17:13
  • If there are many variables, then you can convert the above output to a two column dataset `iris %>% summarise_if(~is.numeric(.)|is.factor(.), funs(sum(is.na(.)))) %>% unlist %>% enframe` – akrun Feb 07 '18 at 17:25

1 Answer

In base R you could do:

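# sapply() keeps each column's class; apply() would coerce the data frame to a matrix first
var_idxs <- sapply(my.data, function(x) is.numeric(x) || is.factor(x))
vars <- names(my.data)[var_idxs]
# Count the NAs in each selected column
sapply(my.data[vars], function(x) sum(is.na(x)))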

Although I agree with @akrun that the dplyr way is more elegant :)
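To get the counts reported separately for factor and numeric columns, as the question asks, something along these lines might work. This is only a sketch: the toy `my.data` below is made up for illustration, and `na_counts`, `col_type` and `na_by_type` are hypothetical names, not anything from the question.

```r
# Hypothetical toy data standing in for the asker's my.data
my.data <- data.frame(
  age    = c(21, NA, 35, 40),
  income = c(NA, NA, 52000, 61000),
  gender = factor(c("f", "m", NA, "f")),
  city   = factor(c("A", "B", "B", "A"))
)

# Number of NAs in every column
na_counts <- sapply(my.data, function(x) sum(is.na(x)))

# Label each column as factor, numeric, or other
col_type <- sapply(my.data, function(x) {
  if (is.factor(x)) "factor" else if (is.numeric(x)) "numeric" else "other"
})

# Split the named NA counts by column type
na_by_type <- split(na_counts, col_type)

na_by_type$factor   # gender = 1, city = 0
na_by_type$numeric  # age = 1, income = 2
```

Classifying with `is.factor()` and `is.numeric()` rather than by class name means integer columns and ordered factors end up in the expected group too.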

Felipe Gerard