I work a lot with Excel and R in my job, and I've been trying to automate a data-quality report that my boss asks me for. I only recently started working with R, so my code isn't the best.
The idea is to build a data.frame that summarizes, for each column: the total number of NAs, the percentage of NAs, and, after filtering by some columns, the number of NAs within a given level.
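A minimal sketch of the per-level part: instead of writing one `filter()` per level, `group_by()` can count the NAs in every column for all levels at once. It assumes dplyr 1.0 or later (for `across()`) and uses a hypothetical toy data frame whose `group` column stands in for the real column being filtered on; none of these names come from the real data.

```r
library(dplyr)

# Hypothetical toy data: `group` stands in for the column used in filter()
df <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  x     = c(1, NA, 3, NA, NA),
  y     = c(NA, "u", "v", "w", NA)
)

# One row per level of `group`; each other column holds its NA count in that level.
# across() skips the grouping column automatically.
na_by_level <- df %>%
  group_by(group) %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

print(na_by_level)
```

Each row of `na_by_level` corresponds to one "n NA Variable 1, level k" row of the report, so it could be transposed the same way as the rest of the summary.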
Here is what I've tried so far:
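```r
library(dplyr)

rowsna <- c("Total NA", "% NA", "n NA Variable 1, level 1",...)
# Row 1: number of NAs in each column
na_count <- df %>% summarise_all(~sum(is.na(.)))
# Row 2: share of NAs in each column (mean() of a logical vector)
na_count[2, ] <- df %>% summarise_all(~mean(is.na(.)))
# Row 3: number of NAs in each column within one level of `variable`
na_count[3, ] <- df %>% filter(variable == value) %>% summarise_all(~sum(is.na(.)))
...
row.names(na_count) <- rowsna
# Transpose so that each original column becomes a row
na_count <- as.data.frame(t(na_count))
na_count$variable
```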
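A self-contained sketch of the same pipeline on a hypothetical toy data frame (the column `group`, the level `"b"` and the row labels are made up for illustration). It assumes dplyr 1.0 or later and uses `summarise(across(...))`, the current replacement for the superseded `summarise_all()`. Since `mean()` of a logical vector is the proportion of `TRUE` values, multiplying it by 100 gives a percentage. The rows are stacked with `bind_rows()` rather than assigned by index, and the last line assumes the intent of `na_count$variable` was to keep the variable names as a column.

```r
library(dplyr)

# Hypothetical toy data; replace with the real data set
df <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  x     = c(1, NA, 3, NA, NA),
  y     = c(NA, "u", "v", "w", NA)
)

# Each summary is a one-row data frame with one column per variable
total_na <- df %>% summarise(across(everything(), ~ sum(is.na(.x))))
# mean() of a logical vector is the proportion of TRUEs; * 100 turns it into a percentage
pct_na <- df %>% summarise(across(everything(), ~ 100 * mean(is.na(.x))))
# NA count restricted to one level of `group` (hypothetical level "b")
na_lvl_b <- df %>%
  filter(group == "b") %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# Stack the three rows, then transpose so that variables become rows
na_count <- bind_rows(total_na, pct_na, na_lvl_b)
na_count <- as.data.frame(t(na_count))
colnames(na_count) <- c("Total NA", "% NA", "n NA group, level b")
# Keep the variable names as a regular column
na_count$variable <- rownames(na_count)

print(na_count)
```

Building each summary as its own one-row data frame and stacking them avoids assigning into rows that do not exist yet, and adding another level is just one more `filter()` plus `summarise()` passed to `bind_rows()`.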
The problem is that I have no idea how to calculate the percentage of missing values in the `na_count[2, ]` step. Any help would be appreciated.