Need to count the NAs (missing values) for one variable, grouped by another variable

Question

I'm very new to R, but I find it very interesting to learn.

I searched a lot and found many posts about counting missing values across multiple columns, for example using

na_count <-sapply(data, function(y) sum(length(which(is.na(y)))))
na_count <- data.frame(na_count)

but I could not find a specific answer for my issue.

I have a dataset with a column called species and another column called weight, which has some missing values.

I need to count the missing values in 'weight', grouped by species. I need to use group_by and summarize.

One of the errors that I'm getting is

Factor species contains implicit NA, consider using forcats::fct_explicit_na

I think this is related to the fact that the column I'm grouping by (species) also has NAs.

I have tried

DF %>% 
  group_by(species) %>% 
  summarize(funs(sum(is.na(weight))))

This doesn't work, though.

Finally, I need to impute the mean weight of each species into the missing values.

Cheers

Please make your question [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). — NelsonGon, Apr 28 '19 at 05:07
Possible duplicate of [How to sum a variable by group](https://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group) — NelsonGon, Apr 28 '19 at 05:25

score 1 · Answer 1 · answered Apr 28 '19 at 05:11


Here is a hypothetical data frame:

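library(dplyr)  # provides %>%, tibble(), group_by(), summarise(), mutate()
set.seed(1)     # arbitrary seed so the example is repeatable
df <- tibble(species = sample(c("dogs", "cats", "horses"), 100, replace = TRUE),
             weight = sample(seq(100, 200), 100))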

Let's put some NAs in the weight column:

df$weight[sample(1:100, 30)] <- NA  # blank out 30 randomly chosen rows

Counting the NA's:

df %>% group_by(species) %>% summarise(NA_sum = sum(is.na(weight)))
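The question mentions that species itself contains NAs, which is what the "implicit NA" warning points at. Here is a minimal sketch of one way to handle that before counting. It assumes species is a character column (a factor would need as.character() first), and the toy data, the "(unknown)" label and the column names n_missing and n_rows are made up for illustration:

```r
library(dplyr)

# Hypothetical toy data: species itself has a missing value
toy <- tibble(
  species = c("dogs", "dogs", "cats", NA, "cats"),
  weight  = c(10, NA, 4, NA, NA)
)

na_counts <- toy %>%
  # Give rows with a missing species an explicit label of their own
  mutate(species = if_else(is.na(species), "(unknown)", species)) %>%
  group_by(species) %>%
  # One row per species: number of missing weights and group size
  summarise(n_missing = sum(is.na(weight)), n_rows = n())

na_counts
```

The result is a tibble with named columns, so it can be printed or saved on its own before any imputation is done.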

And to impute the mean weight of each species into the missing values:

df %>%
  group_by(species) %>%
  mutate(weight = ifelse(is.na(weight), mean(weight, na.rm = TRUE), weight))
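To keep the imputed values, the result has to be assigned to a name, and it is probably worth checking that no NAs remain. A small sketch on made-up toy data (the names toy and imputed are hypothetical); ungroup() is added so that later steps do not silently run per species:

```r
library(dplyr)

# Hypothetical toy data with one missing weight per species
toy <- tibble(
  species = c("dogs", "dogs", "dogs", "cats", "cats"),
  weight  = c(10, 20, NA, 4, NA)
)

imputed <- toy %>%
  group_by(species) %>%
  # Replace each missing weight with the mean of its own species
  mutate(weight = ifelse(is.na(weight), mean(weight, na.rm = TRUE), weight)) %>%
  ungroup()

# Check that nothing is left to impute
anyNA(imputed$weight)
```

Note that a species whose weights are all missing would get NaN rather than a number, because there is nothing to average.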


Omry Atia

Hi, I need to do this in two steps. I first need a count of the NAs for weight, grouped by species. I would also need column names and a heading on the tibble. Only after that would I need to impute the mean weight of each species into the missing values. – axel_p Apr 28 '19 at 05:20
It's all there: the third line in the answer counts the NAs; the fourth imputes them. – Omry Atia Apr 28 '19 at 05:23
Hi Axel, if you like my answer please consider accepting it by pressing the "v" sign. Thanks! – Omry Atia Apr 28 '19 at 05:33
I did, but my reputation is too low for my vote to show publicly. I have upvoted your answer, though! – axel_p Apr 28 '19 at 06:48
I have another issue, if you could help out. I have to inspect the weight column in my data frame for special values, so I have defined a function to identify them: is.specialorNA <- function(x){if (is.numeric(x)) (is.infinite(x) | is.nan(x) | is.na(x))}. This returns a very long logical vector of TRUE/FALSE values, so I need the sum of the missing values for the weight column. I'm trying sapply(surveys_weight_imputed, is.specialorNA, sum(is.na(x))), but this gives me an error: Error in FUN(X[[i]], ...) : unused argument (sum(is.na(x))) – axel_p Apr 28 '19 at 06:55
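On the follow-up in the last comment: sapply() passes any extra arguments on to the function it calls, so sum(is.na(x)) is handed to is.specialorNA, which only accepts x, hence the "unused argument" error. A possible sketch is to wrap the sum around the function call instead. The function name is_special_or_na and the toy data frame below are illustrative, not the asker's actual data:

```r
# Flags NA, NaN and +/-Inf in numeric vectors; for other types only NA is flagged
is_special_or_na <- function(x) {
  if (is.numeric(x)) is.infinite(x) | is.nan(x) | is.na(x) else is.na(x)
}

# Hypothetical stand-in for the surveys data
toy <- data.frame(
  species = c("dogs", "cats", "cats", "dogs"),
  weight  = c(10, NA, Inf, NaN)
)

# Count of special values in the weight column only
special_in_weight <- sum(is_special_or_na(toy$weight))

# Count of special values in every column
special_per_column <- sapply(toy, function(col) sum(is_special_or_na(col)))
```

The else branch is an addition: without it the original function returns NULL for non-numeric columns, which makes the per-column result harder to work with.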
