
I'm very new to R, but I find it very interesting to learn.

I searched a lot. There are many posts on counting missing values in multiple columns, typically using:

```r
na_count <- sapply(data, function(y) sum(length(which(is.na(y)))))
na_count <- data.frame(na_count)
```

However, I could not find an answer to my specific problem.
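For reference, `sum(length(which(is.na(y))))` in the snippet above is equivalent to `sum(is.na(y))`. Base R can also count the NAs in every column at once with `colSums(is.na(...))`. Here is a minimal sketch on a small data frame with made-up values:

```r
# Hypothetical toy data standing in for `data`
data <- data.frame(
  species = c("dogs", NA, "cats", "cats"),
  weight  = c(120, NA, NA, 150)
)

# is.na() on a data frame returns a logical matrix; colSums() counts the TRUEs per column
na_count <- colSums(is.na(data))
na_count  # species: 1, weight: 2

# Same counts as the sapply() version, without length(which(...))
na_count_sapply <- sapply(data, function(y) sum(is.na(y)))
```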

I have a dataset with a column called `species` and another column called `weight`, which has some missing values.

I need to count the missing values in `weight`, grouped by `species`, and I have to use `group_by()` and `summarize()`.

One of the warnings I'm getting is:

```
Factor `species` contains implicit NA, consider using `forcats::fct_explicit_na`
```

I think this is because the column I'm grouping by (`species`) also contains NAs.
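If the NAs in `species` are the cause, there are two common ways to handle them. This sketch assumes `species` is a factor:

* Drop those rows before grouping.
* Turn `NA` into an explicit factor level with base R's `addNA()`, so that it forms its own group.

The data frame is hypothetical. Which option fits depends on whether rows with an unknown species should be counted at all.

```r
library(dplyr)

# Hypothetical data: `species` is a factor with one missing value
DF <- data.frame(
  species = factor(c("dogs", "cats", NA, "cats")),
  weight  = c(120, NA, 90, 150)
)

# Option 1: drop rows whose species is unknown before grouping
DF_known <- DF %>% filter(!is.na(species))

# Option 2: keep them, but make NA an explicit factor level so it becomes its own group
DF_explicit <- DF %>% mutate(species = addNA(species))

na_by_species <- DF_explicit %>%
  group_by(species) %>%
  summarise(n_missing_weight = sum(is.na(weight)))
na_by_species
```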

I have tried:

```r
DF %>% 
  group_by(species) %>% 
  summarize(funs(sum(is.na(weight))))
```

This doesn't work, though.

Finally, I need to replace the missing values with the mean weight of each species.

Cheers

axel_p
  • Please make your question [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – NelsonGon Apr 28 '19 at 05:07
  • Possible duplicate of [How to sum a variable by group](https://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group) – NelsonGon Apr 28 '19 at 05:25

1 Answer


Here is a hypothetical data frame:

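```r
library(dplyr)

set.seed(1)  # make the random example reproducible
# tibble() replaces the deprecated data_frame()
df <- tibble(species = sample(c("dogs", "cats", "horses"), 100, replace = TRUE),
             weight  = sample(seq(100, 200), 100))
```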

Let's put some NAs in there:

```r
# Set 30 randomly chosen weights to NA
df$weight[sample(nrow(df), 30)] <- NA
```

Counting the NAs:

```r
df %>%
  group_by(species) %>%
  summarise(NA_sum = sum(is.na(weight)))
```
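The comments ask for a tibble with clear column names. `summarise()` can compute several named columns in one call. This sketch uses a small hypothetical data set. The column names `n_total`, `n_missing` and `mean_weight` are illustrative choices only.

```r
library(dplyr)

# Small hypothetical data set so the counts are easy to check by hand
df <- tibble(
  species = c("dogs", "dogs", "cats", "cats", "cats", "horses"),
  weight  = c(110, NA, 150, NA, NA, 180)
)

# One row per species, with explicitly named summary columns
na_summary <- df %>%
  group_by(species) %>%
  summarise(
    n_total     = n(),                        # rows per species
    n_missing   = sum(is.na(weight)),         # missing weights per species
    mean_weight = mean(weight, na.rm = TRUE)  # mean of the observed weights
  )
na_summary
```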

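And the final step, replacing each missing weight with its species' mean:

```r
df %>%
  group_by(species) %>%
  mutate(weight = ifelse(is.na(weight), mean(weight, na.rm = TRUE), weight)) %>%
  ungroup()  # drop the grouping so later steps are not applied per species
```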
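Here is a variant of the same imputation on a tiny hypothetical data set, with a check afterwards that no NAs remain. It swaps in base R's `replace()` for `ifelse()`. It also calls `ungroup()` so that later verbs are not silently applied per species.

```r
library(dplyr)

# Hypothetical data with one missing weight in each species
df <- tibble(
  species = c("dogs", "dogs", "cats", "cats", "cats"),
  weight  = c(100, NA, 140, 160, NA)
)

# Fill each NA with the mean of the observed weights in the same species
df_imputed <- df %>%
  group_by(species) %>%
  mutate(weight = replace(weight, is.na(weight), mean(weight, na.rm = TRUE))) %>%
  ungroup()

# Check: no missing weights should remain
sum(is.na(df_imputed$weight))
```

One caveat: if every weight for a species is missing, `mean(..., na.rm = TRUE)` returns `NaN`, so those rows stay missing.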
Omry Atia
  • Hi, I need to do this in two steps. First I need a count of the NAs in weight, grouped by species, with column names and a heading on the tibble. Only after that do I need to impute the mean weight of each species into the missing values. – axel_p Apr 28 '19 at 05:20
  • It's all there: the third line in the answer counts the NAs; the fourth imputes them. – Omry Atia Apr 28 '19 at 05:23
  • Hi Axel, if you like my answer, please consider accepting it by clicking the check mark. Thanks! – Omry Atia Apr 28 '19 at 05:33
  • I did, but my reputation is too low for my vote to show publicly. I have upvoted your answer, though! – axel_p Apr 28 '19 at 06:48
  • I have another issue, if you could help out. I have to inspect the weight column in my data frame for special values, so I have defined a function to identify them: `is.specialorNA <- function(x){if (is.numeric(x)) (is.infinite(x) | is.nan(x) | is.na(x))}`. This returns a very long logical vector flagging each value as TRUE or FALSE. What I need is the number of special or missing values in the weight column. I'm trying `sapply(surveys_weight_imputed, is.specialorNA, sum(is.na(x)))`, but this gives me an error: `Error in FUN(X[[i]], ...) : unused argument (sum(is.na(x)))` – axel_p Apr 28 '19 at 06:55
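The error comes from how `sapply()` works. Any extra arguments after `FUN` are passed on to `FUN`. So `sum(is.na(x))` was handed to `is.specialorNA()` as a second argument, which it does not accept.

One way around this is to call `sum()` on the logical vector directly, or to wrap both steps in an anonymous function. This sketch uses a small hypothetical data frame in place of `surveys_weight_imputed`.

```r
# The helper from the comment: TRUE for Inf/-Inf, NaN or NA in a numeric vector,
# NULL for non-numeric input (note that is.na() is already TRUE for NaN)
is.specialorNA <- function(x) {
  if (is.numeric(x)) (is.infinite(x) | is.nan(x) | is.na(x))
}

# Hypothetical stand-in for surveys_weight_imputed
surveys <- data.frame(
  species = c("dogs", "cats", "cats", "dogs"),
  weight  = c(120, Inf, NaN, NA)
)

# Count special values in one column by summing the logical vector
n_special_weight <- sum(is.specialorNA(surveys$weight))

# Or for every column: do both steps inside one function passed to sapply()
n_special_by_col <- sapply(surveys, function(x) sum(is.specialorNA(x)))
```

For non-numeric columns `is.specialorNA()` returns `NULL`, and `sum(NULL)` is `0`, so those columns simply report zero.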