Replace NA in all columns of a dplyr chain

Question

The question replace NA in a dplyr chain results into the solution

dt %.% group_by(a) %.% mutate(b = ifelse(is.na(b), mean(b, na.rm = T), b))

with dplyr. I want to impute all colums with dplyr chain. There is no single column to group by, rather I want all numeric columns to have all NAs replaced by the means such as column means.

What is the most elegant way to replace all NAs with column means with tidyverse/dp?

Use `dt %>% group_by(a) %>% mutate_all(funs(ifelse(is.na(.), mean(., na.rm = TRUE), .))` If you want elegance, then `dt %>% group_by(a) %>% mutate_all(zoo::na.aggregate)` — akrun, Jan 02 '18 at 10:17
@akrun what about if you continue with long dplyr chain after the `group_by(a)`? And what is the `a`? Do you have to use `ungroup` after the `group_by`? — hhh, Jan 02 '18 at 10:19
It depends on the operation you want afterwards. Sometimes, I use `ungroup` because `spread` etc may not work with group columns — akrun, Jan 02 '18 at 10:20
If you are using `rowwise` sum use the `rowSums` i.e. `%>% ungroup %>% mutate(newSum = rowSums(.[columnsofinterest]))` — akrun, Jan 02 '18 at 10:24
Possible duplicate of [na.locf using group\_by from dplyr](https://stackoverflow.com/questions/43212308/na-locf-using-group-by-from-dplyr) — h3rm4n, Jan 02 '18 at 10:40
It would be useful for downvoters to explain their votes so I could clarify the question as needed. I thank akrun for the answer where the `zoo::na.aggregate()` works very well :) — hhh, Jan 02 '18 at 10:45
Related: [*Handle Continous Missing values in time-series data*](https://stackoverflow.com/q/32694313/2204410) — Jaap, Jan 02 '18 at 11:22

akrun · Accepted Answer · 2018-01-02T10:35:35.080

9

We can use mutate_all with ifelse

dt %>%
   group_by(a) %>% 
   mutate_all(funs(ifelse(is.na(.), mean(., na.rm = TRUE), .)))

If we want a compact option, then use the na.aggregate from zoo which by default replace NA values with mean

dt %>% 
   group_by(a) %>% 
   mutate_all(zoo::na.aggregate)

If we don't have a grouping variable, then remove the group_by and use mutate_if (just to be cautious about having some non-numeric column)

dt %>%
   mutate_if(is.numeric, zoo::na.aggregate)

If all the columns are numeric, even

zoo::na.aggregate(dt)

data

set.seed(42)
dt <- data.frame(a = rep(letters[1:3], each = 3),
                 b= sample(c(NA, 1:5), 9, replace = TRUE), 
                 c = sample(c(NA, 1:3), 9, replace = TRUE))

edited Jan 02 '18 at 10:35

answered Jan 02 '18 at 10:19

akrun

874,273
37
540
662

@hhh The `a` is from your code only.. If you have a different grouping variable, use that one – akrun Jan 02 '18 at 10:26
@hhh Isn't that just a row number column? I thought you wanted to group by some column and then do the replacement based on it – akrun Jan 02 '18 at 10:30
1

@hhh If it is a row number, it is just a unique element. So, You don't need any `group_by`, just do `dt %>% mutate_all(zoo::na.aggregate))` or apply the `zoo::na.aggregate(dt)` on the entire dataset – akrun Jan 02 '18 at 10:33
1

Thank you, they worked `zoo::na.aggregate(data.frame(a=c(1,2,NaN),b=c(10,NaN,12)))` and `data.frame(a=c(1,2,NaN),b=c(10,NaN,12)) %>% zoo::na.aggregate()`. – hhh Jan 02 '18 at 10:37

Replace NA in all columns of a dplyr chain

1 Answers1

data

Related