
In earlier work, I replaced NA values with the group mean. This code has worked so far.

df <- transform(df,
                 var_1 = ave(var_1, group,
                             FUN = function(x) replace(x, is.na(x),
                                                       mean(x, na.rm = TRUE))))
df

My data is grouped by a categorical column.

df <- data.frame(group = c(rep("cake", 5), rep("cookie", 5)),
                 var_1 = c(1, 2, NA, 3, 1, 8, 9, NA, 7, 8))
df

This is the result I'm looking for, with each NA replaced by the mode of its group.

    group var_1
1    cake     1
2    cake     2
3    cake     1
4    cake     3
5    cake     1
6  cookie     8
7  cookie     9
8  cookie     8
9  cookie     7
10 cookie     8

I tried using the dplyr package to replace NA with the mode. However, it did not work. Instead, I received an error message.

# not working
library(dplyr)
df %>% group_by(group) %>% mutate(var_1 = na.aggregate(var_1, FUN = mode))

This code isn't working either.

# not working
library(dplyr)
df %>% group_by(group) %>% mutate(var_1 = if_else(is.na(var_1), 
                         calc_mode(var_1), var_1))

Here are the error messages.

Error: Problem with `mutate()` column `var_1`.
ℹ `var_1 = if_else(is.na(var_1), calc_mode(var_1), var_1)`.
x could not find function "calc_mode"
ℹ The error occurred in group 1: group = "cake".
Error: Problem with `mutate()` column `var_1`.
ℹ `var_1 = na.aggregate(var_1, FUN = mode)`.
x could not find function "na.aggregate"
ℹ The error occurred in group 1: group = "cake".

Any ideas would be greatly appreciated. Thank you very much for your advice.

cccn
  • Both of those errors say you're using functions that aren't loaded. Do you still get an error when you load the packages you're trying to work with? – camille Dec 18 '21 at 01:01
  • Yes. I am certain that I correctly loaded the package. I greatly thank you for your suggestion. – cccn Dec 18 '21 at 05:03

1 Answer

na_replace_Mode <- function(x) {
  ux <- unique(na.omit(x))
  x[is.na(x)] <- ux[which.max(tabulate(match(x, ux)))]
  x
}

transform(df, var_1 = ave(var_1, group, FUN = na_replace_Mode))

   group var_1
1    cake     1
2    cake     2
3    cake     1
4    cake     3
5    cake     1
6  cookie     8
7  cookie     9
8  cookie     8
9  cookie     7
10 cookie     8
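
To see why this works, here is a minimal step-by-step sketch of what happens inside na_replace_Mode for the cake group. The intermediate names idx and counts are only for illustration and are not part of the function above.

```r
x <- c(1, 2, NA, 3, 1)            # var_1 for the "cake" group

ux <- unique(na.omit(x))          # distinct non-missing values: 1 2 3
idx <- match(x, ux)               # position of each value in ux: 1 2 NA 3 1
counts <- tabulate(idx)           # occurrences per position, NA ignored: 2 1 1
group_mode <- ux[which.max(counts)]  # most frequent value: 1

x[is.na(x)] <- group_mode         # fill the missing entry with the mode
x
```

Note that which.max() returns the first maximum, so if two values are tied, the one that appears first in the group is used.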

You could also do:

Mode <- function(x) {
  x <- na.omit(x)
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

library(dplyr)
library(tidyr)  # provides replace_na()

df %>%
   group_by(group) %>%
   mutate(var_1 = replace_na(var_1, Mode(var_1)))
# A tibble: 10 x 2
# Groups:   group [2]
   group  var_1
   <chr>  <dbl>
 1 cake       1
 2 cake       2
 3 cake       1
 4 cake       3
 5 cake       1
 6 cookie     8
 7 cookie     9
 8 cookie     8
 9 cookie     7
10 cookie     8
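
A side note on the first attempt in the question: na.aggregate() is defined in the zoo package, so it needs library(zoo). Even with zoo loaded, FUN = mode would probably not give the intended result, because base R's mode() reports the storage mode of a vector rather than its most frequent value. A quick check, reusing the Mode helper from above:

```r
# Same helper as above, repeated so this snippet runs on its own
Mode <- function(x) {
  x <- na.omit(x)
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

x <- c(1, 2, NA, 3, 1)

mode(x)   # "numeric": the storage mode, not a statistic
Mode(x)   # 1: the most frequent non-missing value
```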
Onyambu