How can I exchange all the NAs in the columns for their medians?

Question

I would like to replace all NA values in each column with that column's median:

id <- c(1,2,3,4,5,6,7,8,9,10)
varA <- c(15,10,8,19,7,5,NA,11,12,NA)
varB <- c(NA,1,2,3,4,3,3,2,1,NA)
df <- data.frame(id, varA,varB)

median(df$varA, na.rm=TRUE)
median(df$varB, na.rm=TRUE)

df1 <- df

# Columns to be modified with Median in place of the NA

col <- c("varA", "varB")                           

df1[col] <- sapply(df1[col],  
                              function(x) replace(x, x %in% is.na(df1), median[col]))
df1

Error in [.default(df1, col) : invalid subscript type 'closure'

Your anonymous function `replace(x, x %in% is.na(df1), median[col])` has several problems. Use only `x` inside it, not `df1` or `col`. You also can't use `[` on a function: `median[col]` makes no sense and causes your error. Change it to `replace(x, is.na(x), median(x, na.rm = TRUE))` and it has a chance of working. – Gregor Thomas, Nov 27 '22 at 01:42
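A minimal sketch of the fix suggested in the comment above, using the data and the `col` vector from the question. It swaps `sapply` for `lapply`, which is an assumption on my part rather than something the comment asks for; `lapply` returns a list of columns, which assigns back into `df1[col]` without going through a matrix.

```r
# Data from the question
id <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
varA <- c(15, 10, 8, 19, 7, 5, NA, 11, 12, NA)
varB <- c(NA, 1, 2, 3, 4, 3, 3, 2, 1, NA)
df <- data.frame(id, varA, varB)

df1 <- df
col <- c("varA", "varB")

# Inside the function, x is one column vector. Locate its NAs with
# is.na(x) and fill them with that column's median, computed with
# the NAs dropped (na.rm = TRUE).
df1[col] <- lapply(
  df1[col],
  function(x) replace(x, is.na(x), median(x, na.rm = TRUE))
)
df1
```

The `id` column is left untouched because only the columns named in `col` are reassigned.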

score 2 · Answer 1 · answered Nov 27 '22 at 01:41


We may use `na.aggregate` from the zoo package, with `col` as defined in the question:

library(zoo)
df[col] <-  na.aggregate(df[col], FUN = median)

answered Nov 27 '22 at 01:41

akrun


score 1 · Answer 2 · Giulio Centorame · 2022-11-27T01:41:25.140

dplyr + tidyr solution

library(dplyr)
library(tidyr)

df %>%
  mutate(varA = replace_na(varA, median(varA, na.rm = TRUE)),
         varB = replace_na(varB, median(varB, na.rm = TRUE)))
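When there are many columns, writing one `replace_na` call per column gets repetitive. The following is a sketch that goes beyond what this answer shows: it assumes dplyr 1.0 or later (for `across`) and reuses the `col` vector from the question to pick the columns.

```r
library(dplyr)
library(tidyr)

# Data from the question
id <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
varA <- c(15, 10, 8, 19, 7, 5, NA, 11, 12, NA)
varB <- c(NA, 1, 2, 3, 4, 3, 3, 2, 1, NA)
df <- data.frame(id, varA, varB)

col <- c("varA", "varB")

# across() applies the same function to every column selected by
# all_of(col); .x stands for the current column.
df2 <- df %>%
  mutate(across(all_of(col), ~ replace_na(.x, median(.x, na.rm = TRUE))))
df2
```

Like the version above it, this returns a new data frame and does not modify `df` unless the result is assigned back.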

edited Nov 27 '22 at 01:41

answered Nov 27 '22 at 01:39

Giulio Centorame


Thanks for spotting it, I was just editing the post – Giulio Centorame Nov 27 '22 at 01:42

score 0 · Answer 3 · answered Nov 27 '22 at 02:32

Another option, which is similar to your original attempt.

df1[col] <- apply(df1[col], 2, \(x) ifelse(is.na(x), median(x, na.rm = TRUE), x))
df1
#>    id varA varB
#> 1   1 15.0  2.5
#> 2   2 10.0  1.0
#> 3   3  8.0  2.0
#> 4   4 19.0  3.0
#> 5   5  7.0  4.0
#> 6   6  5.0  3.0
#> 7   7 10.5  3.0
#> 8   8 11.0  2.0
#> 9   9 12.0  1.0
#> 10 10 10.5  2.5
