I want to replace the NAs in each numeric column of a data.table with that column's median.
library(data.table)

dt <- data.table(
  name   = c("A", "B", "C", "D", "E"),
  sex    = c("M", "F", NA, "F", "M"),
  age    = c(1, 2, 3, NA, 4),
  height = c(178.1, 162.1, NA, 169.5, 172.3)
)
Extract the names of the numeric columns:
num.cols <- sapply(dt, is.numeric)
num.cols <- names(num.cols)[num.cols]
Check the expected medians:
median(dt[, age], na.rm = TRUE)    # 2.5
median(dt[, height], na.rm = TRUE) # 170.9
Apply lapply over the numeric columns:
# Returns a new data.table holding only the imputed columns; dt itself is unchanged
dt[, lapply(.SD, function(value)
       ifelse(is.na(value), median(value, na.rm = TRUE), value)),
   .SDcols = num.cols]
Question: how do I overwrite the columns containing NAs with the median-imputed columns using data.table syntax?
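
For reference, here is a minimal sketch of one direction that might work, assuming that modifying dt in place (by reference) is acceptable. It reuses the same lapply call but assigns the result back to the same columns with `:=`, wrapping `num.cols` in parentheses so that it is evaluated as a vector of column names rather than taken as a literal column name. It is an illustration, not necessarily the most idiomatic solution.

```r
library(data.table)

dt <- data.table(
  name   = c("A", "B", "C", "D", "E"),
  sex    = c("M", "F", NA, "F", "M"),
  age    = c(1, 2, 3, NA, 4),
  height = c(178.1, 162.1, NA, 169.5, 172.3)
)

# Names of the numeric columns
num.cols <- names(dt)[sapply(dt, is.numeric)]

# (num.cols) := ... assigns the list returned by lapply back to those
# columns by reference, so dt itself is modified
dt[, (num.cols) := lapply(.SD, function(value)
       ifelse(is.na(value), median(value, na.rm = TRUE), value)),
   .SDcols = num.cols]

print(dt)
```

The non-numeric columns (name, sex) are left untouched, so the NA in sex remains.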