I'm trying to write a function that replaces the missing values in each numeric column of a data frame with the median of that column. I also need to replace the missing values in each character column with that column's most frequent value.
It needs to be accomplished without the use of any packages.
The data looks like this:
  ID GLUC TGL HDL LDL  HRT MAMM SMOKE
1  A   88  NA  32  99    Y <NA>  ever
2  B   NA 150  60  NA <NA>   no never
3  C  110  NA  NA 120    N <NA>  <NA>
4  D   NA 200  65 165 <NA>  yes never
5  E   90 210  NA 150    Y <NA> never
6  F   88  NA  32 210 <NA>  yes  ever
EDIT
This is what I have so far and I'm not sure if I'm even close ...
impute <- function(dat, varlist) {
  if (is.numeric(varlist)) {
    res <- median(varlist, na.rm = TRUE)
  } else {
    res <- dat[which.max(varlist)]
  }
  na.index <- which(is.na(varlist))
  dat[na.index] <- res
  dat
}
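For comparison, here is a rough base-R sketch of the behaviour described above. It is only one possible approach, not a definitive implementation. The names `most_frequent` and `impute_df` are hypothetical, and it assumes the function should loop over every column of the data frame rather than take a separate `varlist` argument. Two things differ from the attempt above: `which.max()` is applied to the counts from `table()` rather than to the raw values, and the replacement is assigned into the column vector rather than into `dat[na.index]`, which would select whole columns of the data frame.

```r
# Hypothetical helper: most frequent non-missing value of a vector.
# table() drops NA by default; which.max() returns the first maximum,
# so ties go to the value that sorts first.
most_frequent <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

# Sketch of a column-wise imputer (assumed name and signature).
impute_df <- function(dat) {
  for (col in names(dat)) {
    x <- dat[[col]]
    missing <- is.na(x)
    # Nothing to fill, or nothing to fill with: leave the column alone
    if (!any(missing) || all(missing)) next
    if (is.numeric(x)) {
      x[missing] <- median(x, na.rm = TRUE)
    } else {
      x <- as.character(x)  # assumption: factor columns become character
      x[missing] <- most_frequent(x)
    }
    dat[[col]] <- x
  }
  dat
}

# The sample data from the question, rebuilt by hand
dat <- data.frame(
  ID    = c("A", "B", "C", "D", "E", "F"),
  GLUC  = c(88, NA, 110, NA, 90, 88),
  TGL   = c(NA, 150, NA, 200, 210, NA),
  HDL   = c(32, 60, NA, 65, NA, 32),
  LDL   = c(99, NA, 120, 165, 150, 210),
  HRT   = c("Y", NA, "N", NA, "Y", NA),
  MAMM  = c(NA, "no", NA, "yes", NA, "yes"),
  SMOKE = c("ever", "never", NA, "never", "never", "ever"),
  stringsAsFactors = FALSE
)

imputed <- impute_df(dat)
```

On this sample the missing GLUC values would be filled with 89 (the median of 88, 88, 90, 110) and the missing SMOKE value with "never". Columns that are entirely missing are left untouched, since there is nothing to compute a median or mode from.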