I need to write a function that replaces missing values with the mean for continuous/integer variables and with the mode for categorical variables.
The data comes from the UCI credit screening dataset:
X <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.data", header = FALSE, na.strings = '?')
The first column of the dataset is a factor, the second and third columns are numeric, and so on.
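The column types described above can be checked before any imputation. The sketch below is only an illustration: it parses three made-up rows through the `text` argument instead of downloading the file, and `sample_X` is a hypothetical name. It also passes `stringsAsFactors = TRUE` explicitly, since from R 4.0.0 onward `read.csv` leaves strings as character vectors by default, in which case `is.factor()` would be `FALSE` for the text columns.

```r
# Three made-up rows in the same layout (not real records from crx.data)
sample_X <- read.csv(text = "b,30.83,0\na,?,4.46\n?,24.50,0.5",
                     header = FALSE, na.strings = "?",
                     stringsAsFactors = TRUE)

# Class of every column: V1 is a factor, V2 and V3 are numeric
col_classes <- sapply(sample_X, class)
col_classes

# Number of missing values per column
na_counts <- colSums(is.na(sample_X))
na_counts
```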
I wrote a mode function:
mode_function <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
It works as intended.
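As a quick check of that claim, here is a small sketch on made-up vectors. Note that `unique()` and `match()` treat `NA` as an ordinary value, so `mode_function` can return `NA` when missing values are the most frequent entry. The helper `mode_ignoring_na` is a hypothetical addition, not part of the original code, shown only as one way to guard against that case.

```r
# mode_function as defined above, repeated so the snippet runs on its own
mode_function <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Most frequent level of a small factor with one missing value
m1 <- mode_function(factor(c("a", "b", "b", NA)))
as.character(m1)   # "b"

# NA is counted like any other value, so it can win the vote
m2 <- mode_function(c(NA, NA, 1))
m2                 # NA

# Hypothetical variant (an assumption, not original code): drop NAs first
mode_ignoring_na <- function(x) {
  mode_function(x[!is.na(x)])
}
m3 <- mode_ignoring_na(c(NA, NA, 1))
m3                 # 1
```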
The overall function that I am using on the dataset is:
broken <- function(data){
  for(i in 1:ncol(data)){
    if(is.factor(data[,i])){
      data[is.na(data[,i]),i] <- mode_function(data[,i])
    }
    else{
      data[is.na(data[,i]),i] <- mean(data[,i], na.rm = TRUE)
    }
  }
  return(data)
}
Problem: I run this function and nothing changes in my dataset. I still have the same number of missing values as I did before the function was run.
Run outside the function, this line works as intended, and so does the line that handles the mean:
data[is.na(data[,i]),i] <- mode_function(data[,i])
But once I try to use my function to perform the exact same operations, nothing happens.
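One thing worth checking, assuming the function is called as a bare `broken(X)`: R functions work on a copy of their arguments, so the filled-in data frame exists only in the return value, and the original object stays unchanged unless the result is assigned. The sketch below reproduces this on a made-up data frame. The names `toy` and `toy_filled` are hypothetical.

```r
# Definitions repeated from above so the snippet runs on its own
mode_function <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
broken <- function(data){
  for(i in 1:ncol(data)){
    if(is.factor(data[,i])){
      data[is.na(data[,i]),i] <- mode_function(data[,i])
    }
    else{
      data[is.na(data[,i]),i] <- mean(data[,i], na.rm = TRUE)
    }
  }
  return(data)
}

# Made-up data: one factor column and one numeric column, each with one NA
toy <- data.frame(A1 = factor(c("a", "b", "b", NA)),
                  A2 = c(1, NA, 3, 5))

# Calling without assignment fills a copy; toy itself is untouched
invisible(broken(toy))
sum(is.na(toy))          # 2

# Assigning the return value keeps the imputed copy
toy_filled <- broken(toy)
sum(is.na(toy_filled))   # 0
```

If this is the cause, `X <- broken(X)` would overwrite the original data frame with the imputed one.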