I have a tibble, main_df, with columns such as "Rainfall_(mm)", "Speed_of_maximum_wind_gust_(km/h)", "9am_Temperature", "9am_relative_humidity_(%)", "9am_cloud_amount_(oktas)", etc.
I tried to identify the numeric columns with this code: col_type_vector <- sapply(main_df, typeof)
For every numeric column, I want to replace the NA values with the median of that column. Note that the loop starts at index 3 because I want to skip the first 2 columns.
The function and the loop are given below:
set_na_to_median <- function(data_frame, column_name) {
  median_value <- median(data_frame[[column_name]], na.rm = TRUE)
  na_indices <- which(is.na(data_frame[column_name]))
  data_frame[na_indices, column_name] <- median_value
}
col_type_vector <- sapply(main_df, typeof)
for (item in names(col_type_vector)[3:length(names(col_type_vector))]) {
  if (col_type_vector[item] == "integer" | col_type_vector[item] == "double" | col_type_vector[item] == "numeric") {
    set_na_to_median(main_df, item)
  }
}
But when I run it, the NA values do not get replaced. If I run the same statements manually, outside the function and the loop, they work perfectly. I have basically wasted my whole day on this. What am I doing wrong?
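In case it helps, here is a minimal sketch that reproduces the symptom. It assumes a small made-up data frame: toy_df and its columns are hypothetical stand-ins for main_df, and it is a plain data.frame rather than a tibble so that the snippet needs no extra packages.

```r
# Hypothetical toy data standing in for main_df (not the real data).
toy_df <- data.frame(
  id = c("a", "b", "c", "d"),
  rainfall = c(1.0, NA, 3.0, 8.0)
)

# Same function as in the question, unchanged.
set_na_to_median <- function(data_frame, column_name) {
  median_value <- median(data_frame[[column_name]], na.rm = TRUE)
  na_indices <- which(is.na(data_frame[column_name]))
  data_frame[na_indices, column_name] <- median_value
}

# Called the way the loop calls it: the result is not assigned to anything.
set_na_to_median(toy_df, "rainfall")
na_before <- sum(is.na(toy_df$rainfall))
print(na_before)  # the NA is still counted here

# The same three statements run directly at top level, outside the function.
median_value <- median(toy_df[["rainfall"]], na.rm = TRUE)
na_indices <- which(is.na(toy_df["rainfall"]))
toy_df[na_indices, "rainfall"] <- median_value
na_after <- sum(is.na(toy_df$rainfall))
print(na_after)  # no NA is left here
```

So the same three lines behave differently depending on whether they run inside set_na_to_median() or at top level.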
Thanks in advance.