I have an automobile data set (auto_data) with NA values across 6 columns.
auto_data$normalized.losses = replace_na(auto_data$normalized.losses,mean(auto_data$normalized.losses, na.rm = TRUE))
auto_data$num.of.doors = replace_na(auto_data$num.of.doors,mode(auto_data$num.of.doors, na.rm = TRUE))
for (i in 1:ncol(auto_data)) {
  if ((is.numeric(i)) & (is.na(i))) {
    replace_na(i, mean(i, na.rm = TRUE))
  }
}
This is as far as I have gotten. However, for num.of.doors (a character column whose values are either 'two' or 'four'), the replaced NAs read 'character' instead of 'two' or 'four'. The for loop also does not change anything.
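For reference, base R's `mode()` reports the storage mode of an object (here `"character"`), not its most frequent value, which would explain the fill seen above. In the loop, `i` is the column index (an integer), not the column itself, and the result of `replace_na()` is never assigned back. Below is a minimal base-R sketch of one way around both issues. The helper `stat_mode()` and the toy data frame `df` are hypothetical and only for illustration; ties in the mode go to the first value encountered.

```r
# Hypothetical helper (not part of base R): most frequent non-NA value.
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA)          # nothing to count
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]   # ties go to the first value seen
}

# Toy data for illustration only, reusing column names from the question.
df <- data.frame(
  normalized.losses = c(NA, 164, 164, NA, 158),
  price             = c(13495, NA, 17217, 17293, NA),
  num.of.doors      = c("four", NA, "four", "two", "four"),
  stringsAsFactors  = FALSE
)

mode(df$num.of.doors)       # storage mode: "character"
stat_mode(df$num.of.doors)  # most frequent value: "four"

# Loop over column names, pull out the column itself, and assign the fill back.
for (col in names(df)) {
  x <- df[[col]]
  if (!anyNA(x)) next                     # skip columns without NAs
  fill <- if (is.numeric(x)) mean(x, na.rm = TRUE) else stat_mode(x)
  df[[col]][is.na(x)] <- fill
}
```

After the loop, numeric columns have their NAs replaced by the column mean and the character column by its most frequent value. Whether mean or mode is the right fill for a given column is a modelling choice this sketch does not settle.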
Eventually I would like the mean/mode to be computed within groups of make and body.style, but I figured I should first get the ungrouped means and modes working. I have also experimented with wrapping replace_na() in a group_by() call.
Data source: https://www.kaggle.com/datasets/toramky/automobile-dataset?resource=download
Sample data:
make = c("alfa-romero", "alfa-romero", "alfa-romero",
"audi", "audi", "audi", "audi")
symboling = c(3L, 3L, 1L, 2L, 2L, 2L, 1L)
normalized.losses = c(NA, NA, NA, 164L, 164L, NA,
158L)
fuel.type = c("gas", "gas", "gas", "gas", "gas", "gas", "gas")
aspiration = c("std", "std", "std", "std", "std", "std", "std")
num.of.doors = c("two", "two", "two", "four", "four", "two", "four")
body.style = c("convertible", "convertible","hatchback", "sedan", "sedan", "sedan", "sedan")
price = c(13495, 18705, NA, 17217, 17293, NA, 18304)
auto_data_sample = data.frame(make, symboling, normalized.losses, fuel.type, aspiration, num.of.doors, body.style, price)
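For the grouped version, one possible sketch with dplyr is shown below. It assumes dplyr is installed. `stat_mode()` is a hypothetical helper, and `df` is a toy data frame adapted from the sample above, with an NA introduced in num.of.doors purely for illustration. Groups with no observed normalized.losses fall back to the overall mean, which is an assumption of this sketch rather than a requirement.

```r
library(dplyr)

# Hypothetical helper: most frequent non-NA value (ties go to the first seen).
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA)
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Toy data adapted from the sample; the NA in num.of.doors is illustrative.
df <- data.frame(
  make = c("alfa-romero", "alfa-romero", "alfa-romero",
           "audi", "audi", "audi", "audi"),
  body.style = c("convertible", "convertible", "hatchback",
                 "sedan", "sedan", "sedan", "sedan"),
  normalized.losses = c(NA, NA, NA, 164, 164, NA, 158),
  num.of.doors = c("two", "two", "two", "four", "four", NA, "four"),
  stringsAsFactors = FALSE
)

# Overall mean, used only where a whole group has no observed value.
overall_mean <- mean(df$normalized.losses, na.rm = TRUE)

imputed <- df %>%
  group_by(make, body.style) %>%
  mutate(
    # Group mean for numeric NAs (NaN if the group has no observed values).
    normalized.losses = ifelse(is.na(normalized.losses),
                               mean(normalized.losses, na.rm = TRUE),
                               normalized.losses),
    # Group mode for character NAs.
    num.of.doors = ifelse(is.na(num.of.doors),
                          stat_mode(num.of.doors),
                          num.of.doors)
  ) %>%
  ungroup() %>%
  # Fallback: anything still missing (including NaN) gets the overall mean.
  mutate(normalized.losses = ifelse(is.na(normalized.losses),
                                    overall_mean,
                                    normalized.losses))
```

In this toy data, the audi sedan row with missing values is filled from its own group, while the alfa-romero rows, whose groups have no observed normalized.losses, take the overall mean.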