impute.age <- function(age, sex){
  corrected.age <- age
  if(is.na(age) & sex == 1){
    corrected.age <- male.mean.age
  }else if(is.na(age) & sex == 0){
    corrected.age <- female.mean.age
  }else{
    return(corrected.age)
  }
}

I wrote this function, but it doesn't seem to work. Is there another way to fill the NA values in the Age column with the mean age for each sex (male and female)?

Ronak Shah

2 Answers


If you use indexing, you don't even need any if/else:

# example data
age <- sample(10:70, 50, replace = TRUE)
age[sample(length(age), 10)] <- NA_integer_
sex <- sample(c("M","F"),50, replace = TRUE)

# computing means
mean_M <- mean(age[sex == "M"], na.rm = TRUE)
mean_F <- mean(age[sex == "F"], na.rm = TRUE)

# replacing by the means
age[is.na(age) & sex == "M"] <- mean_M
age[is.na(age) & sex == "F"] <- mean_F
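For context, the function in the question most likely fails because if() tests a single condition, not one per element: when it is given whole columns, recent R versions raise an error and older ones only look at the first element. A minimal sketch of a vectorized variant using ifelse() follows. The name impute_age and the choice to compute the group means inside the function are assumptions for illustration, and sex is assumed to be coded 1 for male and 0 for female as in the question.

```r
# Hypothetical vectorized variant of the question's impute.age().
# Assumes sex is coded 1 = male, 0 = female, as in the question.
impute_age <- function(age, sex) {
  # group means, ignoring missing ages
  male_mean   <- mean(age[sex == 1], na.rm = TRUE)
  female_mean <- mean(age[sex == 0], na.rm = TRUE)
  # ifelse() evaluates the condition element by element
  ifelse(is.na(age) & sex == 1, male_mean,
         ifelse(is.na(age) & sex == 0, female_mean, age))
}

# small made-up example
age <- c(20, NA, 40, 30, NA, 50)
sex <- c(1, 1, 1, 0, 0, 0)
imputed <- impute_age(age, sex)
```

The indexing approach above does the same job with less nesting; this version only keeps the shape of the original function.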
Alexlok

Using R base:

ref_mean <- tapply(df$Age, df$Sex, mean, na.rm = TRUE)
df[df$Sex=="Male" & is.na(df$Age), "Age"] <- ref_mean["Male"]
df[df$Sex=="Female" & is.na(df$Age), "Age"] <- ref_mean["Female"]
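If you would rather not name each group explicitly, the two assignments above can probably be collapsed with base R's ave(), which applies a function within each group and returns a vector of the original length. This is a sketch on made-up data; the data frame below is an assumption standing in for your own df.

```r
# Made-up example data standing in for the real data frame
df <- data.frame(
  Age = c(20, NA, 40, 30, NA, 50),
  Sex = c("Male", "Male", "Male", "Female", "Female", "Female")
)

# Within each Sex group, replace missing ages by that group's mean
df$Age <- ave(df$Age, df$Sex, FUN = function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
})
```

This works unchanged whether Sex is stored as "Male"/"Female" or as 1/0, since the group labels never appear in the code.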

Using data.table:

library(data.table)
setDT(df)[, Age := nafill(Age, fill = mean(Age, na.rm = TRUE)), by = Sex]

Using dplyr:

library(dplyr)
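# name the result Age so it overwrites the column, and assign back to df
df <- df %>%
  group_by(Sex) %>%
  mutate(Age = ifelse(is.na(Age), mean(Age, na.rm = TRUE), Age)) %>%
  ungroup()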

Here, Age and Sex are the names of the age and sex columns in df. If Sex is coded as 1 and 0 rather than "Male" and "Female", replace "Male" and "Female" with "1" and "0", respectively.

Cainã Max Couto-Silva