I have a large data frame in R with two percentage-rate columns:

   response rate   accept rate
1  70%             65%
2  12%             NA
3  NA              100%
4  78%             20%
5  NA              7%
6  51%             NA

I want to replace each NA with the mean of its column. The result should look like this:

(70% + 12% + 78% + 51%)/4 = 52.75% and (65% + 100% + 20% + 7%)/4 = 48%

   response rate   accept rate
1  70%             65%
2  12%             48%
3  52.75%          100%
4  78%             20%
5  52.75%          7%
6  51%             48%

I do not know how to achieve this in R. Thanks in advance!!

Phil
GuiW
  • This question has been answered in https://stackoverflow.com/questions/25835643/replace-missing-values-with-column-mean/25835810 – bdedu Nov 19 '21 at 02:45
  • Thanks! But I am confused by the "%": the column is character, not numeric, and I am not sure how to do this on a character column that contains "%". – GuiW Nov 19 '21 at 11:18
  • You can convert it to numeric and convert it back later. Please see my answer. – bdedu Nov 19 '21 at 16:47

1 Answer

data <- data.frame(response_rate = c("70%", "12%", NA, "78%", NA, "51%"),
                   accept_rate = c("65%", NA, "100%", "20%", "7%", NA))

# Replace the NAs in a character vector of percentages with the vector's mean
fill_percent_mean <- function(v) {
  v <- as.numeric(sub("%", "", v)) / 100  # strip "%" and convert to numeric

  v[is.na(v)] <- mean(v, na.rm = TRUE)    # fill NAs with the column mean

  paste0(v * 100, "%")                    # convert back to a percentage string
}

# Apply the function to every column
data1 <- data.frame(apply(data, 2, fill_percent_mean))

data1

Result

  response_rate accept_rate
1           70%         65%
2           12%         48%
3        52.75%        100%
4           78%         20%
5        52.75%          7%
6           51%         48%
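
The same conversion can also be sketched without the divide-by-100 round trip and with explicit rounding of the imputed mean, which may help when a column mean is not a short decimal. This is only a sketch under assumptions: `fill_percent_mean_rounded`, its `digits` argument and the `rates` data frame are names made up here for illustration, and two decimal places is an assumed default.

```r
# Hypothetical helper (illustration only): impute NAs with the column mean
# and round before the "%" sign is put back.
fill_percent_mean_rounded <- function(v, digits = 2) {
  num <- as.numeric(sub("%", "", v, fixed = TRUE))  # "70%" -> 70, NA stays NA
  num[is.na(num)] <- mean(num, na.rm = TRUE)        # fill NAs with the column mean
  paste0(round(num, digits), "%")                   # 52.75 -> "52.75%"
}

# Same sample data as in the question
rates <- data.frame(response_rate = c("70%", "12%", NA, "78%", NA, "51%"),
                    accept_rate   = c("65%", NA, "100%", "20%", "7%", NA))

# rates[] keeps the data frame structure while replacing every column
rates[] <- lapply(rates, fill_percent_mean_rounded)
rates
```

Using `lapply()` with `rates[] <-` avoids the intermediate character matrix that `apply()` creates, so it should also behave sensibly if the data frame contains other column types that are left out of the call.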
bdedu