
How do I calculate column medians using the colMedians function? I get an error: Argument 'x' must be a matrix or a vector.

col_medians <- round(colMedians(impute_marks[,-1], na.rm=TRUE),0)
k <- which(is.na(impute_marks), arr.ind=TRUE)
impute_marks[k] <- col_medians[k[,-1]]

I need to perform the operation below on every column of the data frame except the first. The single-column code works fine, but when I put it in a for loop it gives an error about unknown courses.

impute_marks$c1[is.na(impute_marks$c1)] <- round(mean(impute_marks$c1[!is.na(impute_marks$c1)]),0)

Here, impute_marks is the name of the dataset and c1 is the column name.

Using the above operation I am able to find the mean and replace all NA values in the column c1. But I have 30+ columns. How can I write this operation as a for loop that goes through each course and replaces the NA values with the mean?

My function for the operation:

impute_marks$F27SA[is.na(impute_marks$F27SA)] <- round(mean(impute_marks$F27SA[!is.na(impute_marks$F27SA)]),0)

imputing_using_mean <- function()
{
    courses <- names(impute_marks)[2:26]  
    for(i in seq_along(courses))
    {
      impute_marks$courses[[i]][is.na(impute_marks$courses[[i]])] <- round(mean(impute_marks$courses[[i]][!is.na(impute_marks$courses[[i]])]),0)
    }
}

imputing_using_mean()
Murlidhar Fichadia

2 Answers


Essentially the same as the answer from @Aaron on "Replace NA values by row means", tweaked to account for the first column.

marks <- read.table(text="
  a  1 NA  3
  b  1  2  3
  c NA NA NA
  ")
col_means <- round(colMeans(marks[,-1], na.rm=TRUE), 0)
k <- which(is.na(marks), arr.ind=TRUE)
marks[k] <- col_means[k[,2]-1]

#  V1 V2 V3 V4
#1  a  1  2  3
#2  b  1  2  3
#3  c  1  2  3
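
For the loop version asked about in the question, here is a minimal sketch. It assumes the likely causes of the error are that `impute_marks$courses` looks for a column literally named "courses", and that the function changes only a local copy of the data frame. The sample data and the function name `impute_with_mean` are made up for illustration.

```r
# Hypothetical sample data standing in for impute_marks
impute_marks <- data.frame(
  id = c("a", "b", "c"),
  c1 = c(10, NA, 20),
  c2 = c(NA, 40, 50)
)

impute_with_mean <- function(df) {
  courses <- names(df)[-1]  # every column except the first
  for (course in courses) {
    # [[ ]] looks the column up by the name stored in `course`;
    # df$course would look for a column literally called "course"
    missing <- is.na(df[[course]])
    df[[course]][missing] <- round(mean(df[[course]], na.rm = TRUE), 0)
  }
  df  # return the modified copy; the caller's data frame is untouched
}

impute_marks <- impute_with_mean(impute_marks)
```

The result has to be assigned back, because R modifies a copy of the data frame inside the function.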
Andrew Lavers

Below is a solution that calculates the median of each column and replaces every NA value with the median of its column. colMedians requires a matrix, so the data frame is converted first. The same approach works for the mean, but colMeans accepts a data frame directly, so the conversion step is not required.

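# colMedians() comes from the matrixStats package and needs a numeric matrix
library(matrixStats)

# first convert the mark columns (everything except the first column) to a matrix
matrix_marks <- as.matrix(impute_marks[,-1])

# calculate the median for each column
col_medians <- round(colMedians(matrix_marks, na.rm=TRUE),0)

# get the row and column index of each NA value
k <- which(is.na(matrix_marks), arr.ind=TRUE)

# finally, replace each NA with the median of its column (k[,2] is the column index)
matrix_marks[k] <- col_medians[k[,2]]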
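
If converting to a matrix is not desirable, the same idea can be sketched in base R with `median()` applied column by column, which keeps the result as a data frame. The sample data below is made up for illustration, and the sketch assumes every column after the first is numeric.

```r
# Hypothetical sample data standing in for impute_marks
impute_marks <- data.frame(
  id = c("a", "b", "c", "d"),
  c1 = c(10, NA, 20, 40),
  c2 = c(NA, 40, 50, 70)
)

# Apply the imputation to every column except the first
impute_marks[-1] <- lapply(impute_marks[-1], function(x) {
  x[is.na(x)] <- round(median(x, na.rm = TRUE), 0)
  x
})
```

Swapping `median` for `mean` gives the mean-based version from the question.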
Murlidhar Fichadia