
I have the following matrix.

r1 <- c("M","A","T","D","T","Y")
r2 <- c("M","A","G","G","D", "J")
r3 <- c("M","B","H","G","T", "Y")
r4 <- c("M","B","G","G","X", "Y")
r5<- c("F","A","H","D","T", "Y")
n.mat <- rbind(r1,r2,r3,r4,r5)
n.mat<-as.data.frame(n.mat)

In every column that contains only two unique values, I would like to replace all values with the most frequent value of that column. Columns with more than two unique values should be left as they are.

Expected output:

r1 <- c("M","A","T","G","T","Y")
r2 <- c("M","A","G","G","D", "Y")
r3 <- c("M","A","H","G","T", "Y")
r4 <- c("M","A","G","G","X", "Y")
r5<- c("M","A","H","G","T", "Y")
n.mat <- rbind(r1,r2,r3,r4,r5)
n.mat<-as.data.frame(n.mat)
Luker354

1 Answer


We can use a Mode function (defined below) together with a check on the number of unique elements in each column: if a column has more than 2 unique elements, return it unchanged; otherwise replace it with its Mode.

n.mat[] <- lapply(n.mat, function(x) if(length(unique(x)) > 2) x else Mode(x))

where

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
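
To see why Mode returns the most frequent value, it may help to run its steps on a single column. This is only a minimal sketch: it assumes the second column of the question's data typed in by hand as `x`, and the intermediate names `idx` and `counts` are introduced purely for illustration.

```r
# Second column of the question's data, assumed here as a plain character vector
x <- c("A", "A", "B", "B", "A")

ux <- unique(x)          # distinct values in order of first appearance: "A" "B"
idx <- match(x, ux)      # position of each element within ux: 1 1 2 2 1
counts <- tabulate(idx)  # number of occurrences per distinct value: 3 2
ux[which.max(counts)]    # value with the highest count: "A"
```

Because which.max returns the position of the first maximum, a tie is resolved in favour of the value that appears first in the column.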

Output (n.mat after the replacement):

> n.mat
   V1 V2 V3 V4 V5 V6
r1  M  A  T  G  T  Y
r2  M  A  G  G  D  Y
r3  M  A  H  G  T  Y
r4  M  A  G  G  X  Y
r5  M  A  H  G  T  Y

If a matrix is needed as output, call as.matrix on the result above:

n.mat <- as.matrix(n.mat)

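Or use apply instead of lapply, which returns a matrix directly. Here the Mode has to be repeated with rep so that every column keeps the same length: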

apply(n.mat, 2, FUN = function(x) if(length(unique(x)) > 2) x 
       else rep(Mode(x), length(x)))
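
As a quick sanity check, the apply variant can be run end to end and compared with the expected output from the question. This is a sketch under a few assumptions: the input is built directly as a character matrix rather than a data.frame, and the names `res` and `expected` are introduced here only for illustration.

```r
# Input from the question, built directly as a character matrix
n.mat <- rbind(
  r1 = c("M", "A", "T", "D", "T", "Y"),
  r2 = c("M", "A", "G", "G", "D", "J"),
  r3 = c("M", "B", "H", "G", "T", "Y"),
  r4 = c("M", "B", "G", "G", "X", "Y"),
  r5 = c("F", "A", "H", "D", "T", "Y")
)

# Most frequent value of a vector (same helper as above)
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Keep columns with more than 2 unique values, otherwise fill with the Mode
res <- apply(n.mat, 2, function(x) {
  if (length(unique(x)) > 2) x else rep(Mode(x), length(x))
})

# Expected output as given in the question
expected <- rbind(
  r1 = c("M", "A", "T", "G", "T", "Y"),
  r2 = c("M", "A", "G", "G", "D", "Y"),
  r3 = c("M", "A", "H", "G", "T", "Y"),
  r4 = c("M", "A", "G", "G", "X", "Y"),
  r5 = c("M", "A", "H", "G", "T", "Y")
)

# Compare values element by element; stops with an error on any mismatch
stopifnot(all(res == expected))
```

The comparison uses `==` on the values rather than identical(), so it does not depend on whether the two matrices carry the same dimnames.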
akrun
  • Thanks a lot, this looks perfect, but in the end I don't get a matrix, do I? – Luker354 Nov 07 '21 at 20:13
  • @Luker354 you start with a data.frame, so you end up with the same structure. If you need a matrix, just do `n.mat <- as.matrix(n.mat)` – akrun Nov 07 '21 at 20:15