I have a large data set with 11 columns and 100000 rows (for example) whose values are 1, 2, 3 and 4, where 4 denotes a missing value. I need to compute the mode of each row. I am using the following data and function:

ac<-matrix(c("4","4","4","4","4","4","4","3","3","4","4"), nrow=1, ncol=11)  

m<-as.matrix(apply(ac, 1, Mode))

The command above gives me "4" as the mode, which is not what I need. I want the mode to ignore 4 and return "3", because 4 is a missing value.

Thanks in advance.

Iftikhar

2 Answers

R has a powerful mechanism to work with missing values. You can represent a missing value with NA and many of the R functions have support for dealing with NA values.

Create a small matrix with random numbers:

set.seed(123)
m <- matrix(sample(1:4, 12, replace=TRUE), ncol=3)
m
     [,1] [,2] [,3]
[1,]    2    4    3
[2,]    4    1    2
[3,]    2    3    4
[4,]    4    4    2

Since you represent missingness by the value 4, you can replace each occurrence by NA:

m[m==4] <- NA
m

     [,1] [,2] [,3]
[1,]    2   NA    3
[2,]   NA    1    2
[3,]    2    3   NA
[4,]   NA   NA    2
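
R 3.6.0 changed the default sampling algorithm, so set.seed(123) may print a different matrix on newer versions. As a minimal sketch, the matrix shown above can be typed in explicitly so the recoding step is reproducible; the same replacement also works on the character matrix ac from the question:

```r
# Explicit copy of the matrix printed above (matrix() fills column by column).
m <- matrix(c(2, 4, 2, 4,
              4, 1, 3, 4,
              3, 2, 4, 2), ncol = 3)
m[m == 4] <- NA          # recode the missing-value code 4 as NA

# The question's data are character strings, so compare against "4".
ac <- matrix(c("4","4","4","4","4","4","4","3","3","4","4"), nrow = 1, ncol = 11)
ac[ac == "4"] <- NA
```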

To calculate, for example, the mean:

mean(m[1, ], na.rm=TRUE)
[1] 2.5

apply(m, 1, mean, na.rm=TRUE)
[1] 2.5 1.5 2.5 2.0

To calculate the mode, you can use the function Mode in package prettyR. (Note that in this very small data set, only the 4th row has a unique modal value.)

apply(m, 1, Mode, na.rm=TRUE)
[1] ">1 mode" ">1 mode" ">1 mode" "2"     
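
If installing prettyR is not an option, a base R stand-in could look roughly like the sketch below. The name row_mode is hypothetical, and the ">1 mode" label for ties is only copied from the output above to make the results comparable; this is not prettyR's implementation.

```r
# Hypothetical helper: mode of a vector, ignoring NA values.
row_mode <- function(x) {
  counts <- table(x)                       # table() drops NA by default
  if (length(counts) == 0) return(NA_character_)   # every value was NA
  top <- names(counts)[counts == max(counts)]
  if (length(top) > 1) ">1 mode" else top  # flag ties instead of picking one
}

# The example matrix after recoding 4 as NA, typed in explicitly.
m <- matrix(c(2, NA, 2, NA,
              NA, 1, 3, NA,
              3, 2, NA, 2), ncol = 3)
res <- apply(m, 1, row_mode)
```

Because table() returns its categories as names, the result is a character vector; wrap it in as.numeric() if numbers are needed and no ties occur.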
Andrie

One way of doing it (though I'm not too sure on its performance):

tcnt<-table(ac, exclude="4")
actualmode<-names(tcnt)[which.max(tcnt)]

This code finds the overall mode, but it is easily adapted to work within rows. Or, based on an answer by Thomas Lumley to an old question on the R mailing list, a one-liner:

names(sort(-table(ac, exclude="4")))[1]
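
One possible row-wise adaptation is sketched below. The function name mode_excluding and its missing_code argument are made up for illustration; on ties, which.max simply returns the first of the tied values.

```r
ac <- matrix(c("4","4","4","4","4","4","4","3","3","4","4"), nrow = 1, ncol = 11)

# Hypothetical helper: most frequent value in x, skipping the missing code.
mode_excluding <- function(x, missing_code = "4") {
  # Exclude the missing-value code and any real NA values from the counts.
  tcnt <- table(x, exclude = c(missing_code, NA))
  if (length(tcnt) == 0) return(NA_character_)   # nothing left to count
  names(tcnt)[which.max(tcnt)]
}

row_modes <- apply(ac, 1, mode_excluding)   # one mode per row
```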
Nick Sabbe