I am working on a function to return the column name of the largest value for each row. Something like:

colnames(x)[apply(x,1,which.max)]

However, before applying a function like this, is there a straightforward and general way to replace ties with NA (or some other arbitrary value, such as a letter)?

I have the following matrix:

               0            1
 [1,] 5.000000e-01 0.5000000000
 [2,] 9.901501e-01 0.0098498779
 [3,] 9.981358e-01 0.0018641935
 [4,] 9.996753e-01 0.0003246823
 [5,] 9.998598e-01 0.0001402322
 [6,] 1.303731e-02 0.9869626938
 [7,] 1.157919e-03 0.9988420815
 [8,] 6.274074e-07 0.9999993726
 [9,] 1.659164e-07 0.9999998341
[10,] 6.517362e-08 0.9999999348
[11,] 8.951474e-06 0.9999910485
[12,] 5.070740e-06 0.9999949293
[13,] 1.278186e-07 0.9999998722
[14,] 9.914646e-08 0.9999999009
[15,] 7.058751e-08 0.9999999294
[16,] 2.847667e-09 0.9999999972
[17,] 1.675766e-08 0.9999999832
[18,] 2.172290e-06 0.9999978277
[19,] 4.964820e-06 0.9999950352
[20,] 1.333680e-07 0.9999998666
[21,] 2.087793e-07 0.9999997912
[22,] 2.358360e-06 0.9999976416

The first row has equal values in both columns, which I would like to replace with NA. While this is simple for this particular example, I want to be able to replace all ties with NA wherever they occur in a matrix of any size, e.g. in this matrix:

      1     2    3
[1,]  0.25  0.25  0.5
[2,]  0.3   0.3   0.3

all values except [1,3] would be replaced with NA.

I have looked at the function which.max.simple(), which can deal with ties by replacing them with NA, but it no longer appears to work, and the other methods of dealing with ties that I have found don't address my issue.

I hope that makes sense

Thanks, C

Sotos
  • Trying to look for exact matches with floating point numbers doesn't always work the way you think it does. See [why aren't these numbers equal](http://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal) – MrFlick Sep 27 '16 at 14:51
  • If you replace all ties with NA and a tied value happens to be the maximum in the row, how would you address this scenario? – Silence Dogood Sep 27 '16 at 14:57

1 Answer

Here's a simple approach to replace any row-wise duplicated values with NA in a matrix m:

# Flag every value that occurs more than once within its row, then set those cells to NA
is.na(m) <- t(apply(m, 1, FUN = function(x) {
  duplicated(x) | duplicated(x, fromLast = TRUE)
}))
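
As a quick check, here is the same idea applied to the small 2 x 3 matrix from the question. This is only an illustrative sketch; the intermediate name `tied` is introduced here for readability and is not part of the original code.

```r
# Example matrix from the question (2 rows, 3 columns)
m <- matrix(c(0.25, 0.25, 0.5,
              0.3,  0.3,  0.3),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("1", "2", "3")))

# Logical matrix: TRUE wherever a value occurs more than once within its row.
# apply() over rows returns its results column-wise, hence the t().
tied <- t(apply(m, 1, function(x) {
  duplicated(x) | duplicated(x, fromLast = TRUE)
}))

# Set the flagged cells to NA; the matrix stays numeric
is.na(m) <- tied
m
```

Only the value at [1,3] survives. Note that the second row is now entirely NA, and `which.max()` returns `integer(0)` for such a row, so the `colnames(x)[apply(x, 1, which.max)]` one-liner would need extra handling for that case.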

But consider the following notes:

1) be extra careful when comparing floating point numbers for equality (see [Why are these numbers not equal?](http://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal));

2) depending on your ultimate target, there may be simpler ways than replacing duplicates in your data (since it seems that you are only interested in column names); and

3) if you are going to replace values in a numeric matrix, don't use arbitrary characters for replacement, since that will convert your whole matrix to character class (replacement with NA is not a problem).
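
To illustrate notes 1 and 2, here is one possible sketch that skips the replacement step and returns the column name of each row's maximum directly, giving NA when the maximum is tied. It is a different technique from the code above and rests on assumptions: the helper name `max_col_or_na` and the tolerance `tol` are made up for this example, and only ties for the row maximum count as ties (lower tied values are ignored), which may or may not be what you want.

```r
# Hypothetical helper (not from any package): column name of each row's
# maximum, or NA when more than one column is within `tol` of that maximum.
# `tol` is an assumed tolerance for treating two doubles as equal.
max_col_or_na <- function(m, tol = sqrt(.Machine$double.eps)) {
  apply(m, 1, function(x) {
    top <- max(x)
    winners <- which(abs(x - top) <= tol)  # columns tied for the maximum
    if (length(winners) == 1L) colnames(m)[winners] else NA_character_
  })
}

# The 2 x 3 matrix from the question
m <- matrix(c(0.25, 0.25, 0.5,
              0.3,  0.3,  0.3),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("1", "2", "3")))
res <- max_col_or_na(m)

# A row whose values differ only by floating point error is treated as a tie
fp <- matrix(c(0.1 + 0.2, 0.3), nrow = 1, dimnames = list(NULL, c("a", "b")))
res_fp <- max_col_or_na(fp)
```

Here `res` is `"3"` for the first row and NA for the second, and `res_fp` is NA even though `0.1 + 0.2 == 0.3` is `FALSE` in R, which is the situation note 1 warns about.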

talat