
I need a sort function that treats numbers as ties whenever `all.equal()` considers them equal.

For instance, if you do:

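```r
library(plyr)
# Two random columns; on paper every a + b is 1.8, 1.9 or 2.0
a <- sample(c(0.8, 0.7), 30, replace = TRUE)
b <- sample(c(1.1, 1.2), 30, replace = TRUE)
df <- data.frame(a, b)
df$sum <- a + b
# Sort by sum, largest first
arrange(df, desc(sum))
```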

All (0.8, 1.1) rows sort above the (0.7, 1.2) rows, which is not what I want: I want the original random order to be preserved among the rows whose sum is 1.9.

This happens because:

```
> 1.1 + 0.8 > 1.2 + 0.7
[1] TRUE
> 1.1 + 0.8 == 1.2 + 0.7
[1] FALSE
```
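To see the gap directly, the two sums can be printed with 17 significant digits, which is enough to tell any two doubles apart. This assumes R's usual IEEE 754 double precision for numeric vectors; under that assumption the two sums are adjacent doubles, one unit in the last place (2^-52 for numbers between 1 and 2) apart.

```r
s1 <- 1.1 + 0.8
s2 <- 1.2 + 0.7

# 17 significant digits are enough to distinguish any two doubles
print(c(s1, s2), digits = 17)

# The difference is one unit in the last place near 1.9, i.e. 2^-52,
# which is the value of .Machine$double.eps
s1 - s2
s1 - s2 == .Machine$double.eps
```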

I understand that this is a consequence of how floating-point numbers work, and that R's `all.equal()` tests for equality up to a small tolerance rather than exact equality. For example:

```
> all.equal(0.8+1.1, 0.7+1.2)
[1] TRUE
```
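A detail that matters when building a sort on top of `all.equal()`: for numbers it compares against a tolerance (by default `sqrt(.Machine$double.eps)`, about 1.5e-8), and when the values differ it returns a character string describing the difference rather than `FALSE`. A minimal sketch of the usual `isTRUE()` wrapper; `near_equal()` is a hypothetical name, not a base R function:

```r
# Default tolerance used by all.equal() for numeric comparisons
tol <- sqrt(.Machine$double.eps)

all.equal(0.8 + 1.1, 0.7 + 1.2)   # TRUE: the difference is far below tol
res <- all.equal(1.9, 1.8)        # a character string describing the difference, not FALSE
is.character(res)

# Hypothetical helper: always a single TRUE or FALSE, so it is safe inside if()
near_equal <- function(x, y, tolerance = sqrt(.Machine$double.eps)) {
  isTRUE(all.equal(x, y, tolerance = tolerance))
}
near_equal(0.8 + 1.1, 0.7 + 1.2)
near_equal(1.9, 1.8)
```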

So I'm looking for a sort function, or a way to sort, that behaves as `all.equal()` does and not as `==` does.
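One way to get this behaviour, sketched under the assumption that "equal" means `isTRUE(all.equal(x, y))` with the default tolerance: visit the values in ascending order, start a new tie group whenever a value is no longer `all.equal()` to the first value of the current group, then sort on the group id instead of the raw sum. `tolerance_groups()` is a hypothetical helper, not part of base R or plyr. Because `order()` is documented as stable for its default methods, rows in the same group keep their original (random) relative order.

```r
# Hypothetical helper (not part of base R or plyr): give each value a tie-group id.
# Values are visited in ascending order; a value joins the current group if
# all.equal() considers it equal to the group's first value, otherwise it opens
# a new group. Assumes x contains no NA values.
tolerance_groups <- function(x, tolerance = sqrt(.Machine$double.eps)) {
  grp <- integer(length(x))
  g <- 0L
  anchor <- NA_real_
  for (i in order(x)) {
    if (is.na(anchor) || !isTRUE(all.equal(x[i], anchor, tolerance = tolerance))) {
      g <- g + 1L      # start a new group at this value
      anchor <- x[i]
    }
    grp[i] <- g        # larger values always get larger ids
  }
  grp
}

# The question's example, with a seed added so the result is reproducible
set.seed(1)
a <- sample(c(0.8, 0.7), 30, replace = TRUE)
b <- sample(c(1.1, 1.2), 30, replace = TRUE)
df <- data.frame(a, b)
df$sum <- a + b
df$grp <- tolerance_groups(df$sum)

# order() is stable, so rows tied on grp keep their original order;
# negating grp puts the largest sums first
sorted <- df[order(-df$grp), ]
sorted
```

With plyr loaded, `arrange(df, desc(grp))` should give the same ordering, since `arrange()` is built on `order()`. Anchoring each group at its first value is one design choice; a long run of values, each within tolerance of its neighbour, could be grouped differently under another rule.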

Edited to make clear this is not a duplicate of other questions.

Ben S.
  • Add a small random noise to all values, with a standard deviation a couple of orders of magnitude smaller than the numbers in your list? That should produce random orders within groups – ekstroem Jul 16 '17 at 17:18
  • Do not use comparison operators with floating-point numbers; the results will not be dependable. – Pierre L Jul 16 '17 at 17:20
  • @ekstroem That could work, wouldn't be my first choice, but I will resort to it (get it?) if I must. – Ben S. Jul 16 '17 at 17:39
  • @PierreLafortune please remove the exact-duplicate tag; I rewrote the question – Ben S. Jul 16 '17 at 17:40
  • Reopened. But this has nothing to do with floating point now: remove the decimals using `8+11` and `7+12` and you have the same issue. The answer is to arrange by two columns: `arrange(df, desc(sum), a)` – Pierre L Jul 16 '17 at 17:48
  • Hi Pierre, not sure what you mean. If I take the decimals out of that code and run it, the values that sum to 19 are a mix of 8+11 and 7+12 (i.e. the random order is preserved), so the issue does indeed go away (try it yourself). Anyway, thanks for removing the tag. – Ben S. Jul 16 '17 at 17:52
  • Round the sum `df$sum = round(a + b, 5)` and it will behave the way you are expecting. – Pierre L Jul 16 '17 at 18:44
  • I think that using `signif` rather than `round` would be better. But I approve the previous suggestion. – F. Privé Jul 17 '17 at 11:18
  • `round` just moves the problem to the rounding midpoints: for example, `9.001+0.004` and `9.002+0.003` lie on different sides of `9005/1000`. But if you are sure that no input is close to such a boundary (for instance, all your inputs have at most 4 decimal places and you round at the fifth), then it's a great solution. – aka.nice Jul 17 '17 at 13:49
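The rounding suggestions in the comments can be checked on the question's own numbers. A minimal sketch, assuming the inputs have one decimal place as in the question, so rounding the sums to 5 decimals (or to 10 significant digits with `signif()`) stays far from any rounding boundary and both ways of reaching 1.9 become the same double. The last line only prints the boundary case from the final comment; no particular result is claimed for it here.

```r
a <- c(0.8, 0.7)
b <- c(1.1, 1.2)

raw <- a + b
raw[1] == raw[2]                 # FALSE: the sums differ in the last bit

by_round  <- round(a + b, 5)     # round to 5 decimal places
by_signif <- signif(a + b, 10)   # or keep 10 significant digits
by_round[1] == by_round[2]       # TRUE
by_signif[1] == by_signif[2]     # TRUE

# The boundary caveat: both sums are 9.005 on paper, so rounding to 2 decimals
# depends on which side of 9.005 each floating-point sum happens to land
round(c(9.001 + 0.004, 9.002 + 0.003), 2)
```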
