1

I'm noticing some very odd floating point arithmetic behavior when I use the modulo operator. If I evaluate 58 mod 2, I can (and should) get 0.

> 58 %% 2
[1] 0

No surprise there. But what if "58" is the result of some floating point arithmetic operation? The following produces the nonsensical result of 2.

> (0.58*100) %% 2
[1] 2

I'm not sure why it's ever OK for anything mod 2 to produce 2, but I thought this might be the result of trying to mod a non-integer. So I tried coercing.

> as.integer(0.58*100) %% 2
[1] 1

This appears to make sense when I evaluate the ceiling of the argument of as.integer.

ceiling(as.integer(0.58*100))
[1] 57

This doesn't make any sense, but what's worse is that it's not even consistent.

> ceiling(as.integer(0.58*100))
[1] 57
> ceiling(as.integer(0.57*100))
[1] 56
> ceiling(as.integer(0.56*100))
[1] 56
> ceiling(as.integer(0.55*100))
[1] 55

Finally, it would appear that numerics generated by a sequence behave differently based on the length of the sequence!

> sapply(seq(from=0.55, to=0.58, by = 0.01), function(x) 
ceiling(as.integer(100*x)))
[1] 55 56 57 57
> sapply(seq(from=0.55, to=0.59, by = 0.01), function(x)                 
ceiling(as.integer(100*x)))
[1] 55 56 57 58 59

Is there a way for me to multiply two numerics, and coerce the result to an integer whose ceiling is equal to the integer value I get from as.integer?

Dr. Arun
  • 176
  • 6
  • `ceiling(as.integer(0.57*100 + 0.00000000001))` is a hack to achieve this. – Adder Apr 09 '18 at 14:07
  • Related: [*Round up from .5*](https://stackoverflow.com/q/12688717/2204410) – Jaap Apr 09 '18 at 14:08
  • Is there a reason you're not just using `round()`? (Also `ceiling` and `as.integer` are vectorized, you don't need `sapply`.) – Gregor Thomas Apr 09 '18 at 14:13
  • And what is the reason behind all the `ceiling(as.integer())`? As soon as something is an integer, it is equal to its ceiling. `ceiling(as.integer(x))` will always be the same as `as.integer(x)`. Maybe you want to be doing the `ceiling` *inside* the `as.integer`? But again, I don't understand why you wouldn't use `round()`. I think `as.integer(round(x))` will give you nice intuitive results. – Gregor Thomas Apr 09 '18 at 14:14
  • 2
    From `?"%%"` : *%% and x %/% y can be used for non-integer y, e.g. 1 %/% 0.2, but the results are subject to representation error and so may be platform-dependent.* – G. Grothendieck Apr 09 '18 at 14:23
  • @Gregor Good point re. round(), and in fact that's what I did in my particular use case. However, there are other cases where I might want to use floor() or ceiling(), and so I wanted to get an understanding of why R is doing what it's doing here. – Dr. Arun Apr 09 '18 at 15:42
  • @Gregor Or maybe just `ceiling`: `ceiling(0.58*100) %% 2` is equal to `0`. – Rui Barradas Apr 09 '18 at 15:42
  • As documented in `?as.integer`, `as.integer()` truncates toward 0. All your `ceiling()` calls do nothing, because the inputs have already been truncated to an integer. That's all that is going on here. – Gregor Thomas Apr 09 '18 at 15:54
  • 1
    @RuiBarradas "*Or maybe just ceiling: ceiling(0.58*100) %% 2 is equal to 0*" That's a terrible idea!!!! `ceiling(0.58 * 100)` could very well be 59 depending on the floating point imprecision. No no no! `round` is generalizable, `ceiling()` happens to work in this one case. – Gregor Thomas Apr 09 '18 at 16:00
  • @Gregor Right, I was too much in a hurry to think. – Rui Barradas Apr 09 '18 at 16:55

2 Answers2

5

I do not know R, so the following is based on knowledge of floating-point and software generally.

You say (0.58*100) %% 2 produces 2. It does not. It produces a value slightly under 2, but R’s default formatting rounds it for display. First, .58 is not exactly representable in the floating-pont format R uses. Presuming it is IEEE-754 basic 64-bit binary floating-point and R converts with correct rounding. then 0.58 produces:

0.57999999999999996003197111349436454474925994873046875,

and 0.58*100 produces:

57.99999999999999289457264239899814128875732421875,

and (0.58*100) %% 2 produces:

1.99999999999999289457264239899814128875732421875.

When you use the residue operation with floating-point arithmetic, you ought to be prepared to accept results such as this. In terms of arithmetic modulo 2, 1.99999999999999289457264239899814128875732421875 is very close to 0. If they are not close for your purposes, then residue and floating-point arithmetic may not be suitable for your application.

You say it does not make sense that ceiling(as.integer(0.58*100)) produces 57, but we see above the reason for it: 0.58*100 is 57.99999999999999289457264239899814128875732421875, so truncating it to an integer produces 57.

Next you have you say these are not consistent:

> ceiling(as.integer(0.58*100))
[1] 57
> ceiling(as.integer(0.57*100))
[1] 56
> ceiling(as.integer(0.56*100))
[1] 56
> ceiling(as.integer(0.55*100))
[1] 55

The rule consistently used here is that each operation produces the exact mathematical result rounded to the nearest representable value.

Thus:

0.58 → 0.57999999999999996003197111349436454474925994873046875
0.58*100 → 57.99999999999999289457264239899814128875732421875
0.57 → 0.56999999999999995115018691649311222136020660400390625
0.57*100 → 56.99999999999999289457264239899814128875732421875
0.56 → 0.560000000000000053290705182007513940334320068359375
0.56*100 → 56.00000000000000710542735760100185871124267578125
0.55 → 0.5500000000000000444089209850062616169452667236328125
0.55*100 → 55.00000000000000710542735760100185871124267578125

Regarding this:

> sapply(seq(from=0.55, to=0.58, by = 0.01), function(x) 
ceiling(as.integer(100*x)))
[1] 55 56 57 57
> sapply(seq(from=0.55, to=0.59, by = 0.01), function(x)                 
ceiling(as.integer(100*x)))
[1] 55 56 57 58 59

I suspect what is happening is that R is not calculating the loop index iteratively (starting with .55 and adding .01 each time) but independent calculating the value for each element in the sequence using some formula. (This would be necessary to create a parallelizable algorithm for evaluating sequences.) A common way to deal with this is to use integers for the loop parameters and then scale the value as desired, as with:

> sapply(seq(from=55, to=59, by=1), function(x)
ceiling(as.integer(100*(.01*x))))

Floating-point arithmetic approximates real arithmetic. When used with continuous functions, slight arithmetic errors change the results proportionately (in a manner of speaking) in the results—a change in input or evaluation moves the result along the continuous function. When used with discontinuous functions, slight arithmetic errors may move the results across discontinuities, resulting in jumps. Hence, with arithmetic modulo 2, a slightly change from 1.9999… to 2 in input results in a change from 2 to 0 in output. If you want to use floating-point arithmetic with discontinuous functions, you should understand floating-point arithmetic and its behaviors.

Eric Postpischil
  • 195,579
  • 13
  • 168
  • 312
4

If the values are rounded first consistent results are obtained in all these examples. Also note my comment below the question. It would probably be safer to use 55:58 / 100 and 55:59/100 in the last two examples.

(round(58)) %% 2
## [1] 0
(round(0.58*100)) %% 2
## [1] 0
(round(0.58*100)) %% 2
## [1] 0
ceiling((round(0.58*100)))
## [1] 58
ceiling((round(.58*100)))
## [1] 58
ceiling((round(0.57*100)))
## [1] 57
ceiling((round(0.56*100)))
## [1] 56
ceiling((round(0.55*100)))
## [1] 55
sapply(seq(from=0.55, to=0.58, by = 0.01), function(x) ceiling((round(100*x))))
## [1] 55 56 57 58
sapply(seq(from=0.55, to=0.59, by = 0.01), function(x) ceiling((round(100*x))))
## [1] 55 56 57 58 59
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341