
I have an algorithm that relies on integer inputs x, y and s.
For input checking (raising an exception on invalid arguments) I have to make sure that there is a natural number n such that x*s^n = y.
Or in words: how many times do I have to chain-multiply x by s until I arrive at y?
And more importantly: do I arrive at y exactly?

This problem can be abstracted further by dividing by x: x*s^n = y => s^n = y/x => s^n = z, with z = y/x.
In general z is not an integer, but a chain of integer multiplications can only arrive at y if y is divisible by x. So this property can be tested first, and after that z is guaranteed to be an integer; the problem reduces to solving s^n = z.
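
For illustration, here is a minimal sketch of this reduction (the helper name is my own; I assume positive integers as in the question):

def reduce_to_z(x, y):
    # Every element of the chain x, x*s, x*s^2, ... is a multiple of x,
    # so y is only reachable if it is divisible by x.
    if y % x != 0:
        return None
    return y // x  # remaining problem: is there a natural n with s**n == z?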

There is already a question related to that.

There are lots of solutions there: some are iterative, and some solve the equation with a logarithm and then either truncate, round, or compare with an epsilon. I am particularly interested in the logarithm-based solutions. The general idea is:

from math import log

def check(z, s):
    n = log(z) / log(s)  # candidate exponent as a float
    return n == int(n)   # exact floating-point equality test

Comparing floating-point numbers for equality seems pretty sketchy, though. Under normal circumstances I would not count that as a general and exact solution to the problem. Answers that suggest this approach don't mention the precision issue, and answers that compare with an epsilon just pick an arbitrary small number.

I wonder how robust this method (with exact equality) really is, because it seems to work pretty well and I couldn't break it by trial and error. And if it does break down at some point, how small or large does the epsilon have to be?

So basically my question is: can the logarithm approach be guaranteed to be exact under specific circumstances, e.g. a limited integer input range?

I have thought about this for a long time now, and I think it is possible that this solution is exact and robust, at least under some circumstances. But I don't have a proof of that.

My line of thinking was:
Can I find a combination of x, y and s so that the chain multiplication just barely misses y, meaning that n would be very close to an integer but not quite?
The answer is no: because x, y and s are integers, every intermediate product is also an integer. So if the result just barely misses y, it has to miss by at least 1.
That is how far I've gotten. My theory is that working only with integers makes the calculation very precise; I would consider it a method with good numerical stability, and with very specific behaviour regarding that stability. So I believe it is possible that this calculation is precise enough to simply truncate all decimals. It would be amazing if someone could prove or disprove that.
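
A small experiment illustrates how close such a miss by 1 can bring the quotient to an integer (purely illustrative):

from math import log

# z = s**n - 1 misses the power s**n by exactly 1. The quotient of the
# logs approaches the integer n from below as n grows, but never reaches it.
s = 3
for n in range(2, 12):
    z = s**n - 1
    print(n, log(z) / log(s))  # e.g. n = 10 prints roughly 9.9999846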

If a guarantee of correctness can be given for a specific value range, I am interested in the general approach, but a practically relevant range would be the positive part of int32 for the integers, with double-precision floating point.

Testing with an epsilon is also an option, but then the question is how small that epsilon has to be. This is probably related to the "miss by at least 1" logic above.

• Do you have a size limit on the variables? How fast does the algorithm need to be to solve this problem? – KimMeo Jan 12 '23 at 00:29
  • @KimMeo There are lots of solutions in the link I mentioned. I am not particularly interested in a solution but rather in the numerical stability of the logarithm approach. This has `O(1)` complexity which makes it better than the other iterative methods (if it weren't for the precision issue). – Mario Dekena Jan 12 '23 at 00:36
• @KimMeo There is no size limit on the variables. It would be interesting to know how large the integers can get before the calculation fails. I updated my question to make that clearer. – Mario Dekena Jan 12 '23 at 00:39
• Logarithms aren't very fast though. O(1), maybe. But not fast. Binary searching for the exponent in O(log bits) is also O(1) for integers of limited size, and only involves efficient operations. – harold Jan 12 '23 at 00:49
  • @harold How can binary search be O(1) on average? – Mario Dekena Jan 12 '23 at 00:54
• @MarioDekena because if `bits` is a constant (which by assumption it is, since I said "for integers of limited size"), then `log bits` is also a constant. That of course does not apply to general integers of unlimited size. – harold Jan 12 '23 at 01:09
  • @MarioDekena: It can't be in general. Neither can logarithms. But once you restrict the problem domain to, say 64-bit integers, then binary search takes at most 64 steps, and since 64 is a constant it's O(1), just like logarithms of 64-bit values. – President James K. Polk Jan 12 '23 at 01:09
• @harold I see. I am aware that logarithms are calculated iteratively, and it depends on how you view the problem. I might have stressed too much that logarithms are better or faster. I am really only interested in the correctness, not performance. – Mario Dekena Jan 12 '23 at 01:14
• On my system, `check(243, 3)` returns `False` when it should return `True`. I think that's the real problem that you need to look out for. – user3386109 Jan 12 '23 at 02:44
  • @MarioDekena With `x*s^n=y`, are the values all 0 or more? If not, what range? – chux - Reinstate Monica Jan 12 '23 at 05:34
• What is the problem with just verifying the result? `return (x*pow(s,n)==y);` Of course I would use a power-by-squaring integer implementation of `pow`... however, you still did not clarify whether the ints are limited, and if so, how to handle overflow. You know, if you have a 32-bit int then all the operations are `mod 2^32`, which means there might be solutions that are not possible in an unlimited integer range. – Spektre Jan 12 '23 at 09:37
  • @Spektre I don't see how verifying the result would help here. You either have to round n to an int (if you want to use integer pow) or you would have to round the result afterward. Either way, you round somewhere in order to compare with y. How would that help here? I rule out overflow because it is a calculation error and it is not relevant to the problem from a mathematical standpoint. I was interested in the general accuracy of the approach, so while in practice there are constraints, you can argue about correctness given specific inputs without knowing the exact numerical ranges. – Mario Dekena Jan 12 '23 at 11:05
• @MarioDekena that would eliminate the problem of false positives when the rounded result is near a solution but not quite at it... – Spektre Jan 12 '23 at 12:18
• @MarioDekena: Unless `log` has a huge error, not just ordinary floating-point issues in a low-effort implementation of `log` but gross bugs, then `round(log(z)/log(s))` will be exactly the mathematical log(z)/log(s) if z is a power of s. So, if a value for `n` such that s^n=z exists, then `round(log(z)/log(s))` will find it, and then `ipow(s, n) == z` will yield true if it was found and false if it was not. (And, if it does not exist, it will of course not be found.) – Eric Postpischil Jan 12 '23 at 12:48
• @EricPostpischil Ah, that is what Spektre was getting at as well. That is actually a really good solution, if rounding cannot yield false negatives, that is (a sketch follows below). – Mario Dekena Jan 12 '23 at 16:45
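
A sketch of the round-and-verify idea from the last few comments, assuming x, y ≥ 1 and s ≥ 2 (the final comparison uses exact integer arithmetic, so rounding errors in the logarithms cannot cause false positives):

from math import log

def check_verified(x, y, s):
    # the chain x, x*s, x*s^2, ... can only reach y if y is divisible by x
    if y % x != 0:
        return False
    z = y // x
    if z == 1:
        return True  # n = 0
    # round to the only candidate exponent, then verify with integer pow
    n = round(log(z) / log(s))
    return n >= 0 and s**n == z

Per harold's comments, binary searching for the exponent over a fixed range such as 0..63 would avoid floating point entirely and is equally O(1) for fixed-size integers.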

1 Answer


You’re right to be skeptical of floating point. Math libraries typically don’t provide correctly rounded transcendental functions (The Table Maker’s Dilemma), so the exact test is suspect. Indeed, it’s not difficult to find counterexamples (see the Python below).

Since the input z is an integer, however, we can do an error analysis to determine an appropriate epsilon. Using calculus, one can prove a bound

log(z+1) − log(z) = log(1 + 1/z) ≥ 1/z − 1/(2z^2).

If log(z)/log(s) is not an integer, then z must be at least one away from a power of s, putting this bound in play. If 2 ≤ z, s < 2^31 (i.e. both have representations as signed 32-bit integers), then log(z)/log(s) is at least (1/2^31 − 1/2^63)/log(2^31) away from an integer. An epsilon of 1.0e-12 is comfortably less than this, yet large enough that if we lose a couple of ulps (1 ulp is on the order of 3.6e-15 in the worst case here) to rounding, we don't get a false negative, even with a rather poor-quality implementation of log.

import math
import random

# Randomly search for an exact power x**2 that fails the
# floating-point equality test log(x**2)/log(x) == 2.
while True:
    x = random.randrange(2, 2**15)
    if math.log(x**2) / math.log(x) != 2:
        print("#", x)  # found a counterexample
        break
# 19143
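
For completeness, a sketch of the epsilon test that this analysis justifies (eps = 1.0e-12, under the assumption 2 ≤ z, s < 2^31):

from math import log

def check_eps(z, s, eps=1.0e-12):
    # eps is backed by the error bound above, valid for 2 <= z, s < 2**31
    n = log(z) / log(s)
    return abs(n - round(n)) < eps

print(check_eps(19143**2, 19143))  # True, unlike the exact comparison above
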
• Thank you for your answer. Indeed there seem to be counterexamples. Even more interesting: 19143 works on my machine, but 9170 does not. That means it is not only unreliable, but also machine-dependent. I like your reasoning about a bound for the epsilon. I am somewhat sad that the exact comparison is unreliable (it would've been cool if it had been reliable under at least some circumstances), but having a sufficiently small epsilon with a proven bound is also not too bad. – Mario Dekena Jan 12 '23 at 11:20
• Note that according to https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html they are SUPPOSED to only have a maximum error of 0.5 ulps. That said, after the 2 logs are each off from the real answer by 0.5 ulps, the ratio can (even when rounded correctly) be off by an ulp, giving an incorrect answer, as in the example of `check(243, 3)`. – btilly Jan 12 '23 at 17:27
• @btilly 0.5 ulps is correctly rounded, which AIUI is typically not done because the Table Maker's Dilemma forces large memory usage in some cases (see the crlibm writeup). A good log will guarantee 1 ulp. – David Eisenstat Jan 12 '23 at 19:24
• Oops, I thought there was a comment saying that log can be correctly calculated to 0.5 ulp. But there isn't. You're right. Incidentally, https://www.johndcook.com/blog/2019/11/16/the-hardest-logarithm/ shows the hardest log to calculate: it needs more than 118 bits correctly calculated to know how to round the 53rd bit. – btilly Jan 12 '23 at 19:38