3

I was doing leetcode when I had to do some arithmetic with rational numbers (both numerator and denominator integers).

I needed to count slopes in a list. In python

collections.Counter( [ x/y if y != 0 else "inf" for (x,y) in points ] )

did the job, and I passed all the tests with it. ((edit: they've pointed out in the comments that in that exercise numbers were much smaller, not general 32 bit integers))

I wonder if this is correct, that is, python correctly recognizes if a/b == c/d as rationals, for a,b,c,d 32 bit integers. I am also interested the case for c++, and any additional facts that may be useful (footguns, best practices, theory behind it if not too long etc).

Also this question seems frequent and useful, but I don't really find anything about it (give me the duplicates!), maybe I am missing some important keywords?

bmacho
  • 71
  • 4
  • In Python, if `a` and `b` are `int`, `a/b` is a `float`. If you want arbitrary precision arithmetic, you should use the `fractions` library. – jthulhu Aug 17 '22 at 09:13
  • Use the [`decimal`](https://docs.python.org/3/library/decimal.html) module. – Jan Christoph Terasa Aug 17 '22 at 09:14
  • Regarding c++: There are no c++ `rational`s (at least not built in). If `a`, `b` are `int`s, `a/b` is an expression with an `int` type (the result of the integer division). – wohlstad Aug 17 '22 at 09:16
  • 6
    check [`a*d == b*c` instead](https://stackoverflow.com/q/23177522/995714). [Convert Fraction to decimal number and sort them](https://stackoverflow.com/q/62549344/995714) – phuclv Aug 17 '22 at 09:22
  • @phuclv This is indeed safer, accurate and faster, but if the numbers are big enough, the product may overflow while the division can't. – Fareanor Aug 17 '22 at 09:31
  • 2
    @Fareanor: Python ints will not overflow. –  Aug 17 '22 at 09:34
  • @YvesDaoust Oh right, I was talking for C++ – Fareanor Aug 17 '22 at 09:36
  • If, for any reason, your fractions occur naturally in their irreducible form (or some magic achieves that with low cost), it suffices to compare the numerators and denominators. –  Aug 17 '22 at 09:36
  • @Fareanor: in C++, the ints can be promoted to 64 bits to avoid overflow. –  Aug 17 '22 at 09:37
  • Which LeetCode problem was that? – Kelly Bundy Aug 17 '22 at 09:43
  • @YvesDaoust They won't be implicitly promoted to larger (64 bits here) type though so the overflow will occur unless the cast is explicitly performed before the operation. Of course, all of this assumes that there is a larger integer type available for the target platform. – Fareanor Aug 17 '22 at 09:49
  • @Fareanor: unsure which language you are referring to. If Python, no overflow. If C++, as I said, promote to 64 bits (it's been a very long time since 64 bit arithmetic was emulated on 32 bits machines). –  Aug 17 '22 at 09:55
  • @KellyBundy https://leetcode.com/problems/max-points-on-a-line/ but according to your answer i probably was just *very* lucky – bmacho Aug 17 '22 at 09:58
  • @phuclv better (not faster though) to normalize fractions using `std::gcd` – Goswin von Brederlow Aug 17 '22 at 10:35
  • @YvesDaoust As I already said, I'm talking about C++. And as I already said again, promotion to larger than `int` is not implicit, so it will overflow if the values are big enough. The only way to have such a promotion is to cast (explicitly) the `int` operands into `long long int` before executing the product. Integer promotion applies for integral numbers shorter than `int`, this is all what I said. [See here](https://godbolt.org/z/89boG4Pbo) – Fareanor Aug 17 '22 at 10:59
  • @Fareanor: I don't see why you need to repeat what I say. –  Aug 17 '22 at 11:02
  • @YvesDaoust I didn't repeat at all. You seem to think that promotion to `long long` would magically happen, it is not the case in C++. Multiply two `int`s, you'll get an `int`. [If you want a proof](https://godbolt.org/z/89boG4Pbo). – Fareanor Aug 17 '22 at 11:05
  • @Fareanor: not at all. –  Aug 17 '22 at 11:06
  • @bmacho Ah, maybe they reduced the limits in the meantime. With the current range [-10^4,10^4] (far smaller than the 32 bit range you asked about) and the pretty ubiquitous 64 bit floats, I think it's safe. – Kelly Bundy Aug 17 '22 at 11:28
  • @bmacho Looks like they indeed reduced the limits, in old discussions you can still see people talking about "[The test case #35 with input {0, 0}, {94911151, 94911150}, {94911152, 94911151}](https://leetcode.com/problems/max-points-on-a-line/discuss/287079/Wrong-test-case/281152/)". – Kelly Bundy Aug 17 '22 at 11:56
  • 2
    It turns out that this question is a duplicate of https://stackoverflow.com/q/29594813/270986. (Though I only discovered that by searching for the magic constant `67114657`.) – Mark Dickinson Aug 18 '22 at 18:46

3 Answers3

6

Assuming you don't want to allow for the effects of integer division, check the equivalent ad == bc instead.

This is more numerically stable. In C++ you can write

1LL * a * d == 1LL * b * c

to prevent overflow.

Bathsheba
  • 231,907
  • 34
  • 361
  • 483
  • A cast would be neater I guess. –  Aug 17 '22 at 09:40
  • 1
    But how do you use that for the `Counter` there? – Kelly Bundy Aug 17 '22 at 09:40
  • @YvesDaoust: I can't stand casts in numerical expressions. Perhaps I'm too old for this. – Bathsheba Aug 17 '22 at 09:40
  • Even though the first product can be optimized away, this is a not-so-readable hack and you need to trust that evaluation is left-to-right. –  Aug 17 '22 at 09:59
  • @YvesDaoust: Which of course it is. We will have to agree to disagree but all the numerical codebases I've ever worked on all use this (as opposed so something like a `static_cast`) to change argument types. `1.0 *` is pretty idiomatic where I come from. Of course you can get problems if someone writes `a / b * 1.0`, but they only ever do that once! – Bathsheba Aug 17 '22 at 10:03
  • @Bathsheba: I feel sorry for you. –  Aug 17 '22 at 10:03
  • @YvesDaoust ;-) – Bathsheba Aug 17 '22 at 10:04
  • More seriously, this problem arises due to a type change, which can be the sign of a design flaw. Often the problem leading to the type change lies further back. – Bathsheba Aug 17 '22 at 10:05
  • Which problem ? –  Aug 17 '22 at 10:27
5

It's not safe, and I've seen at least one LeetCode problem where you'd fail with that (maybe Max Points on a Line). Example:

a = 94911150
b = 94911151
c = 94911151
d = 94911152
print(a/b == c/d)
print(a/b)
print(c/d)

Both a/b and c/d are the same float value even though the slopes actually differ (Try it online!):

True
0.9999999894638303
0.9999999894638303

You could use fractions.Fraction(x, y) or the tuple (x//g, y//g) after g = math.gcd(d, y) ( if I remember correctly, this is more lightweight/efficient than the Fraction class).

Kelly Bundy
  • 23,480
  • 7
  • 29
  • 65
  • (Note that before the "imprecision" police get to this, floating point `/` is required by IEEE754 to give the best answer possible.) – Bathsheba Aug 17 '22 at 09:43
  • @Bathsheba: This isn't floating-point `/`, though: this is Python's integer-by-integer true division algorithm. In CPython, and assuming IEEE 754 floating-point, this is carefully engineered to give correctly-rounded results, but that doesn't automatically fall out of IEEE 754 - it takes some effort (especially for operands larger than 2**53). – Mark Dickinson Aug 17 '22 at 10:06
  • 1
    @MarkDickinson Yes, it might be safe with all numbers smaller than my example. I remembered using three consecutive numbers like that leads to small examples, as mathematically the difference between those two fractions is 1/((a+1)(a+2)), so that that 1 gets lost because it's so small compared to (a+1)(a+2). I found the example by [exhaustive search](https://tio.run/##K6gsycjPM7YoKPr/Py2/SCFRITNPoSgxLz1Vw0hLy8hc04pLAQiSFGwVkoE4UUFbwRAskgLlGYF5icXFqUUlCon6SQqKQJX6KToKif//AwA) from 0 upwards, so at least all smaller cases with such consecutive numbers seem safe (on typical hard/software). – Kelly Bundy Aug 17 '22 at 11:43
  • @MarkDickinson How did you find that? And can you rule out smaller cases? – Kelly Bundy Aug 17 '22 at 18:09
  • @MarkDickinson LGTM. Now how to find the smallest case... I'm afraid with exhaustive search I don't have the patience to wait for Python and don't have the patience to write in any other language :-) – Kelly Bundy Aug 17 '22 at 18:45
  • @MarkDickinson Would like to see the code. Would probably also make a good answer, together with your proof. – Kelly Bundy Aug 17 '22 at 19:14
  • @MarkDickinson Hmm, why did you delete your answer? (I still didn't finish reading it as I've been lacking the willpower for nontrivial things, but it looked good). – Kelly Bundy Aug 18 '22 at 12:38
  • @KellyBundy: I wasn't happy with the proof, so I took it down until I had time to rework it properly. I've now done that. – Mark Dickinson Aug 18 '22 at 16:45
  • @MarkDickinson Looks even more intimidating now :-). I'll have a look later. Is `-67114658 / -67114657` a positive fraction? (I couldn't find a definition of the term "positive fraction", but I think it's both positive and a fraction.) – Kelly Bundy Aug 18 '22 at 17:13
  • @KellyBundy: I'll clarify that I'm assuming all of `a`, `b`, `c` and `d` positive. – Mark Dickinson Aug 18 '22 at 17:30
  • And yes, "positive fraction" means "positive fraction": i.e., a rational number which is positive. – Mark Dickinson Aug 18 '22 at 17:57
2

tl;dr: If max(|a|, |b|, |c|, |d|) ≤ 67114657, then you're safe: under that restriction, if a/b and c/d are equal as IEEE 754 binary64 floats, then they're equal as fractions.

In detail, we have the following theorem, giving a precise bound under which the mapping from fractions to IEEE 754 double-precision binary floats is injective.

Theorem. Suppose that a/b and c/d are unequal fractions such that when both are converted to the nearest IEEE 754 binary64 float they become equal. Then max(|a|, |b|, |c|, |d|) > 67114657.

Note that our bound 67114657 is just a little larger than 2**26 = 67108864. Below we give a direct mathematical proof that max(|a|, |b|, |c|, |d|) ≥ 67108864, and then augment that with an exhaustive search to show that the smallest case where distinct fractions a/b and c/d coincide as floats has max(|a|, |b|, |c|, |d|) = 67114658.

By symmetry, it's enough to consider positive fractions. (If both a/b and c/d are negative, apply the theorem to -a/b and -c/d. If the signs of a/b and c/d differ or either fraction is zero, it's easy to establish that absent underflow or overflow, they can't both map to the same float. And the only way for underflow or overflow to be involved is when max(|a|, |b|, |c|, |d|) is huge (at least 2**1022).) So from this point on we assume that a, b, c and d are all positive.

The proof of the theorem is divided into two main cases, with the first case (which is the more interesting one) further subdivided. (Spoiler: Case 1d is the only really interesting one, and it's the one where we need to perform the exhaustive search.)

Case 1: a/b and c/d live in the same "binade"

The main case we consider is the case in which there's a closed interval of the form [2**e, 2**(e+1)] for some integer e that contains both of a/b and c/d. Within that interval, consecutive floats are spaced at distance 2**(e-52) from one another, so if a/b and c/d map to the same float then |a/b - c/d| ≤ 2**(e-52). Rearranging, we know that

2**(52-e) ≤ b*d / |a*d - b*c|.

Note that since a/b and c/d are distinct, |a*d - b*c| ≥ 1.

Now we divide case 1 into 4 subcases.

Case 1a: e ≥ 1

In this case, 2**e ≤ a/b and 2**e ≤ c/d imply that b ≤ 2**-e * a and d ≤ 2**-e * c, hence that b*d ≤ 2**(-2*e) * a*c. So from the inequality highlighted above,

2**(-2*e) * a*c ≥ b*d ≥ b*d / |a*d - b*c| ≥ 2**(52-e)

Simplifying gives a*c ≥ 2**(52 + e) ≥ 2**53. So at least one of a or c must be at least √(2**53), so max(a, b, c, d) ≥ √(2**53) > 67114657 and we're done.

Case 1b: e ≤ -1

In this case,

b*d ≥ b*d / |a*d - b*c| ≥ 2**(52-e) ≥ 2**53

so now either b or d (or both) is larger than √(2**53), and again we're done.

Case 1c: e = 0, |a*d - b*c| ≥ 2

In this case, our first inequality above gives:

b*d / 2 ≥ b*d / |a*d - b*c| ≥ 2**(52-e) = 2**52

So b*d ≥ 2**53, and as with cases 1a and 1b, we're done.

Case 1d: e = 0, |a*d - b*c| = 1

This one's the interesting case. Now our main inequality gives

b*d = b*d / |a*d - b*c| ≥ 2**(52-e) = 2**52

So at least one of b and d is larger than 2**26 = 67108864, so we have max(a, b, c, d) ≥ 67108864. But we need a bit more: we need max(a, b, c, d) > 67114657.

At this point we can do an exhaustive search. Before diving into that, we need a bit of work to reduce the search space to something feasible.

First note that either a < c or c < a: they can't be equal (except in trivial cases), since ad - bc = ±1 is not divisible by a (unless a = 1). Now if a < c then b < d, and if c < a then d < b. Let's swap if necessary so that a/b is the fraction with larger numerator and denominator: c < a and d < b.

Now for a tiny bit of elementary number theory: given a positive fraction a/b (written in lowest terms), there's a unique pair c and d of integers such that a*d - b*c = 1, 0 ≤ c < a and 0 < d ≤ b, and a unique pair c and d of integers such that a*d - b*c = -1, 0 < c ≤ a and 0 ≤ d < b. So given a/b, there are at most two possible choices for c/d, and we can find both those choices using the extended Euclidean algorithm. (For mathematicians: there's a hint of the theory of continued fractions here: the two choices for c/d are the parents of a/b in the Stern-Brocot tree.)

For further reductions: in this case we know that b*d ≥ 2**52, and that b > d, so we have that b ≥ 2**26. Moreover, since e = 0 we have 1 ≤ a/b ≤ 2, so b ≤ a. And since 0 < c < a and 0 < d < b, max(a, b, c, d) = a.

So we can restrict ourselves to searching pairs (a, b) of relatively prime integers satisfying 2**26 ≤ b ≤ a, then for each of those pairs generate the two possibilities for c/d using the extended Euclidean algorithm. Here's some Python code that does exactly that:

from math import gcd


def sb_parents(m, n):
    """
    Given a positive fraction m/n, return its parents in the Stern-Brocot tree.
    """
    a, b, p, q, r, s = n, m % n, 1, 0, m // n, 1
    while b:
        x = a//b
        a, b, p, q, r, s = b, a - x * b, r, s, p + x * r, q + x * s
    return p, q, r - p, s - q


for a in range(2**26, 2**27):
    for b in range(2**26, a):
        if gcd(a, b) > 1:
            continue
        c, d, e, f = sb_parents(a, b)
        if d and a/b == c/d:
            print(f"{a}/{b} == {c}/{d}")
        if f and a/b == e/f:
            print(f"{a}/{b} == {e}/{f}")

When run, the first example this prints (after around 30 seconds of runtime on my laptop) is

67114658/67114657 == 67114657/67114656

The next few, which take a few minutes to produce, are:

67118899/67118898 == 67118898/67118897
67121819/67121818 == 67121818/67121817
67123403/67115730 == 67114655/67106983
67124193/67124192 == 67124192/67124191
67125383/67119501 == 67113971/67108090
67126017/67122029 == 67109185/67105198
67126246/67126245 == 67126245/67126244
67128080/67128079 == 67128079/67128078

This completes the proof in case 1d, which in turn completes the proof of all of case 1 (since the four cases exhaust all possibilities). We're pretty much done, except that we still have the annoying second case to eliminate. We do that now.

Case 2: a/b and c/d do not live in the same "binade"

This is the negation of case 1: we assume that there's no integer e so that both a/b and c/d lie in the closed interval [2**e, 2**(e+1)]. The only way that this can happen is if there's a power of two lying strictly between a/b and c/d: swapping a/b and c/d if necessary, there's an integer e with

a/b < 2**e < c/d

Now since we're assuming that a/b and c/d map to the same binary64 float, 2**e, being squeezed between a/b and c/d, must also map to that same binary float, and that float will be exactly equal to 2**e (since powers of two within a reasonable range will convert exactly).

At this point we can discard c/d and just consider a/b. Since a/b and 2**e map to the same float, it follows that the difference between a/b and 2**e is at most half a ulp (because 2**e converts exactly), so 0 < 2**e - a/b ≤ 2**(e-54), or 0 < 2**e * b - a ≤ b*2**(e-54). Moreover, since a/b is so close to 2**e, we have 2**(e-1) < a/b, so b < a * 2**(1-e).

Now consider the case where e ≥ 0. Then 2**e * b - a is an integer, so 1 ≤ b*2**(e-54). Combining with b ≤ a * 2**(1-e) we get 2**53 ≤ a, so a must be huge, and correspondingly max(a, b, c, d) is at least 2**53.

Finally, consider the case where e ≤ 0. Then from 0 < 2**e - a/b ≤ 2**(e-54) we have 0 < b - 2**(-e)*a ≤ b*2**(-54). As before, b - 2**(-e)*a is a positive integer, so 1 ≤ b * 2**-54, so 2**54 ≤ b, and again we're done.

Mark Dickinson
  • 29,088
  • 9
  • 83
  • 120