16

I have a collection of many rational numbers, with the numerator and denominator of each stored as a large (hundreds or thousands of bits) unsigned integer. I'd like to be able to efficiently test whether any given rational number a/b in the set is equal to any other rational number c/d in the set.

The most straightforward method would be to test whether a*d == b*c, of course, but I'd like something more efficient than computing the full products.
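For concreteness, a minimal sketch of that baseline in Python (built-in ints are arbitrary precision; the function name is just illustrative):

```python
def rationals_equal(a, b, c, d):
    # a/b == c/d  iff  a*d == b*c  for positive b, d -- but this
    # computes two full big-integer products, which is exactly the
    # cost I'm trying to avoid.
    return a * d == b * c
```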

Some notes on my particular use-case:

  • The pairs I'll be testing have a high likelihood of actually being equal (because I'm already precomputing and comparing them by their floating point approximations first), so early-outing if they're unequal won't save me much time.
  • I'm fine with precomputing extra data for each of the numbers, but each number will only be used in a handful of comparisons, so expensive precomputation (such as prime factorization) probably wouldn't be worthwhile.
  • Occasional false negatives would be fine, but false positives are not.

I think this may be theoretically impossible, but throwing it out to the hive mind just in case.

Sneftel
  • 40,271
  • 12
  • 71
  • 104
  • A common approach is to normalize a/b as (a/gcd(a,b)) / (b/gcd(a,b)). Why doesn't that work for you? –  Nov 24 '16 at 00:27
  • @deniss Because computing the GCD of two large numbers is a fairly expensive operation. – Sneftel Nov 24 '16 at 08:13
  • What are your data size and speed requirements? GMP has [some nice optimized algorithms](https://gmplib.org/manual/Subquadratic-GCD.html#Subquadratic-GCD) for GCD. –  Nov 24 '16 at 16:12
  • @deniss Data size is up to the thousands of bits, and when we tried reducing the rationals up-front, performance dropped. – Sneftel Nov 24 '16 at 18:38
  • You can convert a digital convolution (which is what a bigint multiplication is) to pointwise multiplication using the FFT, and then compare the two products after doing the IFFT. But I don't know where the cross-over point for naive multiplication to FFT multiplication will be for you. – Iwillnotexist Idonotexist Nov 28 '16 at 17:52
  • @IwillnotexistIdonotexist My multiplication routine is already optimized to within an inch of its life... looking for something more efficient than carrying out the full multiplication. – Sneftel Nov 28 '16 at 18:30

3 Answers

1

You can filter out many unequal pairs of fractions by comparing bit lengths. Let l(a) = floor(log2(a)) + 1 be the bit length of a. Since l(x) + l(y) − 1 ≤ l(x·y) ≤ l(x) + l(y), and a/b = c/d implies a·d = c·b, the sums l(a) + l(d) and l(c) + l(b) can differ by at most 1. (They need not be exactly equal: 3/1 = 9/3, yet l(3) + l(3) = 4 while l(9) + l(1) = 5.)

You can use this for a speedup by comparing the bit lengths first and computing the full products only when the two sums differ by at most 1 (or, since you can tolerate occasional false negatives, only when they are exactly equal).
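A minimal sketch of this filter in Python, where `int.bit_length()` gives l(x) in constant time (the function names are illustrative):

```python
def may_be_equal(a, b, c, d):
    # Necessary condition: a/b == c/d implies a*d == b*c, and both
    # l(a) + l(d) and l(c) + l(b) equal either l(a*d) or l(a*d) + 1,
    # so the two sums can differ by at most 1.
    return abs((a.bit_length() + d.bit_length())
               - (c.bit_length() + b.bit_length())) <= 1

def equal(a, b, c, d):
    # Pay for the two full big-integer products only when the
    # cheap bit-length filter passes.
    return may_be_equal(a, b, c, d) and a * d == b * c
```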

clemens
  • 16,716
  • 11
  • 50
  • 65
  • As I mentioned, there's a high probability of matching because I'm already doing an approximate check, so early-outing like this doesn't save any additional time. – Sneftel Nov 28 '16 at 18:07
  • 2
    But computing the approximation should be much slower than comparing the bit lengths. – clemens Dec 02 '16 at 06:49
1

Second try ;) If you must repeatedly check new numbers for set membership, you should store the fully reduced fractions in an ordered set. The set's comparison function should compare the numerators first, and then the denominators if the numerators are equal. One such comparison takes time linear in the bit length N, so finding an element in an ordered set of M items needs O(N log M) steps. Reducing a fraction costs O(N²). Thus, testing a number for membership requires O(N² + N log M) steps, and building the set O(MN²).

Edit: Instead of a sorted or tree-based set, you can use a hash set, which reduces the cost of a search to O(N² + N) = O(N²).
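For illustration, a minimal sketch of the hash-set variant using Python's `fractions.Fraction`, which reduces to lowest terms (one gcd) on construction; `pairs` is a stand-in for the collection of numerator/denominator pairs:

```python
from fractions import Fraction

# Build the set once: each value is reduced to lowest terms on
# construction, so equal rationals collapse to the same key.
seen = {Fraction(a, b) for a, b in pairs}

def contains(c, d):
    # One reduction (O(N^2) with the naive gcd) plus one hash lookup.
    return Fraction(c, d) in seen
```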

clemens
  • 16,716
  • 11
  • 50
  • 65
  • As I mentioned, reducing the fractions lowered performance due to the upfront GCD cost (well, actually extended Euclidean, but same diff). – Sneftel Nov 30 '16 at 21:18
  • For the upfront reductions you don't need the _extended_ GCD. The simpler computation doesn't lower the asymptotic complexity, but it will reduce the actual running time. I would do the precomputation without approximations, because I think the reduction gives you more flexibility, and without reduction you cannot find an efficient algorithm for your problem. – clemens Dec 02 '16 at 06:43
0

This is not going to be very helpful in your case, since you already precompute the floating-point approximations; but it might still save some time in the pipeline (or save computing some of those approximations).

You examine the integer values of a, b, c and d.

The rational numbers being the same means they describe the same line through the origin: plot each fraction as the point (numerator, denominator), so a/b becomes (a, b); then a/b = c/d exactly when (a, b) and (c, d) lie on the same ray from the origin.

If c > a, then it must also be that d > b, otherwise we would be in the gray area at the bottom right; conversely, if c < a, it must be that d < b, or we would be in the gray area at the upper left. The gray areas admit no possibility of equality, and if the numbers were random (i.e. not already filtered through the float approximation), two cheap bigint comparisons per pair would have excluded half of them.

Of the remaining 50%, we can exclude half by noticing that if a > b, then the black line lies below the bisector of the first and third quadrants, and it must be that c > d, otherwise c/d would be on the other side of the bisector; we would be in the top orange sector, where no equality is possible. The same holds for the a < b case. So two more bigint comparisons reduce the candidates to one quarter of the original; those can be approximated in floating point, and if they're "almost equal", we are in the tiny red zone where other techniques are needed.

You can also extend this method by observing that for any k > 0, the relation of a to k·b must be the same as that of c to k·d for a/b and c/d to be equal; and if k is an integer power of 2, the products k·b and k·d are mere bit shifts, which allows several cheap refinements (sketched below).

(At some point, of course, the cost of all this will exceed that of the a*d == b*c test.)

[Figure: numbers on the plane; the gray and orange sectors are excluded, and the tiny red zone is where equality remains possible]
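A sketch of these sector tests in Python: only comparisons and bit shifts are used, never a full product, and the number of shift refinements to try (`max_shift`) is an arbitrary illustrative choice:

```python
def same_side(x1, y1, x2, y2):
    # True when sign(x1 - y1) == sign(x2 - y2).
    return ((x1 > y1) - (x1 < y1)) == ((x2 > y2) - (x2 < y2))

def may_be_equal(a, b, c, d, max_shift=3):
    # Necessary conditions for a/b == c/d (all values positive):
    # for any k > 0, a <=> k*b must agree with c <=> k*d.
    # Gray areas: c <=> a must agree with d <=> b.
    if not same_side(c, a, d, b):
        return False
    # Orange sectors (k = 1): a <=> b must agree with c <=> d.
    if not same_side(a, b, c, d):
        return False
    # Power-of-two refinements: k = 2**m and k = 2**-m via bit shifts.
    for m in range(1, max_shift + 1):
        if not same_side(a, b << m, c, d << m):
            return False
        if not same_side(a << m, b, c << m, d):
            return False
    return True
```

Each test is a single O(N) comparison, and as noted above there is a point where further filtering costs more than simply computing a*d == b*c.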

LSerni
  • 55,617
  • 10
  • 65
  • 107