
I'm trying to find the most efficient way, in Python, to check whether any two numbers in a list sum to another number in the same list. I've added some context below to make the problem clearer and possibly easier to optimize. Here is my code:

import numpy as np
from collections import deque


def gen_prim_pyth_trips(limit=None):
    # Walk the classic ternary tree of primitive Pythagorean triples:
    # starting from (3, 4, 5), each triple has three children obtained by
    # multiplying with the matrices u, a, d.
    u = np.mat(' 1  2  2; -2 -1 -2; 2 2 3')
    a = np.mat(' 1  2  2;  2  1  2; 2 2 3')
    d = np.mat('-1 -2 -2;  2  1  2; 2 2 3')
    uad = np.array([u, a, d])
    m = np.array([3, 4, 5])
    while m.size:
        m = m.reshape(-1, 3)
        if limit:
            m = m[m[:, 2] <= limit]  # keep only triples whose hypotenuse is within the limit
        yield from m
        m = np.dot(m, uad)

def find_target(values, target):
    # Two-pointer search over a sorted deque of (value, index) pairs:
    # shrink from whichever end overshoots until a pair sums to target,
    # returning the pair, or -1 if none exists.
    dq = deque(sorted([(val, idx) for idx, val in enumerate(values)]))

    while True:
        if len(dq) < 2:
            return -1

        s = dq[0][0] + dq[-1][0]

        if s > target:
            dq.pop()
        elif s < target:
            dq.popleft()
        else:
            break
    return dq[0], dq[-1]


ratioList = []

MAX_NUM = 500000

for i in list(gen_prim_pyth_trips(MAX_NUM)):
    ratioList.append((i[0]*i[1])/i[2]**2)
    if find_target(ratioList, (i[0]*i[1])/i[2]**2) != -1:
        print(find_target(ratioList, (i[0]*i[1])/i[2]**2))

The gen_prim_pyth_trips() function is from here. The "slow" part comes after the triples have been generated. find_target came from here.

It currently works fine, but I am trying to find a way to make it faster, or a completely new approach that is faster.

In the comments people have said that this is a variant of the 3SUM problem which according to the Wikipedia page can be done in O(n^2), where n is the number of numbers (i.e., my number of ratios). I have yet to find a way to implement this in general and in Python.

Any speedup at all would be helpful; it does not have to come from a better algorithm (libraries etc. are fine too). I believe my current approach is slightly better than O(n^3)?

Additionally, for MAX_NUM = 100,000 it is not too bad (about 4 minutes), but for 500,000 it is very bad (it hasn't stopped running yet).

Ultimately I'd like to do MAX_NUM = 1,000,000 or possibly more.

Edit

I'd like to see a faster algorithm like O(n^2), or a major speed increase.

halfer
Patrick Maynard
  • That's a variant of the [3SUM](https://en.wikipedia.org/wiki/3SUM) problem, and it's been extensively studied. Are you sure you want to do this with floats, though? Floating-point rounding makes naive equality comparisons problematic (and naive comparison-with-tolerance is just problematic in a different way). – user2357112 Dec 19 '19 at 03:25
  • This is a lot more difficult given that you are dealing with floating point numbers; for example, `0.1 + 0.7 == 0.8` is `False`. Are you able to use `Decimal` or just integers instead? – kaya3 Dec 19 '19 at 03:26
  • Forgive me if this isn't what you're asking. This must be for decimal values, so I presume they must be represented as floats. – Patrick Maynard Dec 19 '19 at 03:28
  • @PatrickMaynard There is a `Decimal` data type in Python for accurate calculations. – Selcuk Dec 19 '19 at 03:29
  • These numbers are irrational but it would probably suffice to check the first n digits. – Patrick Maynard Dec 19 '19 at 03:29
  • I can post more background but I'm not sure how much it enhances this. – Patrick Maynard Dec 19 '19 at 03:31
  • @Selcuk I did not know this. Are there advantages to this data type? – Patrick Maynard Dec 19 '19 at 03:32
  • @user2357112supportsMonica thank you for pointing this out this is very insightful. – Patrick Maynard Dec 19 '19 at 03:36
  • @kaya3 I did not know this. Why does that statement evaluate to false? Does the solution I proposed above even work? – Patrick Maynard Dec 19 '19 at 03:38
  • @PatrickMaynard, see https://floating-point-gui.de/ – Warren Weckesser Dec 19 '19 at 03:41
  • *"The implementation for this is for a very long list of floats..."* What is a typical length of the list? Thousands? Millions? – Warren Weckesser Dec 19 '19 at 03:43
  • @WarrenWeckesser thanks for pointing that out earlier. In the millions at least, but I'm kind of "searching" for something, so arbitrarily long. – Patrick Maynard Dec 19 '19 at 03:48
  • @PatrickMaynard You’re _kind of “searching”_? What does that mean? If you’re concerned with efficiency, are you using NumPy? – AMC Dec 19 '19 at 04:16
  • Great question, I'll go ahead and post the full context. – Patrick Maynard Dec 19 '19 at 04:21
  • @PatrickMaynard Be careful, numpy discourages the use of `matrix` class in favour of `ndarray`s. The former is probably going to be removed entirely one of these days. – AMC Dec 19 '19 at 05:07
  • Thanks, that is all I could find for generating primitive triples. This solution doesn't need to work long term. – Patrick Maynard Dec 19 '19 at 05:18
  • there seems to be a nice looking impl of 3SUM in Python .. albeit vanilla not numpy, https://leetcode.com/problems/3sum/discuss/7498/Python-solution-with-detailed-explanation – antont Dec 22 '19 at 01:46
  • You call both your limit and the number of resulting numbers "n". Please rename one of them (preferably the limit). – Stefan Pochmann Dec 22 '19 at 02:27
  • Btw, what's the background/context of this? Is that some math conjecture you're checking or so? Might help to get insights for further improvement. And again: Please rename the limit. – Stefan Pochmann Dec 23 '19 at 12:18
  • I'm not exactly sure what you're referring to as far as renaming the limit; you might have to be a bit more specific, but I'll change it. Also, if you are interested in the math background of this: I'm posting the conjecture I've formulated on Math Overflow soon and am trying to get some computational evidence behind it. I'll post a link. – Patrick Maynard Dec 24 '19 at 00:15
  • The thing that's partially called `limit` in your code. So better change `n = 500000` to `MAX_NUM = 500000` (as that's what @kaya3's answer already uses) and update the question text accordingly. – Stefan Pochmann Dec 24 '19 at 01:40
  • Fixed it myself now. – Stefan Pochmann Dec 24 '19 at 16:28
  • Is this question related to this? https://math.stackexchange.com/questions/3418323/prove-frac-textarea-1c-12-frac-textarea-2c-22-neq-frac-texta – גלעד ברקן Dec 24 '19 at 23:11
  • @גלעדברקן Yes! This is my question – Patrick Maynard Dec 26 '19 at 17:30

3 Answers


Hundreds of times faster than yours and without your floating point issues.
Thousands of times faster than kaya3's O(n²) solution.
I ran it until MAX_NUM = 4,000,000 and found no results. Took about 12 minutes.

Exploit the special numbers.

This is not just an ordinary 3SUM. The numbers are special and we can exploit it. They have the form ab/c², where (a,b,c) is a primitive Pythagorean triple.

So let's say we have a number x=ab/c² and we want to find two other such numbers that add up to x:

x = ab/c² = de/f² + gh/i² = (de·i² + gh·f²) / (fi)²

After canceling, the denominators c² and (fi)² become c²/k and (fi)²/m (for some integers k and m) and we have c²/k = (fi)²/m. Let p be the largest prime factor of c²/k. Then p also divides (fi)²/m and thus f or i. So at least one of the numbers de/f² and gh/i² has a denominator divisible by p. Let's call that one y, and the other one z.
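
A concrete instance (my own illustration, not from the answer): the triple (3, 4, 5) gives x = 3·4/5² = 12/25, which is already in lowest terms, so p = 5, and any decomposition 12/25 = y + z must then have at least one of y, z whose reduced denominator is divisible by 5.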

So for a certain x, how do we find fitting y and z? We don't have to try all numbers for y and z. For y we only try those whose denominator is divisible by p. And for z? We compute it as x-y and check whether we have that number (in a hashset).

How much does it help? I had my solution count how many y-candidates there are if you naively try all (smaller than x) numbers and how many y-candidates there are with my way and how much less that is:

  MAX_NUM         naive           mine      % less
--------------------------------------------------
   10,000         1,268,028        17,686   98.61
  100,000       126,699,321       725,147   99.43
  500,000     3,166,607,571     9,926,863   99.69
1,000,000    12,662,531,091    30,842,188   99.76
2,000,000    50,663,652,040    96,536,552   99.81
4,000,000   202,640,284,036   303,159,038   99.85

Pseudocode

The above description in code form:

h = hashset(numbers)
for x in the numbers:
    p = the largest prime factor in the denominator of x
    for y in the numbers whose denominator is divisible by p:
        z = x - y
        if z is in h:
            output (x, y, z)

Benchmarks

Times in seconds for various MAX_NUM and their resulting n:

         MAX_NUM:    10,000   100,000   500,000  1,000,000  2,000,000  4,000,000
            => n:     1,593    15,919    79,582    159,139    318,320    636,617
--------------------------------------------------------------------------------
Original solution       1.6     222.3         -          -          -          -
My solution             0.05      1.6      22.1       71.0      228.0      735.5
kaya3's solution       29.1    2927.1         -          -          -          -

Complexity

This is O(n²), and maybe actually better. I don't understand the nature of the numbers well enough to reason about them, but the above benchmarks do make it look substantially better than O(n²). For quadratic runtime, going from n=318,320 to n=636,617 you'd expect a runtime increase of factor (636,617/318,320)² ≈ 4.00, but the actual increase is only 735.5/228.0 ≈ 3.23.

I didn't run yours for all sizes, but since you grow at least quadratically, at MAX_NUM=4,000,000 your solution would take at least 222.3 * (636,617/15,919)² = 355,520 seconds, which is 483 times slower than mine. Likewise, kaya3's would be about 6365 times slower than mine.

Lose time with this one weird trick

Python's Fraction class is neat, but it's also slow. Especially its hashing. Converting to tuple and hashing that tuple is about 34 times faster:

>set SETUP="import fractions; f = fractions.Fraction(31459, 271828)"

>python -m timeit -s %SETUP% -n 100000 "hash(f)"
100000 loops, best of 5: 19.8 usec per loop

>python -m timeit -s %SETUP% -n 100000 "hash((f.numerator, f.denominator))"
100000 loops, best of 5: 581 nsec per loop

Its code says:

[...] this method is expensive [...] In order to make sure that the hash of a Fraction agrees with the hash of a numerically equal integer, float or Decimal instance, we follow the rules for numeric hashes outlined in the documentation.

Other operations are also somewhat slow, so I don't use Fraction other than for output. I use (numerator, denominator) tuples instead.

The solution code

from fractions import Fraction
from math import gcd

def solve_stefan(triples):

    # Prime factorization stuff
    largest_prime_factor = [0] * (MAX_NUM + 1)
    for i in range(2, MAX_NUM+1):
        if not largest_prime_factor[i]:
            for m in range(i, MAX_NUM+1, i):
                largest_prime_factor[m] = i
    def prime_factors(k):
        while k > 1:
            p = largest_prime_factor[k]
            yield p
            while k % p == 0:
                k //= p

    # Lightweight fractions, represented as tuple (numerator, denominator)
    def frac(num, den):
        g = gcd(num, den)
        return num // g, den // g
    def sub(frac1, frac2):
        a, b = frac1
        c, d = frac2
        return frac(a*d - b*c, b*d)
    class Key:
        def __init__(self, triple):
            a, b, c = map(int, triple)
            self.frac = frac(a*b, c*c)
        def __lt__(self, other):
            a, b = self.frac
            c, d = other.frac
            return a*d < b*c

    # The search. See notes under the code.
    seen = set()
    supers = [[] for _ in range(MAX_NUM + 1)]
    for triple in sorted(triples, key=Key):
        a, b, c = map(int, triple)
        x = frac(a*b, c*c)
        denominator_primes = [p for p in prime_factors(c) if x[1] % p == 0]
        for y in supers[denominator_primes[0]]:
            z = sub(x, y)
            if z in seen:
                yield tuple(sorted(Fraction(*frac) for frac in (x, y, z)))
        seen.add(x)
        for p in denominator_primes:
            supers[p].append(x)

Notes:

  • I go through the triples in increasing fraction value, i.e., increasing x value.
  • My denominator_primes is the list of prime factors of x's denominator. Remember that's c²/k, so its prime factors must also be prime factors of c. But k might've cancelled some, so I go through the prime factors of c and check whether they divide the denominator. Why so "complicated" instead of just looking up prime factors of c²/k? Because that can be prohibitively large.
  • denominator_primes is descending, so that p is simply denominator_primes[0]. Btw, why use the largest? Because larger means rarer means fewer y-candidates means faster.
  • supers[p] lists the numbers whose denominator is divisible by p. It's used to get the y-candidates.
  • When I'm done with x, I use denominator_primes to put x into the supers lists, so it can then be the y for future x values.
  • I build the seen and supers during the loop (instead of before) to keep them small. After all, for x=y+z with positive numbers, y and z must be smaller than x, so looking for larger ones would be wasteful.
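
A minimal driver sketch (my own addition, not part of the answer), showing how I would run solve_stefan over the question's gen_prim_pyth_trips; note that MAX_NUM must be defined as a global before the results are consumed, since the function reads it directly:

MAX_NUM = 100000
results = list(solve_stefan(gen_prim_pyth_trips(MAX_NUM)))
for y, z, x in results:
    print(y, '+', z, '=', x)
print(len(results), 'results found')  # per the benchmarks above, expected to be 0 for Pythagorean triples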

Verification

How do you verify your results if there aren't any? As far as I know, none of our solutions have found any. So there's nothing to compare, other than the nothingness, which is not exactly convincing. Well, my solution doesn't depend on the Pythagoreanness, so I created a set of just primitive triples and checked my solution's results for that. It computed the same 25,336 results as a reference implementation:

from itertools import combinations, combinations_with_replacement

def solve_reference(triples):
    fractions = {Fraction(int(a) * int(b), int(c)**2)
                 for a, b, c in triples}
    for x, y in combinations_with_replacement(sorted(fractions), 2):
        z = x + y
        if z in fractions:
            yield x, y, z

MIN_NUM = 2
MAX_NUM = 25
def triples():
    return list((a, b, c)
                for a, b, c in combinations(range(MIN_NUM, MAX_NUM+1), 3)
                if gcd(a, gcd(b, c)) == 1)
print(len(triples()), 'input triples')
expect = set(solve_reference(triples()))
print(len(expect), 'results')
output = set(solve_stefan(triples()))
print('output is', ('wrong', 'correct')[output == expect])

Output:

1741 input triples
25336 results
output is correct
Stefan Pochmann
  • Did you read this? [https://math.stackexchange.com/questions/3418323/prove-frac-textarea-1c-12-frac-textarea-2c-22-neq-frac-texta](https://math.stackexchange.com/questions/3418323/prove-frac-textarea-1c-12-frac-textarea-2c-22-neq-frac-texta) – גלעד ברקן Dec 26 '19 at 15:04
  • @גלעדברקן Had a look, thanks. Don't understand it all, is there something particular I should look at that would help me or so? – Stefan Pochmann Dec 26 '19 at 16:25
  • I don't know if there's anything there that's helpful in particular. It seems like the exercise here is meant to bolster or investigate the conjecture there. There seems to be at least one necessary condition expressed for the denominator that I think you may have utilized, and also an interesting general formulation as a rational, non-linear diophantine equation. – גלעד ברקן Dec 26 '19 at 16:49
  • @גלעדברקן In the comments under the question here, Patrick said he'll post a conjecture soon and is looking for evidence. So I guess it'll be an update to his original. Ok, I see those conditions now and they're indeed similar to what I did, but more general. I had also looked for something more general and I have a rough idea how to implement it. I'm not sure mathlove's conditions apply to triples as well, for example the triple (4,3,2) has gcd(4,3,2)=1 but (4⋅3)/2²=3 so the denominator lost its prime factors completely. – Stefan Pochmann Dec 26 '19 at 17:08
  • Oops, I implemented it wrong; never mind, this is tremendous! Thanks for all the effort. This is a bit out of my scope so it may take a bit to digest. Will the same algorithm work if we don't square the denominator? If so I'd really like to try that out, because I know of one matching that condition and it would be a great way to test this further and really "double check" that this works. Also (4,3,2) is not a triple, which is why that doesn't work. Remember a triple is where a^2 + b^2 = c^2 and a/b/c are all integers. – Patrick Maynard Dec 26 '19 at 18:46
  • @PatrickMaynard I think it should work for when you don't square, yes. I just tried and it found 3*4/5 + 20*21/29 = 17*144/145, is that the one you meant? For (4,3,2) I didn't mean Pythagorean. I thought I had seen a Pythagorean one but I confused it with one from my verification test. Easy to prove it can't happen for Pythagoreans, so my k will always be 1. Will simplify my code/text. – Stefan Pochmann Dec 26 '19 at 19:27
  • That is actually even better because my original implementation missed that one (presumably due to float rounding error)! Is there any way you could check really quickly that it finds this one, 21*20/29 + 119*120/169 = 119*120/169? Sorry if I'm asking a lot; I'm using this in my post later so I just want to be absolutely sure. Also, if it is not too much trouble, how did you modify it to do that search (did you just change self.frac = frac(a*b, c*c) to self.frac = frac(a*b, c))? – Patrick Maynard Dec 26 '19 at 20:36
  • @StefanPochmann Never mind I just figured it out and it does find it. Thank you for all your effort and ingenuity on this one! You made it way faster and made it work way better I will be sure to give you props when I post this later on MO. – Patrick Maynard Dec 26 '19 at 20:43
  • @StefanPochmann One last question is there a way to modify this search to check purely for 1/c^2 +1/c^2 = 1/c^2 ? Or does this prevent us from exploiting the prime factorization used in this method? – Patrick Maynard Dec 26 '19 at 21:04
  • @PatrickMaynard Works for numerator 1, too, though I suspect that this even more special case allows an even more efficient method. – Stefan Pochmann Dec 27 '19 at 12:00

You mention the naive algorithm being O(n³), but the O(n²) algorithm is also very simple if you can use a hashtable, such as a Python set:

MAX_NUM = 500000

from fractions import Fraction
from itertools import combinations_with_replacement

def solve(numbers):
    for a, b in combinations_with_replacement(numbers, 2):
        c = a + b
        if c in numbers:
            yield (a, b, c)

ratio_set = {
    Fraction(int(p) * int(q), int(r) ** 2)
    for p, q, r in gen_prim_pyth_trips(MAX_NUM)
}

for a, b, c in solve(ratio_set):
    print(a, '+', b, '=', c)

This uses the Fraction class, so that there is no funny business about floating point arithmetic being inexact, and so that + and == are done in constant time assuming your numbers are bounded. In that case, the running time is O(n²) because:

  • Inserting into a hashtable takes O(1) time, so building the set is O(n) time.
  • The for a, b in ... loop iterates over O(n²) pairs, and each set membership test is O(1).

The space complexity is O(n) for the set.

If we account for the cost of arithmetic and comparisons, the running time is O(n² log MAX_NUM) where MAX_NUM is the maximum absolute value of the integers, since + and == on Python's arbitrarily-large integers take logarithmic time.
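
To make the exactness point concrete (a minimal check of my own, not part of the answer):

from fractions import Fraction

# Floats accumulate binary rounding error, so exact equality can miss valid sums:
print(0.1 + 0.7 == 0.8)                                      # False
# Fractions represent each ratio exactly, so a + b == c behaves as expected:
print(Fraction(1, 10) + Fraction(7, 10) == Fraction(8, 10))  # True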


Can we do better than this? As you identified in the question, this problem is a variant of the well-studied 3SUM problem, sometimes referred to as 3SUM' (three-sum prime). The standard 3SUM problem asks for a + b + c = 0. The 3SUM' problem asks for a + b = c.

It is known to have the same difficulty, i.e. if there is an algorithm which solves 3SUM in a certain asymptotic time then there is an algorithm which solves 3SUM' in the same asymptotic time, and vice versa. (See these lecture notes by Adler, Gurram & Lincoln for a reference.)

According to Wikipedia, the best known algorithm for 3SUM is due to Timothy M. Chan (2018):

We present an algorithm that solves the 3SUM problem for n real numbers in O((n² / log² n)(log log n)^O(1)) time, improving previous solutions by about a logarithmic factor.

The complexity O((n² / log² n)(log log n)^O(1)) is less than O(n²), but not by much, and the gain might be nullified by the constant factor for inputs of any practical size. It's an open problem whether there is any algorithm solving 3SUM in O(nᶜ) time for c < 2. I think these complexities are derived assuming constant-time arithmetic and comparisons on numbers.

kaya3
  • Isn't Patrick's own algorithm already quadratic except for the silly resorting every time? I think if he simply calls `ratioList.sort()` after every `append` it'll be quadratic, no? – Stefan Pochmann Dec 22 '19 at 02:11
  • To be honest, I'm not sure how the algorithm in the question works, and I didn't take the time to analyse it. I'd be interested to see an explanation. – kaya3 Dec 22 '19 at 02:16
  • It's really easy and you should know it, so I recommend you do take the time :-P – Stefan Pochmann Dec 22 '19 at 02:17
  • Ah, I followed the link to the SO post and it's explained there. Yes, if you insert each new element into the sorted list in O(n) time instead of repeatedly sorting in O(n log n) time, then it should be O(n^2). – kaya3 Dec 22 '19 at 02:18
  • That said, I'm not convinced that the in-place `.sort()` will necessarily take only O(n) time when only the last element is out of order. Doing a linear search for the right index and then `.insert` definitely works, though. – kaya3 Dec 22 '19 at 02:21
  • If he kept it sorted then each sort would be O(n). And it has the advantage of using only O(n) space. Good luck trying to build a Θ(n²) size hashtable when n is a million :-). Btw I'd much prefer if he didn't use "n" to refer both to his limit and the number of resulting numbers :-( – Stefan Pochmann Dec 22 '19 at 02:23
  • Good point, but you can just build a set of size O(n) instead. I've edited. – kaya3 Dec 22 '19 at 02:26
  • Python's sort is indeed linear in such cases, as it identifies already sorted runs (at most two in such cases) and merges them. See https://en.wikipedia.org/wiki/Timsort. Although `bisect.insort` would be better. I just went with the smallest change. – Stefan Pochmann Dec 22 '19 at 02:30
  • Since you're trying to get it correct now with `Fraction`, you might also want to cast the triple elements from `numpy.int64` to `int` so you don't get overflows. – Stefan Pochmann Dec 22 '19 at 03:47
  • That's funny, I didn't know that `Fraction` worked with non-native `int` types - but you're right, it keeps the internal representation as `numpy.int64` if you don't explicitly convert to `int`. – kaya3 Dec 22 '19 at 03:51
  • I didn't know it, either. I only realized it when, as an alternative to searching exact `a+b == c` matches, I tried updating a global minimum of `abs(a+b - c)` to see how close to a match I've got so far. And then that occasionally went *up*. I was like ... wtf how can "if now < min: min = now" fail? And after some digging I saw the bad type which broke the `<`. – Stefan Pochmann Dec 22 '19 at 04:01
  • Thank you for this great input! Unfortunately, though, the code provided runs slower than the original. I'm not exactly certain why, but I believe it might be the fraction comparison; I'll look into it further. – Patrick Maynard Dec 22 '19 at 20:22
  • The fraction comparison is likely a slowdown, yes. But without it you are comparing floating-point numbers for exact equality, which means the algorithm will miss a lot of solutions due to floating-point rounding errors. – kaya3 Dec 22 '19 at 20:25
  • @PatrickMaynard Hmm... I think both of your solutions could be made to work correctly with floats. In yours, you could primarily work with floats and whenever you find an "almost-match", like `abs(s - target) < 1e-9`, you could use their fractions to check whether it's exact. And kaya3 could replace the set of fractions with a dict mapping floats rounded to 9 places to a set of its fractions (and then round `c`, and if `c in numbers`, then check whether c's fraction is in the set). Not sure this would help much, depends on the number of rounding collisions and thus your numbers. – Stefan Pochmann Dec 22 '19 at 21:36
  • Depends also on the cost of doing two dict lookups on rounded floats (one rounded up, one rounded down to make sure) vs. one dict lookup on a fraction (essentially a tuple of two integers). I'm happy to leave my solution as-is, though, so if anyone wants to optimize it, feel free to use it in your own answer. – kaya3 Dec 22 '19 at 21:46
  • Thanks, I ran it for 10,000,000 and have finally posted the conjecture here: https://mathoverflow.net/questions/349189/prove-frac-textarea-1c-12-frac-textarea-2c-22-neq-frac-texta . Thanks for your help, couldn't have done it by myself. – Patrick Maynard Dec 27 '19 at 19:14

I'd like to see a faster algorithm like O(n^2)

Do ratioList.sort() after your ratioList.append(...) and tadaa... you have O(n^2).

You're already O(n^2 log n) and the log just comes from resorting from scratch all the time.

With this, your runtime for MAX_NUM = 100,000 shrinks from 222 seconds to 116 seconds on my PC.
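
A minimal sketch (my rendering, not the answer's exact code) of what that change looks like in the question's loop; `bisect.insort`, mentioned in the comments, is an equivalent way to keep the list sorted:

import bisect

ratioList = []
for i in gen_prim_pyth_trips(MAX_NUM):
    r = (i[0] * i[1]) / i[2] ** 2
    ratioList.append(r)
    ratioList.sort()  # Timsort merges the single out-of-place element in O(n)
    # or, instead of append + sort: bisect.insort(ratioList, r)
    result = find_target(ratioList, r)
    if result != -1:
        print(result)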

Stefan Pochmann
  • Thank you for this! I appreciate it a lot. I have tested this and it runs faster; I am currently working on optimizing my Python and exploring faster libraries. – Patrick Maynard Dec 22 '19 at 20:20