5

I would like to find the highest value m = a*b that satisfies some condition C(m), where

1 <= a <= b <= 1,000,000.

In order to do that, I'd like to iterate all pairs of a,b in decreasing order of a*b.

For example, for values up to 5, the order would be:

5 x 5 = 25
4 x 5 = 20
4 x 4 = 16
3 x 5 = 15
3 x 4 = 12
2 x 5 = 10
3 x 3 = 9
2 x 4 = 8
2 x 3 = 6
1 x 5 = 5
1 x 4 = 4
2 x 2 = 4
1 x 3 = 3
1 x 2 = 2
1 x 1 = 1

So far I've come up with a BFS-like tree search, where I generate candidates from the current "visited" set and pick the highest value candidate, but it's a tangled mess, and I'm not sure about correctness. I wonder if there's some sort of trick I'm missing.

I'm also interested in the more general case of ordering by any monotonic function f(a,b), if such a thing exists.

For illustration, C(m) could be "return true if m2+m+41 is prime, otherwise return false", but I'm really looking for a general approach.

itsadok
  • 28,822
  • 30
  • 126
  • 171
  • 2
    What is the condition C(m)? (and why have you left out multiples of 7 in your list)? – Abhishek Bansal Aug 07 '14 at 05:00
  • 2
    @user1990169 because 7 is larger than 5 and is this not "up to 5"? Just a guess. – n. m. could be an AI Aug 07 '14 at 05:11
  • I don't understand. You say the property C depends only on the number m (and not how it is factored). Then you should search over possible m testing C. Searching over factorisations is wasteful, not to say complicated. Your assertion that m can be written a product of small numbers is simply another condition D to test. – Colonel Panic Aug 12 '14 at 13:21
  • @ColonelPanic I pretty much agree. That's the idea behind nmore's answer. I was hoping there was a math trick to solve the specific case of a*b without using a heap. Anyway, if condition D is sparse enough it might still be faster to use a heap, e.g. if I had used a^3*b^3 instead of a*b. – itsadok Aug 12 '14 at 13:50

4 Answers4

3

Provided that C(m) is so magical that you cannot use any better technique to find your solution directly and thus you really need to traverse all a*b in decreasing order, this is what I would do:

Initialize a max-heap with all pairs (a, b) such that a = b. This means that the heap contains (0, 0), (1, 1), ... , (1.000.000, 1.000.000). The heap should be based on the a * b value.

Now continuously:

  1. Get the max pair (a, b) from the heap.
  2. Verify if (a, b) satisfies C(a * b). If so, you are done.
  3. Otherwise, add (a, b-1) to the heap (provided b > 0, otherwise do nothing).

This is a very simple O(n log n) time and O(n) space algorithm, provided that you find the answer quickly (in a few iterations). This of course depends on C.


If you run into space problems you can of course easily decrease the space complexity by splitting up the problem in a number of subproblems, for instance 2:

  1. Add only (500.000, 500.000), (500.001, 500.001), ... , (1.000.000, 1.000.000) to the heap and find your best pair (a, b).
  2. Do the same for (0, 0), (1, 1), ... (499.999, 499.999).
  3. Take the best of the two solutions.
Vincent van der Weele
  • 12,927
  • 1
  • 33
  • 61
  • How the time complexity of the first part is O(n logn)? there is n pair of them, and each of them decrease from 'x' to 0, so this will be 'n' + 'n - 1' + ... + '0' which is O(n^2) times the heap operation O(log(n^2)), so I think it should be O(n^2 log(n^2)). – Pham Trung Aug 08 '14 at 08:01
  • @PhamTrung I meant that the overhead of the datastructure is `O(n log n)`. Of course, worst case you need to loop over all `O(n^2)` pairs before you find one for which `C` holds. The overhead of the datastructure is `O(log n)` per pair, because the heap has size `O(n)` only. However, `O(log n^2) = O(2 log n) = O(log n)`, so your analysis is correct too. – Vincent van der Weele Aug 08 '14 at 09:01
  • By the way, it looks like this solution does indeed work not just for a*b but for any function f(a,b) where f(a-ε,b)≤f(a,b) and f(a,b-ε)≤f(a,b). – itsadok Aug 10 '14 at 05:01
  • @itsadok Or if you initialize on `(1.000.000, b)` for all `b`, you could even have any `f` such that `f(a-1, b)≤f(a, b)`. Also, no need for `f(a, b) = f(b, a)`. – Vincent van der Weele Aug 10 '14 at 13:04
2

Here's a not particularly efficient way to do this with a heap in Python. This is probably the same thing as the BFS you mentioned, but it's fairly clean. (If someone comes up with a direct algorithm, that would of course be better.)

import heapq  # <-this module's API is gross. why no PriorityQueue class?

def pairs_by_reverse_prod(n):
    # put n things in heap, since of course i*j > i*(j-1); only do i <= j
    # first entry is negative of product, since this is a min heap
    to_do = [(-i * n, i, n) for i in xrange(1, n+1)]
    heapq.heapify(to_do)

    while to_do:
        # first elt of heap has the highest product
        _, i, j = to_do[0]
        yield i, j

        # remove it from the heap, replacing if we want to replace
        if j > i:
            heapq.heapreplace(to_do, (-i * (j-1), i, j-1))
        else:
            heapq.heappop(to_do)
Danica
  • 28,423
  • 6
  • 90
  • 122
  • @Heuster beat me to basically the same solution (by a while, since I got distracted while writing this), but since there's code here I figured I'd leave it up. – Danica Aug 07 '14 at 05:49
1

Below code will generate (and print):

[(5, 5), (4, 5), (4, 4), (3, 5), (3, 4), (2, 5), (3, 3), (2, 4), (2, 3), (1, 5), (1, 4), (2, 2), (1, 3), (1, 2), (1, 1)]

which is basically what you want, since the code can break early if your condition is satisfied. I think the whole point of this question is NOT to generate all possible combinations of (a, b).

The key point of the algorithm is that in each iteration, we need to consider (a - 1, b) and (a, b - 1). If a == b, however, since a <= b, we only need to consider (a - 1, b). The rest is about maintaining order in the queue of tuples, Q, based on their product, m.

In terms of efficiency, when inserting into Q, the code performs linear search from index 0. Performing binary search instead of this linear search may or may not make things faster for larger values of a and b.

Also to further optimize the code, we can store m alongside (a, b) in Q so that we do not have to calculate a * b many times. Also using the 1D bucket structure with m as the key to implement Q would be interesting.

#!/usr/bin/python

def insert_into_Q((a, b), Q):

    if (a == 0) or (b == 0):
        return

    pos = 0
    for (x, y) in Q:
        if (x == a) and (y == b):
            return
        if x * y < a * b:
            break
        pos = pos + 1
    Q.insert(pos, (a, b))


def main(a, b):

    Q = [(a, b)]
    L = []

    while True:

        if len(Q) == 0:
            break

        (a, b) = Q.pop(0)
        L.append((a, b)) # Replace this with C(a * b) and break if satisfied.

        a1 = a - 1
        b1 = b - 1

        if (a == b):
            insert_into_Q((a1, b), Q)
        else:
            insert_into_Q((a1, b), Q)
            insert_into_Q((a, b1), Q)

    print(L)


if __name__ == "__main__":
    main(5, 5)
wookie919
  • 3,054
  • 24
  • 32
1

Note: this is a test of the function C(m) where m <= some target. It will not work for OP's general situation, but is a side case.

First find the highest number that satisfies C, and then find the pair that matches that high number. Finding the initial target number takes almost no time since its a binary search from 1 to 1E12. Finding the pair that matches is a bit harder, but is still not as bad as factoring.

Code:

public class TargetPractice {

    private static final long MAX = 1000000L;

    private long target;

    public static void main(String[] args) {
        Random r = new Random();
        for (int i = 0; i < 5; i++) {
            TargetPractice tp = new TargetPractice(r.nextInt((int) MAX), r.nextInt((int) MAX));
            System.out.println("Trying to find " + tp.target);
            System.gc();
            long start = System.currentTimeMillis();
            long foundTarget = tp.findTarget();
            long end = System.currentTimeMillis();
            System.out.println("Found " + foundTarget);
            System.out.println("Elapsed time " + (end - start) + "\n");
        }
    }

    public TargetPractice(long a, long b) {
        target = a * b + 1;
    }

    private long binSearch() {
        double delta = MAX * MAX / 2;
        double target = delta;

        while (delta != 0) {
            if (hit((long) target)) {
                target = target + delta / 2;
            } else {
                target = target - delta / 2;
            }
            delta = delta / 2;
        }

        long longTarget = (long) target;
        for (int i = 10; i >= -10; i--) {
            if (hit(longTarget + i)) {
                return longTarget + i;
            }
        }
        return -1;
    }

    private long findTarget() {
        long target = binSearch();
        long b = MAX;
        while (target / b * b != target || target / b > MAX) {
            b--;
            if (b == 0 || target / b > MAX) {
                b = MAX;
                target--;
            }
        }
        System.out.println("Found the pair " + (target/b) + ", " + b);
        return target;
    }

    public boolean hit(long n) { 
        return n <= target;
    }
}

It prints:

Trying to find 210990777760
Found the pair 255976, 824260
Found 210990777760
Elapsed time 5

Trying to find 414698196925
Found the pair 428076, 968749
Found 414698196924
Elapsed time 27

Trying to find 75280777586
Found the pair 78673, 956882
Found 75280777586
Elapsed time 1

Trying to find 75327435877
Found the pair 82236, 915991
Found 75327435876
Elapsed time 19

Trying to find 187413015763
Found the pair 243306, 770277
Found 187413015762
Elapsed time 23

nmore
  • 2,474
  • 1
  • 14
  • 21
  • That's pretty neat, but note that finding the highest number that satisfies C can still take a trillion tries. And there's no guarantee that the number we find will be legal - it might be prime or have a factor that is higher than 1M, so we'd have to continue searching. I agree that in practice for f(a,b)=a*b this is probably faster. – itsadok Aug 07 '14 at 07:34
  • Ok updated the code with a minor check. Now this solution will never take a trillion tries. In addition, it will find a legal number if there is one. – nmore Aug 07 '14 at 08:31
  • 1
    Sorry, I'm not really getting what you did there. My point was that C(m) could be anything, so it might return "false" for any number over 2, and therefore you would have to check all the numbers from 1 trillion down to 2 before you even find the target. You can't use a binary search for that. – itsadok Aug 07 '14 at 08:45
  • Ah, I see what you are saying, I assumed C was continuous.... Will edit the post. – nmore Aug 07 '14 at 09:08