3

Is there a way to improve my algorithm to get the optimal one in terms of computing time?

Given a range of non-zero dividend numbers [A,B] (1<=A,B<=10^18), and a set of divisors D = {x: 1 <= x <= 50}, I want to find the total number of non-repeating dividends in [A,B] that can be divided by any number in set D.

Example 1 (doesn't take too much time)

The range of dividend numbers [1,10] and the divisors are {2,3,4}

  • list of numbers divisible for divisor 2: 2,4,6,8,10
  • list of numbers divisible for divisor 3: 3,6,9
  • list of numbers divisible for divisor 4: 4,8

So the total number of non-repeat divisible dividends in [1,10] = 7

Example 2 (takes a lot of time)

[A,B] = [79031836253224256, 403892407826392155]
D = {44, 46, 50, 8, 35,41,6,18, 24, 23, 7, 45, 39, 46, 4, 35, 34}
Output: 161802330522861274

1st version of the algorithm in Python

def solution(A,B,D):                                                                                                                                                                         
    a = set()                                                                                                      
    for i in range(A,B+1):                                                                                         
            for j in D:                                                                                            
                    if i%j == 0:                                                                                   
                            a.add(i)                                                                               
    # return the count of a                                                                                        
    return len(a)   

if __name__ == "__main__":                                                                                             
        print(solution(1,10,[2,3,4]))                                                                                  
baduker
  • 19,152
  • 9
  • 33
  • 56
Ster-BigData
  • 89
  • 2
  • 11
  • 1
    you could preprocess the data in the divisor list. for example: `if x % 4 == 0` -> `x % 2 == 0` – A.Bau Mar 24 '18 at 07:56
  • 1
    Try math.stackexchange.com for hints on how to find the answer quickly. (Pretend you were expected to compute the result for your "hard" case yourself, rather than getting the computer to do it.) – Karl Knechtel Mar 24 '18 at 08:04

1 Answers1

5

One key observation is that the range [A, B] is very large, making it prohibitive to iterate from A to B. However, the size of the divisor lists is relatively small.

We can count the required number without explicitly constructing the respective set.

First, notice that if D would consist of only a single number (let's denote that number as d), then we could easily find the number of multiples in that range: it would be floor(B / d) - floor((A - 1) / d). The problem is that when D has more than one elements, simply summing the above formulas for each element would lead to counting some numbers multiple times.

We can tackle the aforementioned issue by using the inclusion-exclusion principle.


I'll illustrate this technique using a small example: A = 2, B = 8, D = {2, 3}.

  • Count how many numbers have 2 as a divisor. There are 4 such numbers: {2, 4, 6, 8}.

  • Count how many numbers have 3 as a divisor. There are 2 such numbers: {3, 6}.

  • Count how many numbers have both 2 and 3 as a divisor. There is just 1 such number: {6}.

According to the inclusion-exclusion principle, to count how many numbers have at least one divisor in the given set we need to add the results from steps 1 and 2, and subtract the result from step 3 (this is because during steps 1 and 2 we have counted the number 6 twice). Thus, the result is 4 + 2 - 1 = 5.

In general, we need to consider all subsets of the set D. For each subset, we need to count how many numbers in the range [A, B] are divisible by all the numbers in the current subset. This number is equal to floor(B / LCM) - floor((A - 1) / LCM), where LCM is the least common multiple of all the numbers in the current subset.


Pre-processing the list of divisors.

The direct implementation of the above ideas leads to a code that investigates O(2M) subsets, where M is the maximum number of divisors in the set D (in the current problem settings, M = 50). However, we can use the observation mentioned by A. Bau in the comments to reduce D and reduce the number of subsets to O(2M/2).

If a number d1 in D is a multiple of another number d2 from D, then d1 has no influence in the final result, and we can eliminate it.

In the current problem settings, where all numbers in D are in {1, 2, .. 50}, this pre-processing guarantees that we are left with at most 25 numbers.

The demonstration for the fact that we'll have at most 25 numbers is interesting, but I won't go into the details (unless someone is interested) -- in brief, we can partition the set {1, 2, ... 50} into 25 disjoint subsets, where ith subset contains numbers of the form Si = {(2 ∙ i - 1) ∙ 2k}. Choosing more than 25 numbers guarantees that we'll have 2 numbers in the same set, and the larger of them will divide the smaller one. Furthermore, this bound is tight, because we can find sets of 25 numbers that can't be reduced any further, e.g. {26, 27, .., 50}.

For the worst case of 25 numbers, there are 225 = 33,554,432 subsets that we need to look into, which is feasible, since each subset can be rapidly investigated.


The following code implements the above ideas and runs in 0.002 seconds for the large example:

import itertools
import time


def gcd2(a, b):
    while b > 0:
        a, b = b, a % b

    return a


def lcm2(a, b):
    return int(a * b / gcd2(a, b))


def lcm(list):
    ans = list[0]

    for i in range(1, len(list)):
        ans = lcm2(ans, list[i])

    return ans


def preprocess(D):
    D2 = []
    D = sorted(D)

    for i in range(len(D)):
        include = True
        for j in range(i):
            if D[i] % D[j] == 0:
                include = False
                break

        if include:
            D2.append(D[i])

    return D2


def solution2(A, B, D):
    D = preprocess(D)

    ans = 0
    for num in range(1, len(D) + 1):
        for list in itertools.combinations(D, num):
            v = lcm(list)

            sign = 1 if len(list) % 2 == 1 else -1
            val = sign * ((B // v) - (A - 1) // v)
            ans += val

    return ans


if __name__ == '__main__':
    [A, B] = [79031836253224256, 403892407826392155]
    D = [44, 46, 50, 8, 35, 41, 6, 18, 24, 23, 7, 45, 39, 46, 4, 35, 34]

    t = time.time()
    print(solution2(A, B, D))
    print('processing took {} s. '.format(time.time() - t))

The result is:

161802330522861274
processing took 0.00200653076171875 s. 

Discussion.

As גלעד ברקן pointed out in the comments, this still takes a long time in the worst situation, consisting of the divisor set being {26, 27, ..., 50}. On my computer it takes around 6 minutes.

In that case there are about 33 millions subset that need to be considered, and the least common multiple (LCM) needs to be calculated for each such subset.

Right now, the LCM computation is made independently for each subset, but it is possible to share some computations (because LCM(union(S1, S2)) = LCM(LCM(S1), LCM(S2))).

Nevertheless, that might still not be enough, because even if the LCM was computed instantaneously, the current approach would still take around 18 seconds on my computer.

Finally, I made another experiment, and I measured the time of iterating through 33 million entries, and performing a single integer division and an addition during each iteration. This takes around 8.4 seconds in python, and about 0.12 seconds in C++.

In conclusion, it might be the case that the techniques described in this answer are not the best for the given problem settings, but it's also possible that simply implementing these ideas in a faster programming language would fit the requirements.

danbanica
  • 3,018
  • 9
  • 22