2

I was writing a very simple program to check whether a number can be divided evenly by another number:

// use the divider squared to reduce iterations
for(divider = 2; (divider * divider) <= number; divider++)
    if(number % divider == 0)
        print("%d can divided by %d\n", number, divider);

Now I was curious whether the task could be done by finding the square root of number and comparing it to divider. However, it seems that sqrt() isn't really able to boost the efficiency. How is sqrt() handled in C, and how can I boost the efficiency of sqrt()? Also, is there any other way to approach the answer with even greater efficiency?

Also, the

number % divider == 0

is used to test whether divider evenly divides number; is there also a more efficient way to do this test besides using %?

Will Ness
  • 70,110
  • 9
  • 98
  • 181
edhu
  • 449
  • 6
  • 23
  • @RSahu I don't believe this is a proper duplicate. Please reopen. – fuz Feb 17 '16 at 23:06
  • @FUZxxl, I was biased by the title. – R Sahu Feb 17 '16 at 23:07
  • With a little more research, you might rephrase your question as: "What is an efficient algorithm for finding all factors of an integer?" I think you will find that this is a rather deep question that has been tackled many times on this forum... – Steger Feb 17 '16 at 23:08
  • Searching for "why are floating point operations slower than integer operations" in google produces a lot of interesting links. – R Sahu Feb 17 '16 at 23:09
  • @Steger I'll take your suggestion as I was indeed trying to find different ways to tackle this problem:) – edhu Feb 17 '16 at 23:12
  • It depends on problem size. sqrt takes 100 cycles at most, multiplication takes at least half a cycle. So if your integer is greater than 400 then sqrt definitely has the advantage. – user3528438 Feb 17 '16 at 23:14
  • [An interesting approach to compute square root of integers](http://stackoverflow.com/a/5296669/434551). – R Sahu Feb 17 '16 at 23:14
  • 1
    On the other hand, the remainder takes at least 10x more time than multiplication anyway, so even in the ideal case you are not saving a lot of time. – user3528438 Feb 17 '16 at 23:24
  • @user3528438 On Haswell, an `fsqrt` takes as long as an `fdiv` according to Agner Fog. – fuz Feb 17 '16 at 23:30
  • `if(number % divider = 0)` will not compile: should be `if(number % divider == 0)` for a comparison. – Weather Vane Feb 18 '16 at 00:23
  • @WeatherVane modified – edhu Feb 18 '16 at 00:25

4 Answers

3

However, it seems that sqrt() isn't really able to boost the efficiency

That is to be expected, as the saved multiplication per iteration is largely dominated by the much slower division operation inside the loop.

Also, the number % divider == 0 is used to test if divider could evenly divide number, is there also a more efficient way to do the test besides using %?

Not that I know of. Checking whether a % b == 0 is at least as hard as checking a % b == c for some c, because we can use the former to compute the latter (with one extra addition). And at least on Intel architectures, computing the latter is just as computationally expensive as a division, which is among the slowest operations on typical modern processors.

If you want significantly better performance, you need a better factorization algorithm, of which there are plenty. One particularly simple one, with runtime O(n^(1/4)), is Pollard's ρ algorithm. You can find a straightforward C++ implementation in my algorithms library. Adaptation to C is left as an exercise to the reader:

int rho(int n) { // will find a factor < n, but not necessarily prime
  if (~n & 1) return 2;      // n is even, so 2 is a factor
  int c = rand() % n, x = rand() % n, y = x, d = 1;
  while (d == 1) {           // Floyd cycle detection: x advances one step, y two
    x = (1ll*x*x % n + c) % n;
    y = (1ll*y*y % n + c) % n;
    y = (1ll*y*y % n + c) % n;
    d = __gcd(abs(x - y), n);
  }
  return d == n ? rho(n) : d; // d == n means this attempt failed; retry with a new c
}

void factor(int n, map<int, int>& facts) {
  if (n == 1) return;
  if (rabin(n)) { // simple randomized prime test (e.g. Miller–Rabin)
    // we found a prime factor
    facts[n]++;
    return;
  }
  int f = rho(n);
  factor(n/f, facts);
  factor(f, facts);
}

Constructing the factors of n from its prime factors is then an easy task. Just use all possible exponents for the found prime factors and combine them in each possible way.
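For illustration, here is a minimal C sketch of that last step (not taken from the library above; the parallel primes[]/exps[] arrays are just an assumed input format for the prime factorization):

#include <stdio.h>

/* Recursively choose an exponent 0..exps[i] for each prime and print the
   resulting divisor. For 2^3 * 3^2 * 5 this prints all 4*3*2 = 24 divisors. */
void divisors(const int *primes, const int *exps, int k, long long acc) {
    if (k == 0) {            /* every prime has been assigned an exponent */
        printf("%lld\n", acc);
        return;
    }
    long long p = 1;
    for (int e = 0; e <= exps[k - 1]; e++) {
        divisors(primes, exps, k - 1, acc * p);
        p *= primes[k - 1];
    }
}

int main(void) {
    int primes[] = {2, 3, 5}, exps[] = {3, 2, 1};  /* 360 = 2^3 * 3^2 * 5 */
    divisors(primes, exps, 3, 1);
    return 0;
}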

Niklas B.
  • 92,950
  • 18
  • 194
  • 224
  • If the input range is limited and I pre-compute all prime numbers in that range as a table, what would the complexity be? (or is such a table feasible for say int32?) – user3528438 Feb 17 '16 at 23:47
  • @user3528438 You can compute the primes in the range 1 to M in time O(M * log log M) using the [Sieve of Eratosthenes](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes). There are better algorithms for the job, but this is by far the simplest, and for M in the order of hundreds of millions to even billions it should be the fastest one if implemented efficiently. – Niklas B. Feb 17 '16 at 23:49
  • @user3528438 And to answer your second question, there are about 200 million primes in that range (pi(n) ~= n / ln n), so the table would occupy somewhere around 800 megabytes. You can probably compute it within a couple of seconds on a single CPU, so yes, that might well be feasible depending on your constraints – Niklas B. Feb 17 '16 at 23:54
  • Well, in the uint32 range, the largest factor must be below 65536, and below 65536 there are only 6542 prime numbers. So at least in the uint32 range, it would be quite efficient and feasible isn't it? – user3528438 Feb 18 '16 at 00:06
  • @user3528438 Oh you mean precomputing the primes up to sqrt(n) and using that table? Yes that's very feasible for your given range. However it's asymptotically "only" a logarithmic factor better than naive trial division because of how dense primes are within the set of natural numbers. For 64 bits this is no longer feasible, but Pollard-Rho still "kinda" is (depending on your time constraints of course). – Niklas B. Feb 18 '16 at 00:07
  • @user3528438 Not sure what algorithm you have in mind. Say `n <= 4294967296 = 2^32` and you have precomputed all primes up to `sqrt(2^32) = 2^16 = 65536`. Then you will still need to check every one of the ~6000 primes in that range for whether they are a prime factor of `n` – Niklas B. Feb 18 '16 at 00:15
  • Not really. If you divide the input by the prime factor each time you find one, and terminate when the input is smaller than the prime number to be checked. 4294967296 is a good case because it "hits" the table very early (at 2). Worst case is something like 4293001441 which doesn't hit until the very end (needs to check all 6542 primes below 65536). – user3528438 Feb 18 '16 at 00:20
  • Skipping multiples of two is doing something other than what the OP asks. Your method produces an asymmetry. It excludes multiples of two but includes e.g. multiples of three. The OP is not finding only prime divisors but all divisors (excluding 1 and the number itself). – Z boson Feb 18 '16 at 08:55
  • @Zboson OP's method is already borked if he wants to find all the divisors, because their method cannot find factors larger than sqrt(n). I'm providing a means to compute prime factors, which can be used to compute the divisors afterwards. If you so will, this is an output-sensitive algorithm, because most numbers do not have a lot of divisors (but some do) – Niklas B. Feb 18 '16 at 10:34
  • `printf("%d can divided by %d and %d\n", number, divider, number/divider);` gives the other factor. – Z boson Feb 18 '16 at 10:40
  • @Zboson I know that, thanks. Still it's not clear what OP's intention is, due to this – Niklas B. Feb 18 '16 at 10:45
3

I'm not going to address what the best algorithm to find all factors of an integer is. Instead I would like to comment on your current method.

There are three conditional test cases to consider

  1. (divider * divider) <= number
  2. divider <= number/divider
  3. divider <= sqrt(number)

See Conditional tests in primality by trial division for more details.

The case to use depends on your goals and hardware.

The advantage of case 1 is that it does not require a division. However, it can overflow when divider*divider is larger than the largest representable integer. Case 2 does not have the overflow problem but it requires a division. For case 3 the sqrt only needs to be calculated once, but it requires that the sqrt function gets perfect squares correct.
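As an aside, one way to keep case 1 without the overflow risk is to do the multiplication in a wider type (a minimal sketch, assuming long long is wider than int):

// hypothetical value near INT_MAX, where divider*divider would overflow as int
int number = 2147395601;
for (int divider = 2;
     (long long)divider * divider <= number;   // multiply in the wider type
     divider++)
    if (number % divider == 0)
        printf("%d can be divided by %d\n", number, divider);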

But there is something else to consider: many instruction sets, including the x86 instruction set, return the remainder as well when doing a division. Since you're already doing number % divider, this means that you get number / divider for free.

Therefore, case 1 is only useful on systems where the division and remainder are not calculated in one instruction and you're not worried about overflow.

Between case 2 and case 3 I think the main issue is again the instruction set. Choose case 2 if sqrt is too slow, or if your sqrt function does not calculate perfect squares correctly. Choose case 3 if the instruction set does not calculate the quotient and remainder in one instruction.

For the x86 instruction set, case 1, case 2 and case 3 should give essentially equal performance. So there should be no reason to use case 1 (however, see a subtle point below). The C standard library guarantees that the sqrt of a perfect square is computed exactly. So there is no disadvantage to case 3 either.

But there is one subtle point about case 2. I have found that some compilers don't recognize that the division and remainder can be calculated together. For example, in the following code

for(divider = 2; divider <= number/divider; divider++)
    if(number % divider == 0)

GCC generates two division instructions even though only one is necessary. One way to fix this is to keep the division and remainder close together, like this:

divider = 2, q = number/divider, r = number%divider;
for(; divider <= q; divider++, q = number/divider, r = number%divider)
    if(r == 0)

In this case GCC produces only one division instruction, and case 1, case 2 and case 3 have the same performance. But this code is a bit less readable than

int cut = sqrt(number);
for(divider = 2; divider <= cut; divider++)
    if(number % divider == 0)

so I think overall case 3 is the best choice, at least with the x86 instruction set.

Community
  • 1
  • 1
Z boson
  • 32,619
  • 11
  • 123
  • 226
2

In C, you can take square roots of floating point numbers with the sqrt() family of functions in the header <math.h>.

Taking square roots is usually slower than dividing because the algorithm to take square roots is more complicated than the division algorithm. This is not a property of the C language but of the hardware that executes your program. On modern processors, taking square roots can be just as fast as dividing. This holds, for example, on the Haswell microarchitecture.

However, if the algorithmic improvements are good, the slightly slower speed of a sqrt() call usually doesn't matter.

To only compare up to the square root of number, employ code like this:

#include <math.h>

/* ... */

int root = (int)sqrt((double)number);
for(divider = 2; divider <= root; divider++)
    if(number % divider == 0)
        printf("%d can be divided by %d\n", number, divider);
Mark Ransom
  • 299,747
  • 42
  • 398
  • 622
fuz
  • 88,405
  • 25
  • 200
  • 352
0

This is just a random thought of mine, so please comment on it and criticize it if it's wrong.

The idea is to precompute all the prime numbers below a certain range and use it as a table.

Loop through the table and check if each prime number is a factor: if it is, increment the counter for that prime number; if not, increment the index. Terminate when the index reaches the end or the prime number to check exceeds the input.

At the end, the result is a table of all the prime factors of the input and their counts. Generating all natural factors from that should then be trivial, shouldn't it?
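A rough C sketch of that loop (assuming a prime_table[] holding the primes below 65536 has been precomputed; it just prints each prime factor as often as it divides, instead of building the count table):

#include <stdint.h>
#include <stdio.h>

void factor_with_table(uint32_t n, const uint32_t *prime_table, int nprimes) {
    for (int i = 0; i < nprimes && n > 1; i++) {
        while (n % prime_table[i] == 0) {   /* divide out every occurrence */
            printf("%u ", prime_table[i]);
            n /= prime_table[i];
        }
    }
    if (n > 1)                              /* at most one prime factor above 65536 can remain */
        printf("%u", n);
    printf("\n");
}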

Worst case, the loop needs to go to the end, then it takes 6542 iterations.

Considering the input range is [0, 4294967296], this is roughly O(n^(3/8)).

Here's MATLAB code that implements this method:

If p is generated by p=primes(65536); this method should work for all inputs in [0, 4294967296] (but this has not been tested).

function [ output_non_zero ] = fact2(input, p)
    output_table=zeros(size(p));
    i=1;
    while(i<=length(p))
        if(input<1.5)
            break;
            % break condition: input has been divided down to 1,
            % all prime factors are found.
        end
        if(rem(input,p(i))<1)
            % if divisible, increment counter and don't increment index;
            % keep dividing until not divisible
            output_table(i)=output_table(i)+1;
            input = input/p(i);
        else
            % not divisible, try the next prime
            i=i+1;
        end
    end

    % remove all zeros, should be handled more efficiently
     output_non_zero = [p(output_table~=0);...
                        output_table(output_table~=0)];
     if(input > 1.5)
         % the last and largest prime factor could be larger than 65536
         % and hence missing from the table; add it to the end of the output
         % if it exists
         output_non_zero = [output_non_zero,[input;1]];
     end
end

test

p=primes(65536);
t = floor(rand()*4294967296);
b = fact2(t, p);
% check that the prime factors multiply back to the input and are all primes
assert((prod(b(1,:).^b(2,:))==t)&&all(isprime(b(1,:))), 'test failed');
user3528438
  • 2,737
  • 2
  • 23
  • 42
  • You may not need all the prime numbers. I suggest you check each for being a factor, as you generate them. Also, each prime may divide more than once, for example `45 = 3 * 3 * 5`. – Weather Vane Feb 18 '16 at 00:55
  • @WeatherVane Please see the test code. Running it with `fact2(45, p)` generates `(3,2), (5,1)`. A difficult case is when the largest prime factor is greater than 65536, which is easy to handle because there is at most one of them for each input. – user3528438 Feb 18 '16 at 01:02
  • Sorry I don't follow MATLAB, but in C it's still valid to generate the primes as you go, rather than all you might need. – Weather Vane Feb 18 '16 at 01:06
  • @WeatherVane That's true but I don't know how (yet). It'll be useful when the input range is too large for the table to be feasible. – user3528438 Feb 18 '16 at 01:09
  • Unless you used a sieve you would still need a table, as you use it to test for / generate the next prime. – Weather Vane Feb 18 '16 at 01:10
  • 1
    FWIW, the precise asymptotic complexity of this method is O(n^(1/2) / log n) per query (+ the preprocessing step obviously) – Niklas B. Feb 18 '16 at 02:28
  • Not that it really matters if you have a specific input range in mind, of course. – Niklas B. Feb 18 '16 at 02:33