119

I am having difficulty deciding what the time complexity of Euclid's greatest common divisor algorithm is. This algorithm in pseudo-code is:

function gcd(a, b)
    while b ≠ 0
       t := b
       b := a mod b
       a := t
    return a

It seems to depend on a and b. My thinking is that the time complexity is O(a % b). Is that correct? Is there a better way to write that?

Donald T
  • 10,234
  • 17
  • 63
  • 91
  • 15
    See Knuth TAOCP, Volume 2 -- he gives *extensive* coverage. Just FWIW, a couple of tidbits: it's not proportional to `a%b`. The worst case is when `a` and `b` are consecutive Fibonacci numbers. – Jerry Coffin Oct 20 '10 at 17:10
  • 3
    @JerryCoffin Note: If you want to prove the worst case is indeed Fibonacci numbers in a more formal manner, consider using mathematical induction to prove that the value n steps before termination must be at least gcd times the n-th Fibonacci number. – Mygod Mar 22 '17 at 15:15

11 Answers

87

One trick for analyzing the time complexity of Euclid's algorithm is to follow what happens over two iterations:

a', b' := a % b, b % (a % b)

Now a and b will both decrease, instead of only one, which makes the analysis easier. You can divide it into cases:

  • Tiny A: 2a <= b
  • Tiny B: 2b <= a
  • Small A: 2a > b but a < b
  • Small B: 2b > a but b < a
  • Equal: a == b

Now we'll show that every single case decreases the total a+b by at least a quarter:

  • Tiny A: b % (a % b) < a and 2a <= b, so b is decreased by at least half, so a+b decreased by at least 25%
  • Tiny B: a % b < b and 2b <= a, so a is decreased by at least half, so a+b decreased by at least 25%
  • Small A: b will become b-a, which is less than b/2, decreasing a+b by at least 25%.
  • Small B: a will become a-b, which is less than a/2, decreasing a+b by at least 25%.
  • Equal: a+b drops to 0, which is obviously decreasing a+b by at least 25%.

Therefore, by case analysis, every double-step decreases a+b by at least 25%. There's a maximum number of times this can happen before a+b is forced to drop below 1. The total number of steps (S) until we hit 0 must satisfy (4/3)^S <= A+B. Now just work it:

(4/3)^S <= A+B
S <= lg[4/3](A+B)
S is O(lg[4/3](A+B))
S is O(lg(A+B))
S is O(lg(A*B)) //because A*B is asymptotically greater than A+B
S is O(lg(A)+lg(B))
//Input size N is lg(A) + lg(B)
S is O(N)

So the number of iterations is linear in the number of input digits. For numbers that fit into CPU registers, it's reasonable to model the iterations as taking constant time and pretend that the total running time of the gcd is linear.

Of course, if you're dealing with big integers, you must account for the fact that the modulus operations within each iteration don't have a constant cost. Roughly speaking, the total asymptotic runtime is going to be n^2 times a polylogarithmic factor. Something like n^2 lg(n) 2^O(log* n). The polylogarithmic factor can be avoided by instead using a binary gcd.
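
As a sanity check on the double-step argument, here is a minimal C sketch (my addition, not from the original answer; the helper name check_double_step is made up) that runs the loop two iterations at a time and asserts that a + b shrinks by at least 25% per double-step:

#include <assert.h>
#include <stdio.h>

/* Run Euclid's algorithm two iterations at a time and check that
   a + b shrinks by at least 25% on every double-step. */
static void check_double_step(unsigned long long a, unsigned long long b) {
    while (b != 0) {
        unsigned long long before = a + b;
        unsigned long long a2 = a % b;                  /* first iteration  */
        unsigned long long b2 = (a2 == 0) ? 0 : b % a2; /* second iteration */
        a = a2;
        b = b2;
        if (a + b > 0)
            assert(4 * (a + b) <= 3 * before); /* shrank by >= 25% */
    }
}

int main(void) {
    check_double_step(123456789ULL, 987654321ULL);
    check_double_step(55, 34); /* consecutive Fibonacci numbers */
    puts("every double-step shrank a+b by at least 25%");
    return 0;
}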

Craig Gidney
  • 17,763
  • 5
  • 68
  • 136
  • Can you explain why "b % (a % b) < a", please? – Michael Heidelberg Feb 05 '17 at 20:36
  • 4
    @MichaelHeidelberg `x % y` can't be more than `x` and must be less than `y`. So `a % b` is at most `a`, forcing `b % (a%b)` to be below something that is at most `a` and therefore is overall less than `a`. – Craig Gidney Feb 05 '17 at 22:38
  • @Cheersandhth.-Alf You consider a slight difference in preferred terminology to be "seriously wrong"? Of course I used CS terminology; it's a computer science question. Regardless, I clarified the answer to say "number of digits". – Craig Gidney Aug 08 '17 at 20:34
  • @CraigGidney: Thanks for fixing that. Now I recognize the communication problem from many Wikipedia articles written by pure academics. Consider this: the main reason for talking about number of digits, instead of just writing O(log(min(a,b)) as I did in my comment, is to make things simpler to understand for non-mathematical folks. Without that concern just write “log”, etc. So that's the *purpose* of the number of digits, helping those challenged folks. When you *name* this notion “size”, and have the definition elsewhere, and don't talk about “log” at the end, you obscure instead of help. – Cheers and hth. - Alf Aug 09 '17 at 00:49
  • The last paragraph is incorrect. If you sum the relevant telescoping series, you’ll find that the time complexity is just O(n^2), even if you use the schoolbook quadratic-time division algorithm. – Emil Jeřábek Jan 07 '20 at 17:00
  • @CraigGidney For Tiny A, you wrote `a + b` will decrease by at least `25%`. I'm having some difficulty understanding as to why this is true. I can see that `a' < a` and `b' < a`, so `a' + b' < 2a`. Again, `b >= 2a`, so `a + b >= 3a`. So `a + b` will change from `>=3a` to `<2a`, which is a change of `33%` or higher. – Robur_131 May 11 '20 at 01:21
  • @Robur_131 A 33% decrease implies an "at least 25%" decrease. What's the problem? – Craig Gidney May 11 '20 at 01:47
  • I'm curious as to why you are not using a tighter bound. Is it for the sake of a simpler calculation? – Robur_131 May 11 '20 at 01:59
  • @Robur_131 It was for the simplicity of the proof. The actual worst case number of rounds is related to the golden ratio instead of 4/3. – Craig Gidney May 11 '20 at 10:54
32

A suitable way to analyze an algorithm is to determine its worst-case scenario. The Euclidean GCD's worst case occurs when Fibonacci pairs are involved: EGCD(fib[i], fib[i - 1]), where i > 0.

For instance, let's opt for the case where the dividend is 55 and the divisor is 34 (recall that we are still dealing with Fibonacci numbers).

[image: step-by-step trace of gcd(55, 34)]

As you may notice, this operation cost 8 iterations (or recursive calls).

Let's try larger Fibonacci numbers, namely 121393 and 75025. We can notice here as well that it took 24 iterations (or recursive calls).

[image: step-by-step trace of gcd(121393, 75025)]

You can also notice that each iteration yields a Fibonacci number; that is why we have so many operations. Indeed, we cannot obtain similar results with anything other than Fibonacci numbers.

Hence, the time complexity is going to be represented by small Oh (an upper bound) this time. The lower bound is intuitively Omega(1): the case of 500 divided by 2, for instance.

Let's solve the recurrence relation:

[image: solving the recurrence relation; the solution is O(log(xy))]

We may say, then, that the Euclidean GCD can make at most log(xy) operations.
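
To reproduce these counts, here is a minimal C sketch (my addition, not from the original answer) that counts the loop iterations for the two Fibonacci pairs used above:

#include <stdio.h>

/* Count how many iterations Euclid's algorithm performs on (a, b). */
static int gcdIterations(unsigned long long a, unsigned long long b) {
    int count = 0;
    while (b != 0) {
        unsigned long long t = b;
        b = a % b;
        a = t;
        count++;
    }
    return count;
}

int main(void) {
    printf("gcd(55, 34): %d iterations\n", gcdIterations(55, 34));  /* 8 */
    printf("gcd(121393, 75025): %d iterations\n",
           gcdIterations(121393, 75025));                           /* 24 */
    return 0;
}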

19

There's a great look at this in the Wikipedia article.

It even has a nice plot of complexity for value pairs.

It is not O(a%b).

It is known (see article) that it will never take more steps than five times the number of digits in the smaller number. So the max number of steps grows as the number of digits (ln b). The cost of each step also grows as the number of digits, so the complexity is bounded by O(ln^2 b) where b is the smaller number. That's an upper limit, and the actual time is usually less.
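
To illustrate the five-times-the-digits rule, here is a small C sketch (my addition, not code from the answer) that counts steps on consecutive Fibonacci pairs, the known worst case, and prints the bound alongside:

#include <stdio.h>

/* Number of iterations of Euclid's algorithm on (a, b). */
static int steps(unsigned long long a, unsigned long long b) {
    int n = 0;
    while (b != 0) { unsigned long long t = b; b = a % b; a = t; n++; }
    return n;
}

/* Number of decimal digits of x. */
static int digits(unsigned long long x) {
    int d = 1;
    while (x >= 10) { x /= 10; d++; }
    return d;
}

int main(void) {
    unsigned long long f1 = 1, f2 = 1;
    while (f2 < 1000000000ULL) {
        unsigned long long next = f1 + f2;
        f1 = f2;
        f2 = next;
        printf("steps(%llu, %llu) = %d, bound = %d\n",
               f2, f1, steps(f2, f1), 5 * digits(f1));
    }
    return 0;
}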

tech_boy
  • 195
  • 3
  • 13
JoshD
  • 12,490
  • 3
  • 42
  • 53
  • @IVlad: Number of digits. I've clarified the answer, thank you. – JoshD Oct 20 '10 at 17:16
  • For OP's algorithm, using (big integer) divisions (and not subtractions) it is in fact something more like O(n^2 log^2 n). – Alexandre C. Oct 20 '10 at 17:27
  • @Alexandre C.: Keep in mind `n = ln b`. What is the regular complexity of modulus for big int? Is it O(log n log^2 log n) – JoshD Oct 20 '10 at 17:34
  • @JoshD: it is something like that, I think I missed a log n term, the final complexity (for the algorithm with divisions) is O(n^2 log^2 n log n) in this case. – Alexandre C. Oct 20 '10 at 18:49
  • @JoshD: I missed something: typical complexity for division with remainder for bigints is O(n log^2 n log n) or O(n log^2n) or something like that (I don't remember exactly), but definitely at least linear in the number of digits. – Alexandre C. Oct 20 '10 at 22:08
16

See here.

In particular this part:

Lamé showed that the number of steps needed to arrive at the greatest common divisor for two numbers less than n is

[image: Lamé's formula; the number of steps is O(log n)]

So O(log min(a, b)) is a good upper bound.

Community
  • 1
  • 1
IVlad
  • 43,099
  • 13
  • 111
  • 179
  • 4
    That is true for the number of steps, but it doesn't account for the complexity of each step itself, which scales with the number of digits (ln n). – JoshD Oct 20 '10 at 17:18
9

Here's an intuitive understanding of the runtime complexity of Euclid's algorithm. The formal proofs are covered in various texts such as Introduction to Algorithms and TAOCP Vol 2.

First, think about what would happen if we tried to take the gcd of two consecutive Fibonacci numbers F(k+1) and F(k). You might quickly observe that Euclid's algorithm iterates on to F(k) and F(k-1); that is, with each iteration we move down one number in the Fibonacci series. As Fibonacci numbers are O(Phi ^ k), where Phi is the golden ratio, we can see that the runtime of GCD is O(log n) where n = max(a, b) and the log has base Phi. Next, we can prove that this is the worst case by observing that Fibonacci numbers consistently produce pairs where the remainders remain as large as possible at each iteration and never become zero until we arrive at the start of the series.

We can make the O(log n) bound, where n = max(a, b), even tighter. Assume that b >= a, so we can write the bound as O(log b). First, observe that GCD(ka, kb) takes the same number of iterations as GCD(a, b), since every remainder is simply scaled by k. As the biggest such value of k is gcd(a, b), we can replace b with b/gcd(a, b) in our runtime, leading to the tighter bound O(log(b/gcd(a, b))).
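
Here is a minimal C sketch of the scaling observation (my addition; the helper name iterations is made up): multiplying both arguments by k leaves the iteration count unchanged, because every remainder is scaled by k as well:

#include <assert.h>
#include <stdio.h>

/* Number of iterations of Euclid's algorithm on (a, b). */
static int iterations(unsigned long long a, unsigned long long b) {
    int n = 0;
    while (b != 0) { unsigned long long t = b; b = a % b; a = t; n++; }
    return n;
}

int main(void) {
    unsigned long long a = 270, b = 192; /* gcd = 6 */
    for (unsigned long long k = 1; k <= 5; k++)
        assert(iterations(k * a, k * b) == iterations(a, b));
    printf("iterations(%llu, %llu) = %d for every tested k\n",
           a, b, iterations(a, b));
    return 0;
}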

Shital Shah
  • 63,284
  • 17
  • 238
  • 185
  • Can you give a formal proof that Fibonacci numbers produce the worst case for Euclid's algorithm? – Akash Jul 28 '17 at 18:01
6

Here is the analysis in the book Data Structures and Algorithm Analysis in C by Mark Allen Weiss (second edition, 2.4.4):

Euclid's algorithm works by continually computing remainders until 0 is reached. The last nonzero remainder is the answer.

Here is the code:

unsigned int Gcd(unsigned int M, unsigned int N)
{
    unsigned int Rem;
    while (N > 0) {
        Rem = M % N;
        M = N;
        N = Rem;
    }
    return M; /* the last nonzero remainder */
}

Here is a THEOREM that we are going to use:

If M > N, then M mod N < M/2.

PROOF:

There are two cases. If N <= M/2, then since the remainder is smaller than N, the theorem is true for this case. The other case is N > M/2. But then N goes into M once with a remainder M - N < M/2, proving the theorem.

So, we can make the following inference:

Variables      M      N        Rem
initial        M      N        M%N
1 iteration    N      M%N      N%(M%N)
2 iterations   M%N    N%(M%N)  (M%N)%(N%(M%N)) < (M%N)/2

So, after two iterations, the remainder is at most half of its original value. This shows that the number of iterations is at most 2 log N = O(log N).

Note that the algorithm computes Gcd(M, N), assuming M >= N. (If N > M, the first iteration of the loop swaps them.)
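
Here is a small C check of both the theorem and the resulting iteration bound (my sketch, not from Weiss's book), brute-forcing all pairs in a modest range:

#include <assert.h>
#include <math.h>
#include <stdio.h>

/* Check that M % N < M/2 whenever M > N > 0, and that Gcd(M, N)
   never needs more than about 2 log2(N) iterations. */
int main(void) {
    for (unsigned int M = 2; M <= 2000; M++) {
        for (unsigned int N = 1; N < M; N++) {
            assert(2 * (M % N) < M); /* the halving theorem */
            unsigned int a = M, b = N;
            int iters = 0;
            while (b > 0) { unsigned int r = a % b; a = b; b = r; iters++; }
            assert(iters <= 2 * (int)log2((double)N) + 2);
        }
    }
    puts("halving theorem and 2 log N iteration bound hold");
    return 0;
}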

cd-00
  • 585
  • 5
  • 10
5

The worst case arises when n and m are consecutive Fibonacci numbers.

gcd(F_n, F_{n-1}) = gcd(F_{n-1}, F_{n-2}) = ... = gcd(F_1, F_0) = 1, and the nth Fibonacci number is approximately 1.618^n, where 1.618 is the golden ratio.

So, to find gcd(n, m), the number of recursive calls will be Θ(log n).

Arnav Attri
  • 51
  • 1
  • 4
4

The worst case of Euclid's algorithm is when the remainders are the biggest possible at each step, i.e., for two consecutive terms of the Fibonacci sequence.

When n and m are the number of digits of a and b, assuming n >= m, the algorithm uses O(m) divisions.

Note that complexities are always given in terms of the sizes of inputs, in this case the number of digits.

Alexandre C.
  • 55,948
  • 11
  • 128
  • 197
2

Gabriel Lamé's theorem bounds the number of steps by log_φ(sqrt(5)·(a + 1/2)) - 2, where the base of the log is φ = (1 + sqrt(5))/2. This is the worst-case scenario for the algorithm, and it occurs when the inputs are consecutive Fibonacci numbers.

A slightly more liberal bound of log a, where the base of the log is sqrt(2), is implied by Koblitz.

For cryptographic purposes we usually consider the bitwise complexity of algorithms, taking into account that the bit size is given approximately by k = log a.

Here is a detailed analysis of the bitwise complexity of Euclid's algorithm:

Although in most references the bitwise complexity of Euclid's algorithm is given as O(log a)^3, there exists a tighter bound, which is O(log a)^2.

Consider the sequence of divisions: r_0 = a, r_1 = b, r_0 = q_1·r_1 + r_2, ..., r_{i-1} = q_i·r_i + r_{i+1}, ..., r_{m-2} = q_{m-1}·r_{m-1} + r_m, r_{m-1} = q_m·r_m.

Observe that: a = r_0 >= b = r_1 > r_2 > r_3 > ... > r_{m-1} > r_m > 0    (1)

and r_m is the greatest common divisor of a and b.

By a claim in Koblitz's book (A Course in Number Theory and Cryptography) it can be proven that: r_{i+1} < r_{i-1}/2    (2)

Again in Koblitz, the number of bit operations required to divide a k-bit positive integer by an l-bit positive integer (assuming k >= l) is given as: (k - l + 1)·l    (3)

By (1) and (2) the number of divisions is O(log a), and so by (3) the total complexity is O(log a)^3.

Now this may be reduced to O(log a)^2 by a remark in Koblitz.

Consider k_i = log r_i + 1.

By (1) and (2) we have: k_{i+1} <= k_i for i = 0, 1, ..., m-1, and k_{i+2} <= k_i - 1 for i = 0, 1, ..., m-2,

and by (3) the total cost of the m divisions is bounded by: SUM [k_{i-1} - (k_{i+1} - 1)]·k_i for i = 1, ..., m.

Rearranging this: SUM [k_{i-1} - (k_{i+1} - 1)]·k_i <= 4·k_0^2.

So the bitwise complexity of Euclid's algorithm is O(log a)^2.
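
To make the "rearranging" step explicit, here is the telescoping argument written out (my reconstruction of the standard derivation; the k_i are as defined above, and the sum is the one bounded via (3)):

\[
\begin{aligned}
\sum_{i=1}^{m} (k_{i-1} - k_{i+1} + 1)\, k_i
  &\le k_0 \sum_{i=1}^{m} (k_{i-1} - k_{i+1}) + k_0 m  &&\text{since each } k_i \le k_0 \\
  &\le k_0 (k_0 + k_1) + k_0 m                         &&\text{telescoping; } -k_m - k_{m+1} \le 0 \\
  &\le 2k_0^2 + 2k_0^2 = 4k_0^2                        &&k_1 \le k_0 \text{ and } m \le 2k_0 \text{ by (2)}.
\end{aligned}
\]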

esra
  • 201
  • 2
  • 8
1

For the iterative algorithm, however, we have:

long long iterativeEGCD(long long n, long long m) {
    long long a;
    int numberOfIterations = 0;
    while (n != 0) {
        a = m;
        m = n;
        n = a % n;
        numberOfIterations++;
    }
    printf("\nIterative GCD iterated %d times.", numberOfIterations);
    return m; /* the gcd */
}

With Fibonacci pairs, there is no difference between iterativeEGCD() and iterativeEGCDForWorstCase() where the latter looks like the following:

long long iterativeEGCDForWorstCase(long long n, long long m) {
    long long a;
    int numberOfIterations = 0;
    while (n != 0) {
        a = m;
        m = n;
        n = a - n; /* subtraction instead of modulus */
        numberOfIterations++;
    }
    printf("\nIterative GCD iterated %d times.", numberOfIterations);
    return m;
}

Yes, with Fibonacci pairs, n = a % n and n = a - n do exactly the same thing, because the quotient of consecutive Fibonacci numbers is 1 (except at the very last step).

We also know, from an earlier answer to this question, that there is a prevailing decreasing factor: factor = m / (n % m).

Therefore, to shape the iterative version of the Euclidean GCD in a defined form, we may depict it as a "simulator" like this:

void iterativeGCDSimulator(long long x, long long y) {
    long long i;
    double factor = x / (double)(x % y);
    int numberOfIterations = 0;
    for ( i = x * y ; i >= 1 ; i = i / factor) {
        numberOfIterations ++;
    }
    printf("\nIterative GCD Simulator iterated %d times.", numberOfIterations);
}

Based on the work (last slide) of Dr. Jauhar Ali, the loop above is logarithmic.

[image: final slide of Dr. Jauhar Ali's analysis]

Yes, small Oh, because the simulator gives the number of iterations at most: non-Fibonacci pairs take fewer iterations than Fibonacci pairs when probed on the Euclidean GCD.

  • As this study was conducted using the C language, precision issues might yield erroneous/imprecise values. That's why **long long** was used, to better fit the floating-point variable named **factor**. The compiler used is MinGW 2.95. – Mohamed Ennahdi El Idrissi Mar 01 '14 at 22:06
1

At every step, there are two cases (assume a >= b):

b >= a / 2: then a, b = b, a % b makes the new b equal to a - b, which is at most half of the previous a

b < a / 2: then a, b = b, a % b makes the new b less than a / 2, since a % b < b

So at every step, the algorithm reduces the smaller number to at most half of the larger one.

In at most O(log a) + O(log b) steps, this is reduced to the simple cases, which yields an O(log n) algorithm, where n is the upper limit of a and b.

I have found it here

cs guy
  • 926
  • 2
  • 13
  • 33
  • for the first case b>=a/2, i have a counterexample... let me know if i misunderstood it. let a = 20, b = 12. then b>=a/2 (12 >= 20/2=10), but when you do euclidean, a, b = b, a%b , (a0,b0)=(20,12) becomes (a1,b1)=(12,8). new b1 > b0/2. (8 > 12/2=6).. – Minjeong Choi Nov 02 '21 at 02:55