
I ran across a problem where I needed to calculate the values of very large factorials. I solved this problem in C++ in two different ways, and want to know whether my complexity analysis is accurate.

In both methods I represent very large numbers as vectors, where v[0] holds the least significant digit and the value at the last index holds the most significant digit. Version 1's code can be found in this gist.
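Since the gist isn't inlined here, a minimal sketch of what multiplyVectorByInteger() might look like under this representation (names and types are assumptions based on the description; the gist's actual code may differ):

```cpp
#include <vector>

// Multiply the number held in numVector (base-10 digits, least significant
// digit first) by the integer n. Sketch only; the gist may differ in detail.
std::vector<int> multiplyVectorByInteger(const std::vector<int>& numVector, int n) {
    std::vector<int> returnVec(numVector.size());
    long long carry = 0;
    for (std::size_t i = 0; i < numVector.size(); ++i) {
        long long prod = static_cast<long long>(numVector[i]) * n + carry;
        returnVec[i] = static_cast<int>(prod % 10);  // keep the low digit
        carry = prod / 10;                           // propagate the rest
    }
    while (carry) {                // append any remaining high-order digits
        returnVec.push_back(static_cast<int>(carry % 10));
        carry /= 10;
    }
    return returnVec;
}
```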

Given the above code, it seems multiplyVectorByInteger() runs in O(log(n*k)) time, where n is the given integer and k is the number represented by the vector. My logic is that we'll be doing some number of steps proportional to the length of the resulting number n*k in order to produce a vector representing n*k. The length of n*k is O(log(n*k)). Some of the steps will be carried out in the for loop, others in the following while loop.

In this program to find large factorials, whenever we call multiplyVectorByInteger() we pass in an integer n and the vector representation of (n-1)!. This means if we want to find 6!, we pass in the integer 6 and the vector representation of 5!, and the function returns the vector representation of 6!. Using the previous information, I believe I can say the complexity of one call is O(log(i!)), where i is the passed-in integer. In order to find large factorials, we must call this method O(n) times, where n is the factorial we are trying to find. Our accumulated logic will look like this:

1!       = 1!
1!*2     = 2!
2!*3     = 3!
3!*4     = 4!
...
(n-1)!*n = n!

Since at each level we're calculating i!, we're consequently performing O(log(i!)) steps at each level. The summation to show this is as follows:

Sum[i=1,n]: log(i!) = Sum[i=1,n]: i*log(i)

My logic for jumping from the second summation to the Big-O notation is as follows. Breaking this out we get the following:

1log(1) + 2log(2) + 3log(3) + ... + nlog(n)

It is obvious we get O(n^2) terms of log(1) + log(2) + ... + log(n). Log rules remind us that log(a) + log(b) = log(ab), which means the log terms in this case collapse to log(n!). Thus we have O(n^2)log(n!).

This would make the overall time complexity of this program O(n^2log(n!)). Is this analysis correct?

Naive version time complexity

To practice complexity analysis I want to take a look at what seems like a less efficient solution. Suppose we change our multiplyVectorByInteger() function such that instead of multiplying the vector representation of k by an integer n in O(log(n!)) time to produce n!, the new multiplyVectorByIntegerNaive() function builds the product by repeated addition, adding the vector representation of the number a total of n times.

multiplyVectorByIntegerNaive() exists in this gist. It utilizes a function addVectors() whose complexity is O(n), where n is the size of the larger of the two vectors.

It's clear we're still calling this new multiplication function n times, but we need to see whether the complexity has changed. For example, given the integer 6 and the vector representation of 5!, we add 5! + 5! + 5! + 5! + 5! + 5! to get 6*5! = 6!. If the integer given to our multiplication function is i, it is clear we do i-1 additions. We can enumerate the steps for the previous example call to our naive multiplication function:

5! + 5!
2*5! + 5!
3*5! + 5!
4*5! + 5!
5*5! + 5!
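The steps above might be sketched like this, reusing an addVectors() helper as described (the implementation details are my assumptions; the gist's actual code may differ):

```cpp
#include <algorithm>
#include <vector>

// Add two numbers stored as base-10 digit vectors, least significant digit
// first. O(n) where n is the size of the larger vector.
std::vector<int> addVectors(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> sum;
    int carry = 0;
    std::size_t len = std::max(a.size(), b.size());
    for (std::size_t i = 0; i < len || carry; ++i) {
        int d = carry;
        if (i < a.size()) d += a[i];
        if (i < b.size()) d += b[i];
        sum.push_back(d % 10);
        carry = d / 10;
    }
    return sum;
}

// Multiply by repeated addition: i-1 additions for multiplier n = i.
std::vector<int> multiplyVectorByIntegerNaive(const std::vector<int>& numVector, int n) {
    std::vector<int> result = numVector;
    for (int i = 1; i < n; ++i)
        result = addVectors(result, numVector);
    return result;
}
```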

Writing out the full summation should now give:

Sum[i=1,n]: (i-1)*log(i!)

It appears the asymptotic complexity of both methods is the same given my calculations are accurate. Is this true?

Dominic Farolino
  • Your analysis is too pessimistic in the first case. Obviously `\sum_{i=0}^n (log i!) <= n*log n!`, where do you get `n^2`? – n. m. could be an AI Aug 23 '16 at 06:59
  • of course base 10 representation is a huge waste of space and time, but that's "only" a constant factor. – n. m. could be an AI Aug 23 '16 at 07:02
  • @n.m. I assumed he meant "digit in base 2^16" (and 2^16 so that you can multiply two digits in a 32-bit int without overflow). – Martin Bonner supports Monica Aug 23 '16 at 07:09
  • Presumably you are calculating this "many" times (because otherwise performance doesn't matter). If you have some idea of the largest value you are going to be using, you could keep a cache of n! for n=(nmax/16, 2nmax/16 ...). Then you can start calculating the factorial from the cache value below your target. – Martin Bonner supports Monica Aug 23 '16 at 07:13
  • Also, if you are calculating 1000!, and have got to 500!, rather than multiply by 501, you could calculate 501*502*503 in integer arithmetic, and then do the vector step. – Martin Bonner supports Monica Aug 23 '16 at 07:15
  • Storing the digits as integers is indeed a waste. Bytes or nibbles are enough. But of course the computation and carries must be done in integers. –  Aug 23 '16 at 07:22
  • The complexity of a straightforward multiplyVectorByInteger would be O(Log(n).Log(k)), not O(Log(n.k)). But this fix is irrelevant as k=1 here. –  Aug 23 '16 at 07:23
  • @MartinBonner If performance matters, one should use an existing bignum library tuned for performance... – n. m. could be an AI Aug 23 '16 at 07:31
  • Your first evaluation of the complexity is wrong. It should be O(n²Log(n)) or O(n Log(n!)), but not O(n²Log(n!)). –  Aug 23 '16 at 07:50
  • this is a bit off your topic but take a look at [Fast exact bigint factorial](http://stackoverflow.com/q/18317648/2521214). Sometimes it is very hard to estimate complexity especially on bignums as the underlying operations can have variable complexity depending on the operand size. And accounting only worst case scenarios can sometimes lead to invalid results. To be sure you can count the number of critical operations for different inputs to verify estimated complexity. – Spektre Aug 23 '16 at 08:31
  • factorial on bignums is a special case. For example if you group the multiplications in the right way you can significantly lower runtime even if the computation itself is the same. `1*2*3*4*5*6*7*8` vs. `( (1*2) * (3*4) ) * ( (5*6) * (7*8) )` due to lower stack/heap trashing and less empty multiplication cycles ... If you take into account multiplication optimizations then even complexity can change (but the base algorithm complexity is still the same of course) – Spektre Aug 23 '16 at 08:35
  • @n.m. regarding your first comment, check out where I wrote the first summation, then said "my logic from jumping from the second summation to big-oh is as follows"... it seemed that the `i` out in front of `log(i)` produced `(n*(n+1))/2` terms, which is `O(n^2)`, which is why I have that factor of `n` out there – Dominic Farolino Aug 23 '16 at 14:29
  • @MartinBonner performance is not even close to an issue though I appreciate your comments. This is purely complexity analysis practice. I like the idea of a cache though, could probably make for a good DP algorithm – Dominic Farolino Aug 23 '16 at 14:30
  • @YvesDaoust Regarding your last comment, I'm trying to see how I got that extra factor of `n` there. Obviously my analysis is wrong, but if you take a look at the statement after I write out the first summation ("my logic from jumping from the second summation to the big-Oh is..."), could you tell me what is wrong with that? In short it seems there are `(n*(n+1))/2` terms of `log(i)`. Since a summation of `log(i) = log(n!)`, it seems there would be `O(n^2log(n!))`, but as you said something is wrong here – Dominic Farolino Aug 23 '16 at 14:35
  • Somewhat off-topic, but there exists a fairly complicated algorithm that can compute `N!` in something like `O(N*log(N)^2)`. It involves prime-factorizing `N`, powering them up and combining them in a specific order. – Mysticial Aug 23 '16 at 15:04

2 Answers


The complexity of the function in the gist you have provided is O(log10(n!)), where n is the number you pass to the method.

The reasoning for this is evident from the first part of the code:

for (int i = 0; i < numVector.size(); ++i) {
    returnVec[i] = (numVector[i]*n + carry) % 10;  // low digit of the product
    carry = (numVector[i]*n + carry) / 10;         // carry into the next digit
}

The numVector passed in represents (n - 1)!, i.e. it contains all the digits that make up that number. However, the length of that number is simply ⌊log10((n-1)!)⌋ + 1, which is O(log10((n-1)!)). You can see this from a simple example:

if (n-1)! is 100, then the length of the numVector will be 3, which is the same as ⌊log10(100)⌋ + 1 = 3.
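The digit-count relationship is worth pinning down: for m ≥ 1 the number of base-10 digits is ⌊log10(m)⌋ + 1 (a plain ⌈log10(m)⌉ would give 2 for m = 100). A quick loop-based check, which avoids floating point entirely:

```cpp
// Number of base-10 digits of m (m >= 1), i.e. floor(log10(m)) + 1.
int digitCount(long long m) {
    int count = 0;
    while (m > 0) {
        ++count;
        m /= 10;
    }
    return count;
}
```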

The same logic applies to the while loop too:

while (carry) {
    returnVec.push_back(carry%10);
    carry /= 10;
}

Since the value of carry will not be larger than n (you can prove this for yourself), the maximum number of times this loop will run is bounded by the number of digits of n, i.e. O(log10(n)). The total complexity of the function is therefore O(log10(n!)).

Therefore, to calculate k!, the complexity of your code (including main) will be O(k·log10(k!))
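Putting it together, the whole computation can be sketched as follows (a self-contained version that mirrors the quoted loops above; the gist's actual driver may differ):

```cpp
#include <vector>

// Build the base-10 digit vector (least significant digit first) of k!.
// Each pass multiplies the running product by the next integer n.
std::vector<int> factorialVector(int k) {
    std::vector<int> result{1};  // 1 represents 0! = 1! = 1
    for (int n = 2; n <= k; ++n) {
        long long carry = 0;
        for (std::size_t i = 0; i < result.size(); ++i) {
            long long prod = static_cast<long long>(result[i]) * n + carry;
            result[i] = static_cast<int>(prod % 10);
            carry = prod / 10;
        }
        while (carry) {  // the vector grows to ~log10(n!) digits
            result.push_back(static_cast<int>(carry % 10));
            carry /= 10;
        }
    }
    return result;
}
```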

Naive version

For the naive version, the only thing that has changed is that the method now steps through the multiplication manually, in the form of repeated addition. This is something the other version skipped by explicitly multiplying each value by n:

(numVector[i]*n + carry)

This increases the complexity of the function to O(k·log10(n)), where n is the number represented by the vector passed in, and thus the complexity of the entire code is now O(k²·log10(k!))

smac89
  • thanks for the answer. A little confused though, you said "if n-1 is 100, then the length of the `numVector` will be 3...", but if `n-1` is 100, then `numVector` will contain a representation of `100!`, so the length would be `O(log(100!))`, not `O(log(100))` right? – Dominic Farolino Aug 23 '16 at 14:37
  • Also lets say the multiply function takes in two variables. It takes an integer `n`, and a vector representation of `n-1!` called `k`. In my original question I believe I said the multiplication function takes time proportional to `O(log(n*k))`, which is equivalent to `O(log(n!))`, since `n*k` will always be `n!` in this situation, as that is what we're using this function for. I believe you are saying the exact same thing in your answer no? – Dominic Farolino Aug 23 '16 at 14:45
  • And yes assuming the multiply takes time proportional to the factorial it is trying to find, the summation would be `sum_1^n: log(i!)` which is what I believe we're both saying. To me this is equivalent to `sum_1^n: i*log(i)` or is this wrong? This is the logic I wrote about right when I drew out my first summation in the question. Apparently this logic is faulty as I have some extra factor of `n` in there, but cannot figure out where. Could you please point it out? – Dominic Farolino Aug 23 '16 at 14:49
  • @DomFarolino, When I said `if n-1 is 100`, I meant it as an example and the `n` in that example has nothing to do with the actual `n` in the function you have. Come to think of it, I should have just used `n`...oh well. I have updated it to instead say `if (n-1)! is 100`. To answer your second question, yes we are basically saying the same thing here. Lastly, can you please clarify the third comment? I am not familiar with `sum_1^n: i*log(i)` – smac89 Aug 23 '16 at 15:54
  • @DomFarolino The reason why your analysis was yielding wrong results is because you did this `1log(1) + 2log(2) + 3log(3) + ... + nlog(n)`. This essentially means that for each `n`, you compute its factorial `n` times, which is not what you are actually doing. The correct analysis should be `1log(1) + 1log(2) + 1log(3) + ... + 1log(n)`, which evaluates to `nlog(n!)` – smac89 Aug 23 '16 at 16:09
  • I'm sorry but I really struggle to see why there is a 1 before each log term. To me you can either have an "i" before each log term, or if you have a 1, then the number in each log can't be 1, 2, 3 etc. but instead 1!, 2!, 3! etc. The original sum is [Sum i=1...n] log(i!). This means outside of the sum we get log(1!) + log(2!) ... + log(n!), OR inside the sum we can transform log(i!) => i*log(i). I did this transformation to get [Sum i=1..n] i*log(i), which I believe is a fair transformation. Now unraveling this newly transformed sum yields 1log(1) + 2log(2) ... + nlog(n) right? – Dominic Farolino Aug 23 '16 at 17:21
  • in short I'm saying log(1!) + log(2!) + ... + log(n!) is equivalent to 1log(1) + 2log(2) + ... + nlog(n) which I believe is true because log(k!) = O(klog(k)) – Dominic Farolino Aug 23 '16 at 17:23
  • @DomFarolino, Check the following (using log base 10): `1(log10) + 2(log100) + 3(log1000)` = `6(log(1000000)) = 36`? OR `= (1*1 + 2*2 + 3*3) = 14`?. The math you have done in your question will be equivalent to the first, while the actual answer is the second. Can you please check this? – smac89 Aug 23 '16 at 17:57
  • After reviewing [this](http://math.stackexchange.com/questions/1901399/asymptotics-of-sum-limits-i-1n-i-log-i) question I posted on math.stack... I realize my analysis of the summation is incorrect though I must admit it is kind of confusing. Yes there are only going to be `n` terms, but they will each have `i` in front of them, which gives 1log + 2log + 3log + 4log + ... + n which I assumed was O(n^2)log...interesting – Dominic Farolino Aug 24 '16 at 14:15

Multiplying a k-digits number by an integer or adding two k-digits numbers both take time proportional to k.

Hence, in the multiply version, the total workload is

Sum[i=1,n]: log(i!) ~  Sum[i=1,n]: i.log(i) ~ n²log(n)

In the add version,

Sum[i=1,n]: i.log(i!) ~ Sum[i=1,n]: i².log(i) ~ n³.log(n)

These results can be established by using the Stirling approximation and an integral instead of a summation:

Int x.log(x) dx = x²(log(x)/2 - 1/4)
Int x².log(x) dx = x³(log(x)/3 - 1/9)
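Spelled out in LaTeX, using Stirling's log(i!) = i·log(i) - i + O(log i) and the integrals above:

```latex
\sum_{i=1}^{n} \log(i!) \sim \sum_{i=1}^{n} i\log i
  \sim \int_{1}^{n} x\log x \, dx = \Theta\!\left(n^2 \log n\right)
\qquad
\sum_{i=1}^{n} i\log(i!) \sim \sum_{i=1}^{n} i^2\log i
  \sim \int_{1}^{n} x^2\log x \, dx = \Theta\!\left(n^3 \log n\right)
```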

As could be expected, there is an extra n factor.

  • Seems my faulty math lies in the statement where I say `Sum[i=1,n]: i.log(i) = O(n^2log(n!))`. Everyone is saying I have an extra factor of `n` in there but I cannot figure out why. Could you take a look at my statement directly after I draw out my first summation and point out what is wrong with that logic please? – Dominic Farolino Aug 23 '16 at 14:52
  • @DomFarolino: "It is obvious we get O(n^2) terms of log(1) + log(2) + ... + log(n)": maybe for you, not for me. –  Aug 23 '16 at 15:09
  • ok that's fair, could you explain why my analysis is wrong though? It seems we get log(1) + log(2) + log(2) + log(3) + log(3) + log(3) and so on. Is this wrong? – Dominic Farolino Aug 23 '16 at 15:12
  • @DomFarolino: how can you conclude that this is O(n²Log(n!)) ?? –  Aug 23 '16 at 15:51