0
#include <stdio.h>
#include <stdlib.h>

int factorial(int i) {
    if(i == 1) {
        return 1;
    }
    else {
        return i*factorial(i - 1);
    }
}

int combination(int l, int m) {
    return factorial(l)/(factorial(l-m)*factorial(m));
}

int main() {
    int n,r;
    printf("Input taken in form of nCr\n");
    printf("Enter n: ");
    scanf("%d", &n);
    printf("Enter r: ");
    scanf("%d", &r);
    int y = combination(n, r);
    printf("Result: %d", y);

    return 0;
}

Tried to make a simple code for calculating the combination function in maths. It worked for small values and basically works till n = 12, and gives wrong values from n = 13 and onwards. Also for n = 15 and r = 2, it returns the result -4. And it gives the error

segmentation fault (core dumped)

for n = 40 and r = 20. I would like to know how to solve this problem and why exactly is this happening.

  • 5
    What is the value of 13! and what is the max value of an int? – Tyler Jan 14 '20 at 17:29
  • 1
    If `INT_MAX` on your platform is less than `6227020800`, then you're overflowing signed `int` (and I'll all-but-guarantee that's exactly what's happening). Also, `long long int y = combination(n, r);`, the `long long int` type of `y` here doesn't mean diddly when `combination` returns `int` in the first place. As a bonus, you're then sending a `long long int` to a `printf` that expects `int` via `%d`, so that's off the rails as well. – WhozCraig Jan 14 '20 at 17:32
  • @WhozCraig I added the `long long int` just for test and it didn't change anything. And the value of `INT_MAX` on my device is `2147483647`, how do I go about changing my code according to this? – UnwantedTachyon Jan 14 '20 at 17:46
  • I managed to get a segmentation fault with this code by trying for factorial(0). It was a stack overflow. – David G. Jan 14 '20 at 18:29
  • This is not a job for C. Try Python (big integers built in). – Lee Daniel Crocker Jan 14 '20 at 19:03

4 Answers4

3

The value of 13! is 6227020800 which is too large to fit into an 32 bit integer. By attempting to calculate this factorial or larger results in overflowing a 32 bit int. Signed integer overflow invokes undefined behavior.

In some cases, this undefined behavior manifests as outputting the wrong value, while in others it manifests as a crash. The cases where it crashes the factorial function is most likely passed a value less than 1, meaning that the recursive calls will attempt to go all the way down to INT_MIN but fills up the stack before that can happen.

Even changing to long long isn't enough to fix this, as the intermediate results will overflow that. So how do you fix this? If you were calculating these values by hand you wouldn't multiply out all of the numbers together then divide two huge numbers. You'd write out the factors and cancel out terms from the top and bottom of the equation. For example, suppose you wanted to calculate 12C7. You would write it out like this:

12 * 11 * 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1
------------------------------------------------
( 5 * 4 * 3 * 2 * 1 ) * (7 * 6 * 5 * 4 * 3 * 2 * 1)

Then cancel out 7! from the top and bottom:

12 * 11 * 10 * 9 * 8 
---------------------
 5 * 4 * 3 * 2       

Then cancel out other terms:

12 * 11 * 10 * 9 * 8     12 * 11 * 2 * 9 * 8    12 * 11 * 2 * 9 
---------------------  = -------------------- = --------------- =  4 * 11 * 2 * 9
 5 * 4 * 3 * 2              4 * 3 * 2                 3

Then multiply what's left:

4 * 11 * 2 * 9 = 792

Now do this in code. :) Be sure to change all of your datatypes to long long, as the result of 40C20 is still a bit larger than what a 32-bit int can hold. This type is guaranteed to be at least 64 bits.

dbush
  • 205,898
  • 23
  • 218
  • 273
2

This is an overflow problem here. You result is above the max int value.

13! = 6227020800

Wich is more than INT_MAX (2147483647). If you want to handle larger numbers you should either use other variables types (for example unsigned long long), or handle the overflow in your function to avoid memory crashes.

Here is a topic that could be interesting about overflow checking in c here.

Also for n = 15 and r = 2, it returns the result -4

When a variable overflowed, it can underflow and overflow in cycle. This is why you are getting negative values. I'm not sure but I think this is related. If somebody can validate this it would be great.

Lucas Gras
  • 961
  • 1
  • 7
  • 22
1

When I run your program in a debugger with n = 40 and r = 20 on a 32-bit binary compiled with Microsoft Visual Studio, then I don't get a segmentation fault, but I get a division by zero error in the following line:

return factorial(l)/(factorial(l-m)*factorial(m));

factorial(l-m) and factorial(m) both evaluate to factorial(20), which is 2,192,834,560.

Assuming that sizeof(int) == 4 (32-bit), this number cannot be represented by a signed int. Therefore, the int overflows, which, according to the official C standard, causes undefined behavior.

However, even if the behavior is undefined, I can reasonably speculate that the following happens:

Due to the overflow, the number 2,192,834,560 will become -2,102,132,736. This is because the second number corresponds to the first number in Two's complement binary representation.

Since this number is multiplied with itself in your code (assuming n = 40 and r = 20), then the result of the multiplication will be 4,418,962,039,762,845,696. This number certainly does not fit into a signed int, so that an overflow occurs again.

The hexadecimal representation of this number is 0x3D534E9000000000.

Since this large number does not fit into a 32-bit integer, all the excess bits are stripped off, which is equivalent to subjecting the result to modulo UINT_MAX + 1 (4,294,967,296). The result of this modulo operation is 0.

Therefore, the expression

factorial(l-m)*factorial(m)

evaluates to 0.

This means that the line

return factorial(l)/(factorial(l-m)*factorial(m));

will cause a division by zero exception.

One way of solving the problem of handling large numbers is to use floating point numbers instead of integers. These can handle very large numbers without overflowing, but you may lose precision. If you use double instead of float, you will not so easily lose precision and, even if you do, the precision loss will be smaller.

Andreas Wenzel
  • 22,760
  • 4
  • 24
  • 39
1

I guess there are 2 effects interacting:

  1. Your integers overflow, that is the value of factorial(i) will become negative for sufficiently big i leading to
  2. Your recursion (having factorial call itself) consumes all your stack space.

Try to change the condition in factorial from if(i == 1):

int factorial(int i) {
   if(1 == i) {
      return 1;
   } else if(1 > i) {
      return -1;
   }

   return i * factorial(i - 1);

}

This should have you get rid of the SEGFAULT.

For the integer overflow, the only possible solution would be to not rely on C integer arithmethic but using some bignum library (or write the code on your own).

Some explanation for what is probably going on:

As @WhozCraig pointed out, integers can only keep a range of numbers up to INT_MAX. However, factorial(i) just explodes even for relatively small numbers. C however does not capture this exception and your integers will silently overflow to negative numbers. This means at some point you start feeding factorial with negative numbers.

However, for each function call, some data has to be pushed onto the stack (usually the return address and local variables, possibly including the function arguments). This memory will be released only after the function returns. This means, if you call factorial(40), if everything works integer wise, you will eat up 40 times the amount of memory for 1 call to factorial.

Since your factorial does not handle negative numbers correctly, it will end up calling itself endlessly, overflowing from time to time, until the condition i == 1 at some point is randomly hit. Ostensibly in most cases, this does not happen before your stack is exhausted.

Michael Beer
  • 853
  • 5
  • 12