
I was writing a program that involved input up to 1 million. When I used the datatype `int` for my values, the run time was extremely long; the program never finished executing, so I could not record a run time. Code before:

#include<stdio.h>
int main()
{       
    int n,m,i,maxt=0,maxn;

    for(n=2;n<=1000000;n++){
        m=n;
        i=0;
        for(i=0;m!=1;i++){
            if(m%2==0)
                m=m/2;
            else
                m=(3*m+1);
        }

        if(i>maxt){
            maxt=i;
            maxn=n;
        }
    }

    printf("%d%d",maxn,maxt);

    return 0;
}

But then, while experimenting with the code, I changed the datatype from `int` to `long long int`, and surprisingly the run time dropped drastically (to milliseconds). Can anyone explain the reason behind this? Code after:

#include<stdio.h>
int main()
{       
    long long int n,m,i,maxt=0,maxn;

    for(n=2;n<=1000000;n++){
        m=n;
        i=0;
        for(i=0;m!=1;i++){
            if(m%2==0)
                m=m/2;
            else
                m=(3*m+1);
        }

        if(i>maxt){
            maxt=i;
            maxn=n;
        }
    }

    printf("%lld%lld",maxn,maxt);

    return 0;
}
Abhishek Guru

2 Answers


You are calculating the Collatz conjecture. For some inputs n, the values of m can get very large. If m gets larger than 2^31 with a normal 32-bit int, you get a negative number. To be more explicit: when 2^31 <= m < 2^32, a signed 32-bit value is interpreted as a negative number; the computer cannot tell the difference when working with only 32 bits.

Negative values of m get caught in an endless loop, never reaching the m == 1 end condition. Therefore, a 64-bit integer type is needed. The Wikipedia page shows 3 different cycles among the negative numbers; for example, m = -1 becomes m = -2, which becomes m = -1 again, in a never-ending loop.

The first time m gets larger than 2^31 is for n = 113383, where m reaches 2482111348.

To further clarify: the problem is not with n but with m in following loop.

    m=n;
    for(i=0;m!=1;i++){
        if(m%2==0)
            m=m/2;
        else
            m=(3*m+1);
    }

For each n, this loop executes many times. m starts with the value of n, for example 113383. In this case, after 120 steps, m reaches 2482111348, which is too big to fit in a 32-bit signed integer. On most modern processors, 2482111348 gets the same bit pattern as -1812855948. The loop then continues with negative values and, after a while, enters an endless cycle repeating the same 18 numbers -17, -50, -25, ..., -34, -17, never reaching the condition m == 1 needed to stop the for loop.

JohanC

Here is a small modification of your code that works with gcc:

#include<stdio.h>
#include<stdlib.h>
#include<limits.h>   // for INT_MAX

void overflow()
{
    fprintf(stderr, "Overflow\n");
    exit(1);
}

int main()
{       
    int n,m,i,maxt=0,maxn;

    for(n=2;n<=1000000;n++){
        m=n;
        i=0;
        for(i=0;m!=1;i++){
            if(m%2==0)
                m=m/2;
            else {
                int m_prev = m;

                // Replacing m = (3*m+1) with operations that check
                // for overflow
                if(__builtin_mul_overflow(m,3,&m)) {
                    printf("%d\n", m_prev);
                    printf("%d\n", INT_MAX);
                    overflow();
                }
                if(__builtin_add_overflow(m,1,&m))
                    overflow();
            }
        }
        if(i>maxt){
            maxt=i;
            maxn=n;
        }
    }

    printf("%d %d\n",maxn,maxt);

    return 0;
}

If an overflow happens, it will print "Overflow" and exit. And that is exactly what happens here: the result of 3*m+1 gets too large for an int to hold, so it overflows.

You can read about these gcc builtins here: https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html

klutt
  • but why did it face overflow, since an int (4 bytes) can hold those values, the max of which will be 1 million... – Abhishek Guru Nov 01 '19 at 17:26
  • @AbhishekGuru: The problem isn't `max`, the problem is `m` - you're trying to store values into `m` that are too large. The behavior on signed integer overflow is undefined, but the upshot is that the value stored in `m` is no longer useful or predictable, and your algorithm is going off the rails because of it. – John Bode Nov 01 '19 at 17:29
  • @AbhishekGuru : When the result of an operation is not representable by the data type to which the result is assigned. For example if you have an `int8_t` type with value 127, and add 1, the result 128 is not representable and the value assigned will be -128. In your case overflow and the erroneous result means that `m != 1` remains true until the error happens to end up there. – Clifford Nov 01 '19 at 17:29
  • @AbhishekGuru : Why are you still asserting that when JohanC's edited answer clearly demonstrates where the overflow occurs. – Clifford Nov 01 '19 at 17:31
  • @JohnBode brother m is quite similar to n both are int so why will I face this problem ? – Abhishek Guru Nov 01 '19 at 17:32
  • @AbhishekGuru I added some code. Run it and you can clearly see that if you multiply by 3, you will get a larger result than INT_MAX – klutt Nov 01 '19 at 17:32
  • 1
    @AbhishekGuru: Because you are computing the value of `m` differently from the value of `n`, and that difference means that the value in `m` grows more quickly than the one in `n`, and at some point that value exceeds what a regular signed `int` can represent. – John Bode Nov 01 '19 at 17:35