-2

I am new to competitive coding (C++) and was Practising on hackerEarth problems based on Implementation . I submitted a problem to find summations of an array in 3 diff. parts . Here's My code it took 1.010039s to execute:

int main()
{
    long unsigned int n, a[100000], s1 = 0, s2 = 0, s3 = 0;
    cin >> n;

    for (int i = 1; i <= n; i++) {

        cin >> a[i];

        if (i % 3 == 0)
            s3 = s3 + a[i];

        else if ((i - 2) % 3 == 0)
            s2 = s2 + a[i];

        else
            s1 = s1 + a[i];
    }

    cout << s1 << " " << s2 << " " << s3;

    return 0;
}

Here's The Least Time code :

int main()
{
    int n;
    long unsigned int a[100000];
    long unsigned int s1 = 0, s2 = 0, s3 = 0;
    int i;
    cin >> n;
    for (i = 0; i < n; i++) {
        cin >> a[i];
    }
    for (i = 0; i < n; i = i + 3) {
        s1 = s1 + a[i];
    }
    for (i = 1; i < n; i = i + 3) {
        s2 = s2 + a[i];
    }
    for (i = 2; i < n; i = i + 3) {
        s3 = s3 + a[i];
    }
    cout << s1 << " " << s2 << " " << s3 << endl;
    return 0;
}

we can see that ,there are much more for loops in the second case , and still it is faster why ?

drescherjm
  • 10,365
  • 5
  • 44
  • 64
  • 1
    Please refrain asking questions about online code judge engines here. It's very unlikely that anyone could tell you where you failed from their test cases, as these aren't disclosed usually. Even if what you tested was running at your local environment, you may have missed to test some edge cases which are applied in the online challenge. Be creative and try to find them. Additionally there's probably no value for such questions in the long term, other than cheating the online contest, and nothing is learned. – πάντα ῥεῖ Oct 06 '16 at 20:36
  • Because of cache locality. – SLaks Oct 06 '16 at 20:36
  • 4
    It could also be because all your `if`s confuse the branch predictor. Have a look at http://stackoverflow.com/questions/11227809/why-is-it-faster-to-process-a-sorted-array-than-an-unsorted-array. – mindriot Oct 06 '16 at 20:41
  • 1
    Each of those loops is only `1/3` as long as your loop (`i = i + 3`) + easier to guess conditions verses your `ifs`. – Galik Oct 06 '16 at 20:57
  • @SLaks in fact it is the opposite. OP's code is more cache-friendly. But it's branching, thus killing performance. The least time code caches in and out the memory three times, but is non-branching, thus way faster. – Jeffrey Nov 22 '18 at 15:24
  • Can we have a link to the problem? – papagaga Nov 22 '18 at 15:32

2 Answers2

0

it doesn't matter how many loops you write as long as they don't overlap.
the real factor which made the second algorithm faster is that in each time index i is moving 3 times not like the first algorithms (step by step) you may also consider the ifs that will make a bit of change in huge data entry as you know in competitive programming every millisecond counts.
for that i would recommend you to study Complexity that will help you with your competitive passion as it did for me.

0

The least time code is non-branching (except for the loop condition). That mean the compiler knows exactly what will happen, which instruction will be executed. It also means the CPU knows what instruction are coming and what will be executed. That allows the best instructions to be generated by the compiler and allows the CPU to fetch the right memory and prepare itself for what's coming. As a result, the whole CPU is fully utilized most of the time.

In your case, the result of i%3 can't be predicted, so the CPU stalls as it didn't always plan correctly which branch of the ifs it would take. In fact, your code is better for cache locality as it touches contiguous memory just once. But you get a much bigger performance hit from the branching part.

As a general rule: the least ifs in your code, the faster it can be.

Jeffrey
  • 11,063
  • 1
  • 21
  • 42