I am new to C++ and I want to know how to optimise this code further

Question

I am new to C++ and I have to optimise this code so that it gets executed within 1.5 secs for Input values upto 10^6. But this code takes 3.52 seconds for input 10^6 to get executed.

I had tried a lot and came up with this code.

#include <iostream>
#include <iostream>
#include <vector>
#include <bits/stdc++.h>
#include <iterator>
#include <utility>
#include <boost/multiprecision/cpp_int.hpp>

using boost::multiprecision::cpp_int;
using namespace std;

cpp_int gcd(cpp_int a, cpp_int b)
{
    // Everything divides 0
    if (a == 0)
        return b;
    if (b == 0)
        return a;

    // base case
    if (a == b)
        return a;

    // a is greater
    if (a > b)
        return gcd(a - b, b);
    return gcd(a, b - a);
}

int main()
{

    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    cpp_int t, ans;
    std::cin >> t;
    while (t-- > 0) {
        cpp_int k;
        std::cin >> k;
        cpp_int limit = (2 * k) + 1;
        ans = 0;

        vector<cpp_int> g1;

        for (int i = 1; i <= limit; ++i)
            g1.push_back(k + (i * i));

        for (int i = 0; i <= (2 * k) - 1; ++i) {
            ans += gcd(g1[i], g1[i + 1]);
        }

        std::cout << ans << std::endl;
    }
    return 0;
}

Constraints --> 1 ≤ t ≤ 10^6 & 1 ≤ k ≤ 10^6

For each test case, it outputs a single line containing the sum S for the given k. — Jerry02, May 10 '21 at 15:16
If the program works, but you want it to work better, consider asking [at Code Review](https://codereview.stackexchange.com/help/asking). I've linked to the asking help pages so you can review your question to make sure it is a good fit before posting. That said, if your performance goals are far from what you've attained, there's a strong case to be made for the code NOT working well enough for Code Review. — user4581301, May 10 '21 at 15:16
replace `std::endl` with `'\n'` and you are done. Remember that `std::endl` does buffer flushing slowing down optimizations introduced by buffers. — Marek R, May 10 '21 at 15:17
"I am new to" and "i want to optimize" don't combine. First, learn the language well. Then get some experience. Finally understand when and why optimizing is necessary and only then optimize something. — JHBonarius, May 10 '21 at 15:19
The biggest challenge when it comes to optimizations is to resist the urge to do it. You should *profile* your code (build with optimizations enabled) to find out the top one or two bottlenecks, and concentrate your effort on those only. — Some programmer dude, May 10 '21 at 15:19
Use `std::vector<>::reserve()` to preallocate memory and perform dynamic allocation once. — MatG, May 10 '21 at 15:20
Please break the habit of [using `#include `](https://stackoverflow.com/q/31816095/10077). I also recommend [not using `using namespace std;`](https://stackoverflow.com/q/1452721/10077). — Fred Larson, May 10 '21 at 15:21
if you only need the sum, you need not store the values in the vector — 463035818_is_not_an_ai, May 10 '21 at 15:23
Typically this sort of question requires you to know or discover some conceptual trick in order to solve it efficiently. A brute force approach will not pass. — user4581301, May 10 '21 at 15:23
On the other hand, you are using type `int` and letting it overflow... (`i*i`) — Marc Glisse, May 10 '21 at 18:56
Your gcd function ends up calling the gcd function defined by boost, either that's cheating or you should just use the one from boost directly. — Marc Glisse, May 10 '21 at 19:05

score 2 · Answer 1 · answered May 10 '21 at 15:36

std::endl is better to be replaced with '\n'. std::endl flushes output stream and it is not a cheap operation, because it ignores all the advantages of buffered output.
g1.reserve(MAX_G1_SIZE) after g1 initialization. It looks like your g1 vector is big enough at the end of the program. a vector is a dynamic array, which means that as soon as its capacity becomes insufficient, a larger piece of memory is allocated in the heap and existing elements are copied there. This operation has linear complexity but is rarely performed. The solution is to first tell the vector what size to reserve with the command g1.reserve(MAX_G1_SIZE) (MAX_G1_SIZE is maximal g1 size after program evaluation. For example, we can put MAX_G1_SIZE = t).
Make gcd not recursive. This usually doesn't speed things up much, but it may help.

gcd being recursive won't be much of an issue thanks to tail call optimisation. — Aconcagua, May 10 '21 at 16:01

Aconcagua · Answer 2 · 2021-05-10T18:29:31.550

You do not reserve memory in advance, so you suffer from continuous re-allocations, especially as you have a locally created std::vector.

Keeping the vector outside of the loop allows to re-use previously allocated memory (you'd simply clear the vector). However it is pretty simple to get along without an additional vector at all:

cpp_int n = k + 1;
for (int i = 2; i <= limit; ++i)
{
    cpp_int m = k + i * i; // if multiplication is costly: = m + i << 1 - 1
    ans += gcd(n, m);
    n = m;
}

Not sure how complex modulo operator is with boost's multiprecision library, but I'd assume you get further improvement by using that one instead of subtraction:

return b == 0 ? a : gcd(b, a % b);

Modulo reaches quicker the gcd, additionally you spare some if's (if a is 0 or smaller than b on first call, you get one additional recursion that only swaps the two values).

I am new to C++ and I want to know how to optimise this code further

2 Answers2