3

I started out with a little program in python but it took ages to run through it so I switched over to c++. I have no earlier experience with this specific language (coded a lot in c# though) and started out in a web editor: https://www.onlinegdb.com/online_c++_compiler.

My C++ code is:

clock_t start, end;

/* Recording the starting clock tick.*/
start = clock();

int R = 0;
int x = 0;

for (R = 6; R <= 10000; R = R + 2) {
    int X_min = ceil(0.5 * sqrt(2) * R);
    int N_pairs = 0;

    for (x = X_min; x < R; x++) {
        float y = sqrt(pow(R, 2) - pow(x, 2));

        if (rint(y) == y) {
            N_pairs = N_pairs + 1;
        }
    }

    if (N_pairs >= 4) {
        //cout << R << ", " << N_pairs;
        //cout << "\n";
    }
}

end = clock();

//Calculating total time taken by the program. 
double time_taken = double(end - start) / double(CLOCKS_PER_SEC);
cout << "Time taken by program is : " << time_taken;
cout << " sec " << endl;

//cout << "1" << "|" << "2" << "|" << "3 \n";
//cout << "4" << "|" << "5" << "|" << "6 \n";
//cout << "7" << "|" << "8" << "|" << "9 \n";

It all worked well, however the web editor seems to have a build-in maximum time boundary so at this point I decided to take it over to Visual Studio.

I copy pasted the code and run it:

  • the web editor took 0.272273 sec to complete the code
  • Visual Studio took 2.446 sec to run it.

I Tried updating VS from the 2017 version to the 2019 one but that had no effect.

Why does it take so much longer for VS to run the code?? and how can I fix it?

Jack Zhai
  • 6,230
  • 1
  • 12
  • 20
  • 6
    Did you enable the release build configuration? – walnut Jan 12 '20 at 12:56
  • 1
    As @walnut said. Running with different optimizations/build flags can play a huge role. See: https://godbolt.org/z/jpgVYx where one is using optimization level 3 and the other one 0. For me it was a factor of 30 difference in time. (Though different compiler than msvc) – koitimes3 Jan 12 '20 at 12:59
  • I did enable the release build configuration. without it takes even longer: 8.701 seconds – user11454816 Jan 12 '20 at 13:10
  • @koitimes3 I clicked your link but I am not completely sure what it is that I am looking at. it seems to me like I can run my code in different compilers but correct me if I am wrong. anyway, I changed one of them to: x86 msvc v19.22 and it thew an error:example.cpp cl : Command line warning D9002 : ignoring unknown option '-std=c++17' cl : Command line warning D9002 : ignoring unknown option '-O3' Compiler returned: 0 – user11454816 Jan 12 '20 at 13:19
  • @user11454816 The link is simply a comparison of how different the result can be depending on optimization even on the same machine. However if I understand it correctly you're comparing how fast it runs on Your machine vs some online service (unknown machine)? It might just be that your machine is slow(er) then. – koitimes3 Jan 12 '20 at 13:27
  • @koitimes3 that's something I hadn't thought about yet. that would be a bummer...it seems, however, that I am not the only person with this sort of issue..ill just keep looking for a solution – user11454816 Jan 12 '20 at 13:42
  • 1
    You **cannot** use `clock` on Windows to compare performance. – n. m. could be an AI Jan 12 '20 at 13:58
  • 2
    Use `sqrt(R*R - x*x)` to avoid pow(), be sure to target x64. – Hans Passant Jan 12 '20 at 14:09
  • @Hans Passant taken out the pow() functions took it down to 1.98 seconds. Targetting x64 got it down to 0.513 seconds. Thx for that – user11454816 Jan 12 '20 at 14:14
  • @n. 'pronouns' m would you care to elaborate your answer. what is it about clock that makes it unusable for comparing performances – user11454816 Jan 12 '20 at 14:15
  • `clock` on Windows is not implemented correctly, Microsoft documentation says that. But this doesn't seem to be your problem. It just looks like VC++ cannot optimise its way out of a wet paper bag. – n. m. could be an AI Jan 12 '20 at 14:42
  • use `y = sqrtf(R*R - x*x)` or `y = sqrt((float)(R*R - x*x))` to avoid the conversion to/from double. And in Windows use QueryPerformanceCounter instead of `clock()`. See [C++ high precision time measurement in Windows](https://stackoverflow.com/q/1825720/995714) – phuclv Jan 12 '20 at 16:27

2 Answers2

3

The main problem is that VC++ doesn't inline the rint(float) call:

    movaps  xmm0, xmm6
    call    rint
    ucomisd xmm0, xmm6

link to godbolt

You can expect good speedup by replacing rint(y) with a "manual" rounding:

Change

        if (rint(y) == y) {

To

        if (int(y+0.5) == y) {

On my machine getting from 0.8 s down to 0.04 s (compiling with /O2 /fp:fast)

Also you need to use N_pairs outside the loop otherwise a (good) compiler can optimize everything away.

rustyx
  • 80,671
  • 25
  • 200
  • 267
1

The first rule of optimization: performance of incorrect program is irrelevant.

It appears that you are looking for integer solutions of x^2 + y^2 = R^2. However, using float data type for intermediate storage produces a lot of false positives. Moving N_pairs out of the loop (to prevent complete removal of that loop, as @rustyx already noted, and to count all pairs) results in 7886 pairs; with double: 5681.

That last number also corresponds to completely integer check, that is quite a bit faster (21 ms on my system). Here is my code:

#include <iostream>
#include <chrono>

int main()
{
    auto t = std::chrono::high_resolution_clock::now();
    int N_pairs = 0;
    double d = 0.5 * sqrt(2);
    for (int R = 6; R <= 10000; R = R + 2) {
        int X_min = ceil(d * R);

        for (int x = X_min; x < R; x++) {
            int y = sqrtf(R * R - x * x);
            if(x*x + y*y == R*R) {
                N_pairs = N_pairs + 1;
            }
        }
    }
    std::cout << "Time taken by program is: " 
        << std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - t).count() 
        << " ms" << std::endl;
    std::cout << "N_pairs: " << N_pairs << std::endl; // with float: 7886; with double or int: 5681
return 0;
}
Vlad Feinstein
  • 10,960
  • 1
  • 12
  • 27
  • thx for the reply. the reason why i have N_pairs in the loop is because im interested in the number of pairs per 'R' and not the total number of pairs over all R's. Maybe i should set it to zero every time instead of declaring it over and over again but i did not check that yet. This is the first time that i had to optimize a program. Also, doesnt the fact that you declare variable y as an integer introduce false positives since you round the sqrt right away? I tried changing it to double and while it did not effect the outcome for R up to 10.000 it did for R up to 100.000 – user11454816 Jan 18 '20 at 09:33
  • @user11454816 - my code can't have false-positives because of the integer test to see if a sum of squares is equal R-squared. It *might* have false-negatives because of the rounding that you mentioned; I would address it by also checking if `y+1`-squared will work. – Vlad Feinstein Jan 18 '20 at 20:07