3

I've implemented a simple program to measure sin calculation performance by gnu scientific library and libc. Here is a source code :

#include <iostream>
#include <vector>
#include <math.h>
#include <chrono>
#include <list>
#include "gsl/gsl_sf_trig.h"

int main()
{
    std::uint32_t numStepsToCalculate = 1000000;
    const double startX = 0.0;
    const double finishX = M_PI*2;
    const double stepX  = (finishX - startX)/numStepsToCalculate;

    double currentX = startX;
    std::list<double> res;

    auto startT = std::chrono::steady_clock::now();
    while ( currentX <= finishX ) {
        res.push_back( sin ( currentX ) );
        currentX += stepX;
    }
    auto endT = std::chrono::steady_clock::now();

    auto diffT = endT - startT;
    std::cout << "STD : " << std::chrono::duration <double, std::milli> (diffT).count() << " ms" << std::endl;
    std::cout << "STD res size " << res.size() << std::endl;

    std::list<double> resOpt;
    currentX = startX;
    auto startTopt = std::chrono::steady_clock::now();
    while ( currentX <= finishX ) {
        resOpt.push_back( gsl_sf_sin ( currentX ) );
        currentX += stepX;
    }

    auto endTopt = std::chrono::steady_clock::now();

    auto diffTopt = endTopt - startTopt;

    std::cout << "GSL : " << std::chrono::duration <double, std::milli> (diffTopt).count() << " ms" << std::endl;
    std::cout << "GSL res size " << resOpt.size() << std::endl;
    return 0;
}

Here is a result : STD : 57.8869 ms GSL : 106.787 ms

So is it OK for GSL to be slower than libc?

Dmitry
  • 1,912
  • 2
  • 18
  • 29
  • probably a precision difference – Tyker Jun 27 '18 at 11:17
  • I don't know either of these, but I can guess that Scientific Library uses a more accurate method to calculate sine than ``. – Yksisarvinen Jun 27 '18 at 11:17
  • Is there a way to tune speed vs accuracy here? – Dmitry Jun 27 '18 at 11:21
  • 3
    Compiler flags? – Vittorio Romeo Jun 27 '18 at 11:23
  • What exactly can I try to make GSL work faster? Is there a chance to beat libc? – Dmitry Jun 27 '18 at 11:25
  • 3
    A hypothesis: The compiler might know that `sin` has no side-effects. Therefore it may have been able to conclude that removing all calls to `sin` does not change the behaviour of the program, and subsequently removed those calls, making the libc code much faster. Conclusion: A benchmark that ignores results of the calculation is often meaningless. – eerorika Jun 27 '18 at 11:27
  • Tried to add a result of each computation to a list and than print it's size. The result is : STD : 58.404 ms GSL : 105.436 ms – Dmitry Jun 27 '18 at 11:31
  • @Dmitry that doesn't still depend on the result of the computation, and furthermore adds a lot of overhead that is significant compared to what you're measuring. Suggestion: Use one result variable, and add the result of each call to that. In the end, print the result. That's still possible to calculate at compile time though, you could input `numStepsToCalculate` at runtime to prevent that. – eerorika Jun 27 '18 at 11:38
  • Another hypothesis: The compiler probably has internal knowledge about implementation of `sin`. It also probably doesn't have internal knowledge about `gsl_sf_sin`. Therefore the compiler might inline `sin` and unroll the loop, while has to call `gsl_sf_sin` as a function call and keep the loop intact, unless you used link-time optimisation. Did you enable link-time optimisation? – eerorika Jun 27 '18 at 11:39
  • Refreshed my code snipped with result usage. Also I turned off optomization. GSL is linked statically to my exe. I tried g++ -flto -Os - no effect – Dmitry Jun 27 '18 at 11:49

2 Answers2

2

GSL uses software sin (even when a CPU sin is available), and it can give you error estimates too: gsl_sf_sin is just a wrapper currently to gsl_sf_sin_e (which gives error estimates). This routine is not optimized too much for speed.

On the other hand, libc sin is usually optimized pretty well for speed.

Furthermore, your benchmark may be flawed, as sin might be optimized out by the compiler, if optimization is enabled.

geza
  • 28,403
  • 6
  • 61
  • 135
  • With optimization turned off I still have similar results :STD : 138.372 ms GSL : 202.59 ms – Dmitry Jun 27 '18 at 11:37
  • @Dmitry: Of course, when measuring speed, you need to turn optimization on. Your results (57 vs 106ms) suggest that the compiler doesn't optimize away `sin` (but it could, and to be honest, it is strange that it doesn't optimizes it away). – geza Jun 27 '18 at 12:04
0

Unfortunately, I can't comment so I post this as a separate answer. I think that @geza's answer is correct, I would simply like to add a few details. According to GSL source codes, they have their own implementation of trigonometric functions:

/*-*-*-*-*-*-*-*-*-*-*-* Functions with Error Codes *-*-*-*-*-*-*-*-*-*-*-*/

/* I would have prefered just using the library sin() function.
 * But after some experimentation I decided that there was
 * no good way to understand the error; library sin() is just a black box.
 * So we have to roll our own.
 */
int
gsl_sf_sin_e(double x, gsl_sf_result * result)
{
/* CHECK_POINTER(result) */

{
const double P1 = 7.85398125648498535156e-1;
const double P2 = 3.77489470793079817668e-8;
const double P3 = 2.69515142907905952645e-15;

const double sgn_x = GSL_SIGN(x);
const double abs_x = fabs(x);

if(abs_x < GSL_ROOT4_DBL_EPSILON) {
  const double x2 = x*x;
  result->val = x * (1.0 - x2/6.0);
  result->err = fabs(x*x2*x2 / 100.0);
  return GSL_SUCCESS;
}
else {
  double sgn_result = sgn_x;
  double y = floor(abs_x/(0.25*M_PI));
  int octant = y - ldexp(floor(ldexp(y,-3)),3);
  int stat_cs;
  double z;

  if(GSL_IS_ODD(octant)) {
    octant += 1;
    octant &= 07;
    y += 1.0;
  }

  if(octant > 3) {
    octant -= 4;
    sgn_result = -sgn_result;
  }

  z = ((abs_x - y * P1) - y * P2) - y * P3;

  if(octant == 0) {
    gsl_sf_result sin_cs_result;
    const double t = 8.0*fabs(z)/M_PI - 1.0;
    stat_cs = cheb_eval_e(&sin_cs, t, &sin_cs_result);
    result->val = z * (1.0 + z*z * sin_cs_result.val);
  }
  else { /* octant == 2 */
    gsl_sf_result cos_cs_result;
    const double t = 8.0*fabs(z)/M_PI - 1.0;
    stat_cs = cheb_eval_e(&cos_cs, t, &cos_cs_result);
    result->val = 1.0 - 0.5*z*z * (1.0 - z*z * cos_cs_result.val);
  }

  result->val *= sgn_result;

  if(abs_x > 1.0/GSL_DBL_EPSILON) {
    result->err = fabs(result->val);
  }
  else if(abs_x > 100.0/GSL_SQRT_DBL_EPSILON) {
    result->err = 2.0 * abs_x * GSL_DBL_EPSILON * fabs(result->val);
  }
  else if(abs_x > 0.1/GSL_SQRT_DBL_EPSILON) {
    result->err = 2.0 * GSL_SQRT_DBL_EPSILON * fabs(result->val);
  }
  else {
    result->err = 2.0 * GSL_DBL_EPSILON * fabs(result->val);
  }

  return stat_cs;
}
}
}

You should remember that GSL is a library created by scientists and for scientists. Therefore you need a way to control your results and their correctness (see that comment above and this answer for details), speed in this case is secondary.

At the same time you need to remember that system trigonometric functions were implemented by smart IBM guys who know how to make it fast (and absolutely unreadable).

Bracula
  • 335
  • 1
  • 3
  • 14