I have an assignment for a "performance analysis" class and decided to run some tests with gcc and clang using the Man or Boy test. I've already completed the assignment, but something odd came up.

The Objective-C code (no ARC!) with blocks is as follows:

#import <stdlib.h>
#import <assert.h>
#import <Foundation/Foundation.h>

typedef int (^F)(void);

int A(int kParam, F x1, F x2, F x3, F x4, F x5) {
  __block int k = kParam;
  __block F B;

  B = ^ {
    return A(--k, B, x1, x2, x3, x4);
  };
  return k <= 0 ? x4() + x5() : B();
};

F K(int n) {
    return [[^{
                return n;
              } copy] autorelease];
};

int main(int argc, const char **argv) {
  static int TABLE[] = {1, 0, -2, 0, 1, 0, 1, -1, -10, -30, -67, -138, -291,
                        -642, -1446, -3250};

  NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
  if(argc == 2) {
    int k;
    sscanf(argv[1], "%d", &k);
    int result = A(k, K(1), K(-1), K(-1), K(1), K(0));
    assert(result == TABLE[k]);
  };
  [pool drain];

  return EXIT_SUCCESS;
};

And the C++ code I used is as follows:

#include <cassert>
#include <cstdio>   // sscanf
#include <cstdlib>  // EXIT_SUCCESS
#include <iostream>
#include <tr1/functional> // I'm using tr1 because I have an old version of libstdc++
using namespace std;
using namespace std::tr1;

typedef function<int()> F;

int A(int k, const F &x1, const F &x2, const F &x3, const F &x4, const F &x5) {
  F B = [=, &k, &B] {
    return A(--k, B, x1, x2, x3, x4);
  };
  return k <= 0 ? x4() + x5() : B();
};

F L(int n) {
  return [n] {
    return n;
  };
};

int main(int argc, char **argv) {
  static int TABLE[] = {1, 0, -2, 0, 1, 0, 1, -1, -10, -30, -67, -138, -291,
                        -642, -1446, -3250};

  if(argc == 2) {
    int k;
    sscanf(argv[1], "%d", &k);
    int result = A(k, L(1), L(-1), L(-1), L(1), L(0));
    assert(result == TABLE[k]);
  };

  return EXIT_SUCCESS;
};

The Objective-C version seems to perform quite well (I tested with k ranging from 1 to 15) on both clang 3.5 and llvm-gcc 4.2. The C++ version, on the other hand, took around 9 seconds with k = 15 on both clang 3.5 and gcc 4.9.

Am I missing something? Why is the C++ version so much slower at higher values of k? (Here's the data table I generated to use in R, in case anyone wants to check.)

Edit:

Looks like the overhead came from std::function<>, as stated in the comments. In case anyone ever needs it, using this replacement instead resolved the issue, and the C++ version became (a little bit) faster than the Objective-C one, as one would expect.
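
For anyone who wants to try the idea without the linked delegate class, here is a minimal sketch of the same approach. It is not the code I actually used: `FnRef`, `make_ref` and `man_or_boy` are hypothetical names made up for this example. The assumption is that a non-owning reference to the callable is enough here, since every closure in the Man or Boy test lives on a stack frame that is still active whenever it gets called, so nothing has to be copied or heap-allocated.

```cpp
#include <cassert>

// Hypothetical non-owning callable reference (illustrative only, not the
// linked delegate class). It stores a pointer to the callable plus a plain
// function pointer that knows how to invoke it, so copying it is trivial.
struct FnRef {
  void *obj;
  int (*call)(void *);
  int operator()() const { return call(obj); }
};

// Builds an FnRef pointing at an existing callable. The callable must
// outlive every FnRef that refers to it.
template <typename Callable>
FnRef make_ref(Callable &c) {
  return FnRef{&c, [](void *p) -> int {
                 return (*static_cast<Callable *>(p))();
               }};
}

int A(int k, FnRef x1, FnRef x2, FnRef x3, FnRef x4, FnRef x5) {
  FnRef self{nullptr, nullptr};
  // B captures everything by reference, including the handle to itself,
  // which is filled in right after B is created.
  auto B = [&]() -> int { return A(--k, self, x1, x2, x3, x4); };
  self = make_ref(B);
  return k <= 0 ? x4() + x5() : B();
}

// Hypothetical driver, equivalent to A(k, L(1), L(-1), L(-1), L(1), L(0)).
int man_or_boy(int k) {
  assert(k >= 0);
  auto one = [] { return 1; };
  auto minus_one = [] { return -1; };
  auto zero = [] { return 0; };
  return A(k, make_ref(one), make_ref(minus_one), make_ref(minus_one),
           make_ref(one), make_ref(zero));
}
```

Calling `man_or_boy(10)` should give -67, matching TABLE[10] above. The trade-off is that `FnRef` never owns its target, so unlike std::function<> it must not be stored or returned past the lifetime of the lambda it points to.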

paulotorrens
  • `std::function` is not recommended for performance-critical code. Just use plain lambdas or templates. See http://stackoverflow.com/questions/5057382/what-is-the-performance-overhead-of-stdfunction – bolov Jul 07 '15 at 10:41
  • also `trx` implementations are usually **not** optimized – bolov Jul 07 '15 at 10:42
  • I thought so, and I managed to test it again with an updated libstdc++. Still was slowing down as the number increased. I'm gonna try getting rid of std::function<> to see what happens! – paulotorrens Jul 07 '15 at 10:43
  • C++ and Swift are very sensitive to optimisations turned on or not, Objective-C much less so. Check your optimiser settings. – gnasher729 Jul 07 '15 at 11:07
  • @gnasher729, I tested it with -O0, -Os, -O1, -O2 and -O3. Almost no difference (check table file). – paulotorrens Jul 07 '15 at 11:10
  • @bolov, changing `A` to be a templated function gave some improvement, but I couldn't completely remove std::function<> because of my `B` lambda (which is recursive). Any ideas? – paulotorrens Jul 07 '15 at 11:12
  • `auto B = ` doesn't work? – bolov Jul 07 '15 at 11:34
  • No, but [this](http://codereview.stackexchange.com/questions/14730/impossibly-fast-delegate-in-c11) worked like a charm! :) – paulotorrens Jul 07 '15 at 11:42
  • @bolov: `std::function` is for erasing the callable type to a common callable type. That's exactly what is needed in this case. – newacct Jul 07 '15 at 19:38
