5

Writing a program that demonstrate different sorting algorithm in C++ on Mac. I found two quicksort implementation, qsort and qsort_b.

The first one is of course the old-fashioned, seen-everywhere qsort. But there's qsort_b, which takes an block rather than a function. Here's my code:

#include <cstdlib>
#include <cstdio>
#include <iostream>
#include <cstdio>
#include <ctime>

#define DATA 1000000

using namespace std;

int compare(const void* a, const void* b)
{
    return *(int*)a - *(int*)b;
}

int main(int argc, char *argv[])
{
    int* array = new int[DATA];

    srand(time(0));

    for ( int i = 0 ; i < DATA ; ++ i )
    {
        array[i] = rand() % 2147483647;
    }

    clock_t begin = clock();

    qsort(array, DATA, sizeof(array[0]), compare);
    //qsort_b(array, DATA, sizeof(array[0]), ^(const void* a, const void* b) { return *(int*)a - *(int*)b; });

    clock_t end = clock();

    cout << "Time it takes to sort " << DATA << " integers with quicksort: " << end - begin;
}

Here I see big speed difference, what's causing all that difference. To my understanding, blocks is for parallel processing, which in this case won't be faster than functions. There's nothing to parallel process, is there?

EDIT: The heapsort_b(), mergesort_b(), and qsort_b() routines are like the corresponding routines without the _b suffix, expect that the compar callback is a block pointer instead of a function pointer. (FROM BSD MAN PAGE)

EDIT: The speed difference. With DATA being 1000000, qsort finished it in 146832 ns, with qsort_b, in 127391 ns. It's a relatively big difference considering it's about 10% faster.

EDIT: I've edited the code to make it possible to have even bigger array of integers. My personal biggest test result are 100000000 integers, 28136278 (28s) vs. 23870078 (24s). It's a considerably big difference to me.

Cahit Gungor
  • 1,477
  • 23
  • 30
Shane Hsu
  • 7,937
  • 6
  • 39
  • 63
  • can you elabourate on "big speed difference" – Karthik T Dec 17 '12 at 07:54
  • @KarthikT I am not sure with the measuring unit, but I think it's nanosecond. With qsort, it's 146832, with qsort_b, it's 127391. With DATA being 1000000. – Shane Hsu Dec 17 '12 at 08:06
  • I've tested it with ever bigger data, 100000000 integers. It's 28136278 (28s) vs. 23870078 (24s). It's a considerably big difference to me. – Shane Hsu Dec 17 '12 at 08:15
  • Seems to be a [non-standard function](http://developer.apple.com/library/mac/#documentation/Darwin/Reference/Manpages/man3/qsort_b.3.html) because that was the only mention I could find of this "qsort_b". – Rapptz Dec 17 '12 at 08:22
  • @Rapptz You are correct, it's non-standard, and it appears on my Mac only. I can find reference of it on BSD Manual, in the link I provided. – Shane Hsu Dec 17 '12 at 08:23
  • Blocks are much like `anonymous functions` - they can be simulated with MACRO's which generates code at compile time. In this example block serves as `inline function` - that is why you get speed increase, because in every row comparison cpu saves time for comparison function calls. – Agnius Vasiliauskas Dec 17 '12 at 08:35
  • @0x69 I don't think blocks can be simulated with macros - blocks can be passed to compiled library functions which don't have access to the source of the caller and thus can't expand the macro. The same goes for a function like `qsort_b` - the call might be faster, but there is no way for true inlining to occur, since `qsort_b` itself has already been compiled. – user4815162342 Oct 20 '13 at 08:18
  • What do you imagine "rand() % 2147483647" does, as opposed to a plain rand() call? Also, the your timings are in units of CLOCK_PER_SEC, so the total times are (end-start)/CLOCKS_PER_SEC seconds. This is platform-dependent, not necessarily nanoseconds. – Klitos Kyriacou Oct 23 '15 at 08:42

2 Answers2

4

Objective-C Block is a different kind of animal. It may seem like MACRO(substitution before compilation), and inline functions(telling compiler "Do your best to eliminate function call overhead"), but the overall structure is much more different than those structures.

Block is an object, furthermore, a stack object. The only object that is allowed to be created in the stack (at least without some tricks).

When a Block object created in the stack, compiler adds all local variables, block variables, the addresses of read/write variables it references, and a pointer to block's executable code. So even before starting to execute, everything is ready for computation without any stack operation.

So, it is not an optimization issue, rather an attribute of the Block. It does not have any function call overhead of PUSH and POP s of stack variables.

As an answer to your question, qsort() and qsort_b() does not have any algorithmic difference, but the elaborated structure, block vs function.

Cahit Gungor
  • 1,477
  • 23
  • 30
2

Looks like optimization difference to me. With qsort_b, the compiler probably inlines the comparison, while with qsort does not. The difference is overhead of function call per comparison.

hyde
  • 60,639
  • 21
  • 115
  • 176
  • You must be correct. Correct me if I'm wrong, blocks are like inline function in this particular situation. And inline functions are considered faster than function. (http://stackoverflow.com/questions/4016061/why-is-inlining-considered-faster-than-a-function-call) – Shane Hsu Dec 17 '12 at 08:30
  • @ShaneHsu I don't know anything about these "blocks" other than what I read just now from http://en.wikipedia.org/wiki/Blocks_(C_language_extension) so can't comment on what kind of code the compiler generates for them. If you want to really understand what's going on, add compiler command line switch to produce asm source files (instead of object files) for both cases and compare them. Another experiment might be to try both versions with optimizations turned off, and then cranked up to the max, and see how it affects relative performance. – hyde Dec 17 '12 at 08:41
  • Will do the experiment later then. Thanks! – Shane Hsu Dec 17 '12 at 23:28
  • If `qsort_b` is a library function that exist on the system in a compiled form, I don't see how it could inline the call to the block. Maybe `qsort_b` simply uses a better sorting algorithm, or blocks have a slightly more efficient calling convention? – user4815162342 Oct 20 '13 at 08:20