21

I wrote a function, Str::Compare, that is basically a strcmp rewritten in another way. While comparing the two function, in a loop repeted 500'000'000 times, strcmp execute too much fast, about x750 times faster.

This code was compiled in a C library with -Os parameter active:

int Str::Compare(char* String_1, char* String_2)
{
    char TempChar_1, TempChar_2;

   do
   {
        TempChar_1 = *String_1++;
        TempChar_2 = *String_2++;
   } while(TempChar_1 && TempChar_1 == TempChar_2);

   return TempChar_1 - TempChar_2;
}

The execution time of that function is 3.058s, while strcmp only 0.004s.

Why this happen?

Also this is how I implemented the benchmark loop:

int main()
{
     char Xx[] = {"huehuehuehuehuehuehuehuehuehuehuehuehuehue"},
          Yy[] = {"huehuehuehuehuehuehuehuehuehuehuehuehuehue"};
     for(int i = 0; i < 500000000; ++i)
         Str::Compare(Xx, Yy);
}

Edit: While testing some code I wrote and optimization that improved drastically Str::Compare speed. If before strcmp was x750 times faster now is only x250. This is the new code:

int Str::Compare(char* String_1, char* String_2)
{
     char TempChar_1, TempChar_2, TempChar_3;

     while(TempChar_1 && !TempChar_3)
     {
          TempChar_1 = *String_1++;
          TempChar_2 = *String_2++;
          TempChar_3 = TempChar_1 ^ TempChar_2;
     }

     return TempChar_1 - TempChar_2;
}

New execution time is 0.994s.

John Saunders
  • 160,644
  • 26
  • 247
  • 397
EnryFan
  • 422
  • 3
  • 15
  • why dont u just do `while(*str1++ == *str2++);` – SwiftMango Dec 25 '13 at 16:42
  • Because that code is the most inefficient implementation of this function. – EnryFan Dec 25 '13 at 16:55
  • `1` the edited version of `Str::Compare` is not good - `TempChar_3` is used before being initialized, `2` Code generated by VS2010 for original function is almost as fast as `strcmp`. `3` it's hard to beat massively optimized function (even intrinsic!) with a straightforward implementation. – Roman R. Dec 25 '13 at 17:28
  • explain why? that code is similar to the standard imp of strcpy afaik – SwiftMango Dec 25 '13 at 17:42
  • 1
    `while(*str_1 && *str_1++ == *str_2++);` Actually is the correct form. That is more slow for the simple reason that the processor must resolve the address of the pointer every time, with a performace-loss. – EnryFan Dec 25 '13 at 17:54

3 Answers3

27

I was curious about it and build a test program:

#include <string.h>

compare(char* String_1, char* String_2)
{
    char TempChar_1,
         TempChar_2;

   do
   {
        TempChar_1 = *String_1++;
        TempChar_2 = *String_2++;
   } while(TempChar_1 && TempChar_1 == TempChar_2);

   return TempChar_1 - TempChar_2;
}


int main(){
    int i=strcmp("foo","bar");
    int j=compare("foo","bar");

    return i;
}

I compiled it to assembler with gcc -S -Os test.cusing gcc 4.7.3 resulting in the following assembler:

    .file   "test.c"
    .text
    .globl  compare
    .type   compare, @function
compare:
.LFB24:
    .cfi_startproc
    xorl    %edx, %edx
.L2:
    movsbl  (%rdi,%rdx), %eax
    movsbl  (%rsi,%rdx), %ecx
    incq    %rdx
    cmpb    %cl, %al
    jne .L4
    testb   %al, %al
    jne .L2
.L4:
    subl    %ecx, %eax
    ret
    .cfi_endproc
.LFE24:
    .size   compare, .-compare
    .section    .rodata.str1.1,"aMS",@progbits,1
.LC0:
    .string "bar"
.LC1:
    .string "foo"
    .section    .text.startup,"ax",@progbits
    .globl  main
    .type   main, @function
main:
.LFB25:
    .cfi_startproc
    movl    $.LC0, %esi
    movl    $.LC1, %edi
    call    compare
    movl    $1, %eax
    ret
    .cfi_endproc
.LFE25:
    .size   main, .-main
    .ident  "GCC: (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3"
    .section    .note.GNU-stack,"",@progbits

I am not that good in x86 assembler but as far as I see it the call to the strcmp is removed and simply replaced by a constant expression ( movl $1, %eax ). So if you use a constant expression for your tests, gcc probably optimizes the strcmp to a constant.

lwi
  • 1,682
  • 12
  • 21
  • 1
    Gcc can also replace calls to strlen with constants – James Dec 22 '13 at 23:37
  • 4
    [Here's G++ 4.8.1 reducing everything to a constant at Coliru.](http://coliru.stacked-crooked.com/a/8bec7ba91e5fce00) – Casey Dec 23 '13 at 00:30
  • `strcmp` is replaced with a constant expression even in mine program, where I didn't use constant. I'll work for understend better why of this. – EnryFan Dec 23 '13 at 08:32
5

When comparing performance, I've found that it's best to put the test functions and the test driver in separate compilation units. Put your test functions in separate compilation units, and compile those to whatever optimization level you want, but compile the test driver unoptimized. Otherwise you will run into exactly the kind of problem you've seen here.

The problem is that strcmp compares two const C-style strings. If you loop 500,000,000 times over strcmp(string_a, string_b), an optimizing compiler is going to be smart enough to reduce that loop to optimize that loop away, and then perhaps smart enough to optimize away the one remaining call to strcmp.

Your compare function takes two non-const strings. As far as the compiler is concerned, your function may well have side effects. The compiler doesn't know, so it can't optimize the loop down to nothing. It has to generate code to perform the comparison 500,000,000 times.

David Hammen
  • 32,454
  • 9
  • 60
  • 108
  • `const` qualification of pointer parameters doesn't usually affect a compiler's ability to infer about side effects. Casting away the constness is fine as long as the original array objects are non-const. Also, the function could call something else causing modification or aliasing to the string. – Potatoswatter Dec 23 '13 at 07:00
-3

I believe most of the standard libraries are written in assembly language. That could be the reason you see the standard library is faster than yours.

  • This may be true, but it won't account for a 750x speed difference. That's a bigger difference than the difference between interpreted languages and compiled languages. Usually when I see discrepancies that big, I suspect things like cache misses and possibly even branch prediction failure. – Beefster Jul 13 '20 at 05:15