str and str++

Question

I have this code (my strlen function)

size_t slen(const char *str)
{
    size_t len = 0;
    while (*str)
    {
        len++;
        str++;
    }
    return len;
}

Doing while (*str++), as shown below, the program execution time is much larger:

while (*str++)
{
    len++;
}

I'm doing this to probe the code

int main()
{
    double i = 11002110;
    const char str[] = "long string here blablablablablablablabla"
    while (i--)
        slen(str);

    return 0;
}

In first case the execution time is around 6.7 seconds, while in the second (using *str++), the time is around 10 seconds!

Why so much difference?

Why use a double instead of an unsigned long? Also, you should try compiling without optimization and see the results. Oh, and you should run both about twenty times and calculate the average durations. — , May 30 '11 at 20:01
Branch prediction failure? Unnecessary data copies? Try looking at the generated assembly. Also, try turning on the optimization, it may fix the problem. — dmckee --- ex-moderator kitten, May 30 '11 at 20:02
What kind of compiler are you using for this? I run it with my gcc 4.4.5 and they takes almost the same time, around 2s. With i set to 110021100, they both use around 19 secs. — zw324, May 30 '11 at 20:03
So it depends which compiler are using I guess. (Im using codeblocks with gcc compiler) — Ulrira, May 30 '11 at 20:21
To measure time, use provided functions, don't believe on hand watch. On Windows use [QueryPerformanceCounter()](http://stackoverflow.com/a/15720641/995714), and on Linux us [gettimeofday()](http://stackoverflow.com/questions/173409/how-can-i-find-the-execution-time-of-a-section-of-my-program-in-c). Also, use int instead of double — phuclv, May 30 '14 at 01:58
Do you give optimizing options to compiler? (e.g. `-O3` in gcc) If not, compiler can generate slower code to make debugging easy. — ikh, May 31 '14 at 00:05

Blagovest Buyukliev · Answer 1 · 2011-05-30T20:53:01.807

6

Probably because the post-increment operator (used in the condition of the while statement) involves keeping a temporary copy of the variable with its old value.

What while (*str++) really means is:

while (tmp = *str, ++str, tmp)
  ...

By contrast, when you write str++; as a single statement in the body of the while loop, it is in a void context, hence the old value isn't fetched because it's not needed.

To summarise, in the *str++ case you have an assignment, 2 increments, and a jump in each iteration of the loop. In the other case you only have 2 increments and a jump.

edited May 30 '11 at 20:53

answered May 30 '11 at 20:05

Blagovest Buyukliev

42,498
14
94
130

But there is one `test(*str)` and `inc(str)` in both cases -- from a high-level point of view, the work done is the same. – May 30 '11 at 20:06
4

It shouldn't matter with decent compilers. – May 30 '11 at 20:12
2

@pst: In principle, post-increments always involve a making a copy. In practice the copy can often be elided, but depending on the compiler, the exact context of the statement, and the optimization setting it may or may not actually be done. – dmckee --- ex-moderator kitten May 30 '11 at 20:12
@delnan: well, but in non-void contexts such as in the condition of a while statement, the copy cannot be avoided. – Blagovest Buyukliev May 30 '11 at 20:14
@Blagovest Buyukliev - The temporary copy can very much be avoided. It doesn't need to copy the data, merely to test whether the data is zero. It can then perform the increment without making a copy, and then use the results of the test to perform the necessary flow control. – Chris Lutz May 30 '11 at 20:28
@Chris: a good optimising compiler can indeed analyse and optimise the whole loop beyond the condition expression, but in principle my guess is the most logical one. – Blagovest Buyukliev May 30 '11 at 20:47
Both snippets involve post-increment, however. I think I good compiler would "untangle" the post-increment from the condition, and on a machine code level translate snippet 2 into something that looks like snippet 1. – Lundin May 31 '11 at 06:34

score 2 · Answer 2 · answered May 30 '11 at 20:07

2

Trying this out on ideone.com, I get about 0.5s execution with *str++ here. Without, it takes just over a second (here). Using *str++ was faster. Perhaps with optimisation on *str++ can be done more efficiently.

answered May 30 '11 at 20:07

Node

3,443
16
18

score 1 · Answer 3 · answered May 31 '14 at 03:25

Others have already provided some excellent commentary, including analysis for the generated assembly code. I strongly recommend that you read them carefully. As they have pointed out this sort of question can't really be answered without some quantification, so let's and play with it a bit.

First, we're going to need a program. Our plan is this: we will generate strings whose lengths are powers of two, and try all functions in turn. We run through once to prime the cache and then separately time 4096 iterations using the highest-resolution available to us. Once we are done, we will calculate some basic statistics: min, max and the simple-moving average and dump it. We can then do some rudimentary analysis.

In addition to the two algorithms you've already shown, I will show a third option which doesn't involve the use of a counter at all, relying instead on a subtraction, and I'll mix things up by throwing in std::strlen, just to see what happens. It'll be an interesting throwdown.

Through the magic of television our little program is already written, so we compile it with gcc -std=c++11 -O3 speed.c and we get cranking producing some data. I've done two separate graphs, one for strings whose size is from 32 to 8192 bytes and another for strings whose size is from 16384 all the way to 1048576 bytes long. In the following graphs, the Y axis is the time consumed in nanoseconds and the X axis shows the length of the string in bytes.

Without further ado, let's look at performance for "small" strings from 32 to 8192 bytes:

Now this is interesting. Not only is the std::strlen function outperforming everything across the board, it's doing it with gusto too since it's performance is a lot of more stable.

Will the situation change if we look at larger strings, from 16384 all the way to 1048576 bytes long?

Sort of. The difference is becoming even more pronounced. As our custom-written functions huff-and-puff, std::strlen continues to perform admirably.

An interesting observation to make is that you can't necessarily translate number of C++ instructions (or even, number of assembly instructions) to performance, since functions whose bodies consist of fewer instructions sometimes take longer to execute.

An even more interesting -- and important observation is to notice just how well the str::strlen function performs.

So what does all this get us?

First conclusion: don't reinvent the wheel. Use the standard functions available to you. Not only are they already written, but they are very very heavily optimized and will almost certainly outperform anything you can write unless you're Agner Fog.

Second conclusion: unless you have hard data from a profiler that a particular section of code or function is hot-spot in your application, don't bother optimizing code. Programmers are notoriously bad at detecting hot-spots by looking at high level function.

Third conclusion: prefer algorithmic optimizations in order to improve your code's performance. Put your mind to work and let the compiler shuffle bits around.

Your original question was: "why is function slen2 slower than slen1?" I could say that it isn't easy to answer without a lot more information, and even then it might be a lot longer and more involved than you care for. Instead what I'll say is this:

Who cares why? Why are you even bothering with this? Use std::strlen - which is better than anything that you can rig up - and move on to solving more important problems - because I'm sure that this isn't the biggest problem in your application.

score 1 · Answer 4 · answered May 30 '11 at 20:24

This depends on your compiler, compiler flags, and your architecture. With Apple's LLVM gcc 4.2.1, I don't get a noticeable change in performance between the two versions, and there really shouldn't be. A good compiler would turn the *str version into something like

IA-32 (AT&T Syntax):

slen:
        pushl %ebp             # Save old frame pointer
        movl  %esp, %ebp       # Initialize new frame pointer
        movl  -4(%ebp), %ecx   # Load str into %ecx
        xor   %eax, %eax       # Zero out %eax to hold len
loop:
        cmpb  (%ecx), $0       # Compare *str to 0
        je    done             # If *str is NUL, finish
        incl  %eax             # len++
        incl  %ecx             # str++
        j     loop             # Goto next iteration
done:
        popl  %ebp             # Restore old frame pointer
        ret                    # Return

The *str++ version could be compiled exactly the same (since changes to str aren't visible outside slen, when the increment actually occurs isn't important), or the body of the loop could be:

loop:
        incl  %ecx             # str++
        cmpb  -1(%ecx), $0     # Compare *str to 0
        je    done             # If *str is NUL, finish
        incl  %eax             # len++
        j     loop             # Goto next iteration

*str and *str++

4 Answers4

So what does all this get us?

str and str++