Why are the standard library functions so fast?

Question

I wrote a simple code to test mystrlen by comparing it to the standard library strlen on a 2089360 byte string. I used two versions of mystrlen and both are very far from the standard strlen runtime, about 93x slower on single execution. what makes the standard strlen so fast?

I compiled all test with

gcc file.c -Ofast

here is my main:

#include <sys/time.h>
#include <string.h>
#include <stdio.h>

int main ()
{
    double time = 0;
    struct timeval start, end;
    const char *s = myBigString;

    for (int j = 0; j < 50; ++j)
    {
        gettimeofday(&start, NULL);

        // int k = mystrlen2(s1);
        int k = strlen(s1);

        gettimeofday(&end, NULL);
        time += end.tv_sec + end.tv_usec / 1e6 - start.tv_sec - start.tv_usec / 1e6; // in seconds
        printf("%d\n", k);
    }

    printf("average time program took %f seconds to execute 50 times\n", time / 50);
    return 0;

}

mystrlen1 code:

size_t mystrlen1(const char *s)
{
    size_t i = 0;
    while (s[i])
        i++;
    return  i;
}

mystrlen2 code:

size_t mystrlen2(const char *s)
{
    size_t i = 0;
    while (s[i])
        i += 2;
    if (!s[i - 1])
        i--;
    return  i;
}

as expected mystrlen2 is better than mystrlen1:

$ ./mystrlen1
average time program took 0.001149 seconds to execute 50 times

$ ./mystrlen2
average time program took 0.000446 seconds to execute 50 times

but now see the output of library strlen:

$ ./libstrlen
average time program took 0.000000 seconds to execute 50 times

what happened here? how it's possible it run in 0 sec?

EDIT:

after couple of tests I found out the problem is the printf instruction that take "lot of" time. If I remove that the runtime will be the same for each function.

time += ((end.tv_sec - start.tv_sec) * 1000000) + (end.tv_usec - start.tv_usec); // in micro seconds

the output is:

$ ./mystrlen1
average time program took 0.020000 microseconds to execute 50 times
$ ./mystrlen2
average time program took 0.020000 microseconds to execute 50 times
$ ./libstrlen
average time program took 0.020000 microseconds to execute 50 times

I think that the standard library makes extensive use of pointers. Because a string or a (const) char* is just an array of chars, they can use pointer arithmetic to iterate over it rather than using indexes like you did. Try `size_t mystrlen1(const char *s) {size_t i = 0; const char* tmp = s; while(*tmp++) {i++;} return i;}` — Gpine185, Nov 09 '22 at 13:53
Post enough of your code that other people can run it, the code you have posted so far is incomplete — asimes, Nov 09 '22 at 13:58
Use sse instructions to compare chunks of memory looking for the first zero byte. (there are specific instructions just for this) This allows cache line size chunks to be checked. Also don't keep a count, just find the last byte and compute the distance. — pmw1234, Nov 09 '22 at 13:58
Where `s1` is taken from? Compiler may optimize `strlen()` to calculate it only once or even in compile time. Compare generated assembly code to understand difference. — dimich, Nov 09 '22 at 14:18
In your `mystrlen1` implementation change `while (s[i])` to `while (*s++)` and measure it again. — Kozmotronik, Nov 09 '22 at 14:18
Take input string from argv[1], not from some local variable. Or you'll just be benchmarking various other optimizations not related to `strlen` at all. — Lundin, Nov 09 '22 at 14:27
the standard library needs to be fast because they're used extensively everywhere, and they're extremely complex to be fast. [Why does glibc's strlen need to be so complicated to run quickly?](https://stackoverflow.com/q/57650895/995714) — phuclv, Nov 09 '22 at 14:32
Re "*as expected mystrlen2 is better than mystrlen1:*", Possibly faster, but not better; It gives incorrect results and invokes undefined behaviour for odd-length strings. — ikegami, Nov 09 '22 at 14:41
Re "*If I remove that the runtime will be the same for each function.*", As my answer shows, `mystrlen1(...)` is replaced with `strlen(...)` by the compiler (at least for me). — ikegami, Nov 09 '22 at 14:42
@ikegami from the demo you posted it looks like so, but not on my m1 Mac. mystrlen1 has its own assembly code and still execute at same time of strlen. — username, Nov 09 '22 at 15:14
@username Try timing again with `const char *s = myBigString;` --> `volatile char *s = myBigString;` — chux - Reinstate Monica, Nov 09 '22 at 16:46

ikegami · Answer 1 · 2022-11-09T14:37:00.767

Check the assembly generated by your program. I think you'll find that your compiler replaced

const char *s = "...";
int k = strlen(s1);

with

int k = 4;  // Or whatever the length is.

You can see this in effect here, where

size_t l0 = strlen( s );

results in

        mov     esi, 4

In fact, we see that the compiler in question realizes that mystring1 is equivalent to strlen, so

size_t l1 = mystrlen1( s );

results in

        mov     edx, 4

It's not worth discussing mystrlen2 since it's incorrect and results in undefined behaviour when provided an odd-length string.

Use a value generated at run time to see how your functions actually perform.

Still, it's entirely possible that you still won't actually be benchmarking your own code if you do this. Again, this other demo using a variable string still shows the compiler in question realizes that mystring1 is equivalent to strlen, and just replaces calls to mystrlen1 with a call to strlen.

Why are the standard library functions so fast?

1 Answers1