25

How to calculate the length of a string in C efficiently (in time)?

Right now I'm doing:

int calculate_length(char *string) {
    int length = 0;
    while (string[length] != '\0') {
        length++;
    }
    return length;
}

But it's very slow compared to strlen() for example, is there any other way to do it?

Thanks.

EDIT: I'm working in a freestanding environment, I'm not allowed to use any external lib including "string.h".

Carla Álvarez
  • 1,077
  • 3
  • 10
  • 9
  • 1
    Why not use strlen then? Or is this an exercise? – Sam Post Jan 15 '10 at 09:41
  • 2
    It's not an exercise, the environment where I'm working in doesn't allow me to include other "libs", including "string.h" so I have to implement it and would like it to be as efficient as possible while being maintainble. – Carla Álvarez Jan 15 '10 at 09:46
  • You may want to edit your original post to mention that you're in a freestanding environment. – Matthew Iselin Jan 15 '10 at 09:53
  • 1
    Take into account that the std library can also be compiled with compiler optimizations activated and your code don't. – Khelben Jan 15 '10 at 10:02
  • There are excellent answers here, but keep in mind that this is micro-optimization, and not all programmers understand the use and high importance of macro-optimization. Here's an example of a 40x speedup in perfectly OK-looking code: http://stackoverflow.com/questions/926266/performance-optimization-strategies-of-last-resort/927773#927773 – Mike Dunlavey Jan 15 '10 at 14:40

14 Answers14

45

From the FreeBSD source code:

size_t
strlen(const char *str)
{
    const char *s;
    for (s = str; *s; ++s);
    return(s - str);
}

Compared to your code, this probably maps very nicely to an assembler instruction, which can explain a big performance difference.

Andomar
  • 232,371
  • 49
  • 380
  • 404
  • The compiler should be able to optimise this fairly effectively, which means the code is still readable, and should still run fairly quickly. – Matthew Iselin Jan 15 '10 at 09:54
10

strlen(). Odds are, if somebody had found a better, faster generic method, strlen would have been replaced with that.

aib
  • 45,516
  • 10
  • 73
  • 79
9

Take a look at the source code of strlen in the standard libc. Functions in standard libraries are generally highly optimized. Check it out here (coded in assembly) - this is from the GNU libc.

size_t
DEFUN(strlen, (str), CONST char *str)
{
  int cnt;

  asm("cld\n"                   /* Search forward.  */
      /* Some old versions of gas need `repne' instead of `repnz'.  */
      "repnz\n"                 /* Look for a zero byte.  */
      "scasb" /* %0, %1, %3 */ :
      "=c" (cnt) : "D" (str), "0" (-1), "a" (0));

  return -2 - cnt;
}
Sudhanshu
  • 2,691
  • 1
  • 18
  • 25
  • 3
    An assembly version may be faster, but you'd need some numbers to back up that claim. See http://leaf.dragonflybsd.org/mailarchive/commits/2011-11/msg00195.html – eradman Jul 02 '13 at 17:18
6

Take a look at GNU C library's strlen() source.

It uses a number of non-obvious tricks to gain speed without dropping to assembly, including:

  • getting to a character that's properly aligned
  • reading those aligned parts of the string into an int (or some larger datatype) to read several chars at a time
  • using bit twiddling tricks to check if one of the chars embedded in that block of chars is zero

etc.

Michael Burr
  • 333,147
  • 50
  • 533
  • 760
  • 1
    Current FreeBSD one uses something similar, could come handy too: http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/string/strlen.c?rev=1.7.2.1.2.1;content-type=text%2Fplain – Xandy Jan 15 '10 at 10:21
  • What do you mean "without dropping to assembly"? On i386, it does use assembly (see Sudhanshu's reply). – bortzmeyer Jan 15 '10 at 12:51
  • Sudhanshu's is different than the one I linked to. It's certainly possible that when glibc is built for an x86 Sudhanshu's gets used (I'm honestly not sure); however, the example I pointed to is straight C code that can be used as an example of some possible optimizations. – Michael Burr Jan 15 '10 at 16:24
3

The easiest way is to call strlen(). Seriously. It's already optimized by your compiler and/or library vendors, to be as fast as possible for your architecture.

One common optimization is to remove the need to increase a counter, and compute the length from the pointer:

size_t my_strlen(const char *s)
{
  const char *anchor = s;

  while(*s)
   s++;

  return s - anchor;
}
unwind
  • 391,730
  • 64
  • 469
  • 606
3

C strings are intrinsically inefficient, there are two reasons for using the ASCIZ convention:

  • The standard C library uses it
  • The compiler uses it for literal string constants

The first of these is academic in this instance since you are not using the standard library, the second is easily overcome by creating functions or macros that provide conversions from C strings to a more efficient convention such as Pascal strings. The point is you need not be a slave to the C convention if you are not using the C library.

Clifford
  • 88,407
  • 13
  • 85
  • 165
  • ++ You're right, but sometimes we look for cycles in all the wrong places. I haven't seen any real code where the speed of *strlen* was even on the radar, considering the multitude of macro-ways that typically make software slow. – Mike Dunlavey Jan 15 '10 at 15:07
  • 1
    @Mike: Couldn't agree more. Likely to be premature optimisation, but the article I linked gives a couple of real world examples where it has been critical. A strlen() function for a pascal string is both fast and deterministic. – Clifford Jan 15 '10 at 17:03
  • C strings are inefficient for a lot of use cases, but superior to pascal strings for some use cases (e.g. `substring = &string[skipped];`). Tracking string length elsewhere (not prepending it to the string itself) may be more efficient than both pascal strings and C strings. – Brendan Jul 15 '21 at 18:50
2

Yet another way to speed up char counting is to use vectorization!

Here's an example of how to do this with respect to UTF8-encoded strings:

Even faster UTF-8 character counting,

http://www.daemonology.net/blog/2008-06-05-faster-utf8-strlen.html

qbitty
  • 21
  • 1
0

Some of the above answers are very good, and this is my take. There is a keyword known as "register"

#include <stdio.h>
size_t strlenNew(char *s);

int main(int argc, char* argv[])
{
    printf("Size of \"Hello World\" is ::\t%d",strlenNew("Hello World"));
    return 0;
}

size_t strlenNew(char *s)
{
    register int i=0;
    while(s[i]!='\0') i++;
    return i;
}

Read here: http://gustedt.wordpress.com/2010/08/17/a-common-misconsception-the-register-keyword/ and http://msdn.microsoft.com/en-us/library/482s4fy9(v=vs.80).aspx

From the first link:

This can be particularly useful for array variables. An array variable is easily confounded with a pointer variable. Unless it is followed by a [expr] or with a sizeof it evaluates to the address of the first element. If you declare the array register all these uses are forbidden; we only access individual elements or ask for the total size. Such an register-array then may be much easier used as if it just were a set of variable by the optimizer. No aliasing (accessing the same variable through different pointers) may occur.

Thus, sometimes, there may be performance fluctuations. Personally, this is one of my fav implementations, but Sudhanshu and Andomar also provide a good implementation :)

0

On i386 processors, libc often use an ultra-optimized version of strlen, often written in assembly language. The paper "String Length" explains how they work.

Here is one optimized version for OpenBSD. (They also have a portable version.) Here is the version for the GNU libc.

bortzmeyer
  • 34,164
  • 12
  • 67
  • 91
0

I had the same problem, and I resolved it. The key is the 2nd condition of the for loop:

int longitud(char cad[]){

    int i, cont;

    cont = 0;

    for(i = 0; i < 30 && cad[i] != '\0'; i++){
        if(cad[i] != '\0'){
            if(cad[i] != ' '){
                cont++;
            }
        }
    }
    cont--;
    return cont;
}
Alexey Malev
  • 6,408
  • 4
  • 34
  • 52
0

Basic C program to calculate string length.

#include <stdio.h>

/**
* Method to calculate string length.
* Returns -1 in case of null pointer, else return string length.
**/
int length(char *str) {

    int i = -1;
    // Check for NULL pointer, then return i = -1;
    if(str == NULL) return i;

    // Iterate till the empty character.
    while (str[++i] != '\0');
    return i;  // Return string length.
}

int main (int argc, char **argv) {

    int len = 0;
    char abc[] = "hello";
    len = length(abc);
    printf("%d", len);  
    return 0;
}

NOTE: For better way, we should always pass the array size to function to avoid the case of out of bound access. For example the prototype of method should be:

/**
* @desc calculate the length of str.
* @param1 *str pointer to base address of char array.
* @param2 size = capacity of str to hold characters.
* @return int -1 in case of NULL, else return string length.
**/
int length (char *str, int size);
Blanket Fox
  • 377
  • 4
  • 15
Rahul Raina
  • 3,322
  • 25
  • 30
-1

I am not quite sure of what you want to do.

You want to re-write strlen to make your code compatible with standard c-Library, or you want to manage strings.

In the first case, I think you'd better directly use the standard libraries.

The other case is interesting : you should take a look at the c++ string class, that have an implementation of the traits strategy (allowing quick manipulations of very big strings).

Enamul Hassan
  • 5,266
  • 23
  • 39
  • 56
  • The question is pretty precise in what he meant, and states that he can't use the standard includes as he's in a free standing environment. – Dispersia Jul 25 '16 at 21:15
-1

I did not find better :

inline size_t mystrlen(char *_)

  { return ((_ == NULL) ? (_[0] != '\0')) ? 0 : (1 + mystrlen(_ + 1)); }
  • uses recursion which add overhead of calling itself (you cant `inline` function that recursing to unknown depth) – Blanket Fox Feb 25 '22 at 14:41
-5
int max;
max = sizeof(str);
return (--max);
sinelaw
  • 16,205
  • 3
  • 49
  • 80
nel
  • 1