
I have two questions regarding the implementation of strlen in string.h in glibc.

  1. The implementation uses a magic number with 'holes'. I am not able to understand how this works. Can someone please help me understand this snippet:

    size_t
    strlen (const char *str)
    {
       const char *char_ptr;
       const unsigned long int *longword_ptr;
       unsigned long int longword, himagic, lomagic;
    
       /* Handle the first few characters by reading one character at a time.
          Do this until CHAR_PTR is aligned on a longword boundary.  */
       for (char_ptr = str; ((unsigned long int) char_ptr
                 & (sizeof (longword) - 1)) != 0;
            ++char_ptr)
         if (*char_ptr == '\0')
           return char_ptr - str;
    
       /* All these elucidatory comments refer to 4-byte longwords,
          but the theory applies equally well to 8-byte longwords.  */
    
       longword_ptr = (unsigned long int *) char_ptr;
    
       /* Bits 31, 24, 16, and 8 of this number are zero.  Call these bits
          the "holes."  Note that there is a hole just to the left of
          each byte, with an extra at the end:
    
          bits:  01111110 11111110 11111110 11111111
          bytes: AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD
    
          The 1-bits make sure that carries propagate to the next 0-bit.
          The 0-bits provide holes for carries to fall into.  */
    
       himagic = 0x80808080L;
       lomagic = 0x01010101L;
       if (sizeof (longword) > 4)
         {
           /* 64-bit version of the magic.  */
           /* Do the shift in two steps to avoid a warning if long has 32 bits.  */
           himagic = ((himagic << 16) << 16) | himagic;
           lomagic = ((lomagic << 16) << 16) | lomagic;
         }
       if (sizeof (longword) > 8)
         abort ();
    
       /* Instead of the traditional loop which tests each character,
          we will test a longword at a time.  The tricky part is testing
          if *any of the four* bytes in the longword in question are zero.  */
       for (;;)
         {
           longword = *longword_ptr++;
    
           if (((longword - lomagic) & ~longword & himagic) != 0)
             {
               /* Which of the bytes was the zero?  If none of them were, it was
                  a misfire; continue the search.  */
    
               const char *cp = (const char *) (longword_ptr - 1);
    
               if (cp[0] == 0)
                 return cp - str;
               if (cp[1] == 0)
                 return cp - str + 1;
               if (cp[2] == 0)
                 return cp - str + 2;
               if (cp[3] == 0)
                 return cp - str + 3;
               if (sizeof (longword) > 4)
                 {
                   if (cp[4] == 0)
                     return cp - str + 4;
                   if (cp[5] == 0)
                     return cp - str + 5;
                   if (cp[6] == 0)
                     return cp - str + 6;
                   if (cp[7] == 0)
                     return cp - str + 7;
                 }
             }
         }
    }
    

    What is the magic number being used for?

  2. Why not simply increment the pointer until the NUL character is found and return the count? Is this approach faster? If so, why?

Jonathan Leffler
Neo
    On most architectures, glibc will use even faster functions. On modern Intel chips, for example, it uses SIMD extensions to vectorize the check. – rici Jan 06 '16 at 21:44

1 Answer


This trick looks at 4 bytes (32 bits), or even 8 bytes (64 bits), in one go and checks whether any of them is zero (the end of the string), instead of checking each byte individually.

Here is one example to check for a null byte:

unsigned int v; // 32-bit word to check if any 8-bit byte in it is 0
bool hasZeroByte = ~((((v & 0x7F7F7F7F) + 0x7F7F7F7F) | v) | 0x7F7F7F7F);

For some more see Bit Twiddling Hacks.

The one used here (32-bit example):

There is yet a faster method: use hasless(v, 1), which is defined below; it works in 4 operations and requires no subsequent verification. It simplifies to

#define haszero(v) (((v) - 0x01010101UL) & ~(v) & 0x80808080UL)

The subexpression (v - 0x01010101UL) evaluates to a high bit set in any byte whenever the corresponding byte in v is zero or greater than 0x80. The subexpression ~v & 0x80808080UL evaluates to high bits set in bytes where the byte of v doesn't have its high bit set (so the byte was less than 0x80). Finally, by ANDing these two subexpressions the result is the high bits set where the bytes in v were zero, since the high bits set due to a value greater than 0x80 in the first subexpression are masked off by the second.

Checking one byte at a time costs at least as many CPU cycles as checking a full integer (register-wide) value. In this algorithm, full integers are checked to see if they contain a zero byte. If not, only a few instructions were spent and the loop can jump straight to the next full integer. If there is a zero byte inside, a further check is done to find its exact position.

Danny_ds
    In addition, what the gcc `strlen` implementation does is optimize to take advantage of architectures where 8-byte integers are supported. Above you are limited to looking for a null in 4-bytes at a time. The `if (sizeof (longword) > 4)` comparison in `strlen` extends the comparison for an additional 4-bytes. The benefit either way is improved `strlen` performance for strings longer than approximately 32-chars. (above what you get with a *char-by-char check*). Good answer. – David C. Rankin Jan 07 '16 at 07:53