what is this loop optimization called and how does it work?

Question

I'm compiling a following code:

#include <stdio.h>
#include <string.h>

int main()
{
  char data[1024];
  scanf("%s", data);

  for (int i = 0; i < strlen(data); i++)
  {
    if (data[i] == 'a')
    {
      printf("%d.\n", i);
    }
  }
}

I'm using -O2 optimization level to GCC. When checking how internal loop is done on assembly level by gdb, I get the following instruction after scanf:

0x40055c:   48 89 e0    mov    %rsp,%rax

and then the code that iterates:

0x40055f <main+47>      mov    (%rax),%ecx 
0x400561 <main+49>      add    $0x4,%rax
0x400565 <main+53>      lea    -0x1010101(%rcx),%edx
0x40056b <main+59>      not    %ecx
0x40056d <main+61>      and    %ecx,%edx
0x40056f <main+63>      and    $0x80808080,%edx
0x400575 <main+69>      je     0x40055f <main+47>

I just wanted to ask, how is this optimization called? so I can read about it instead trying to reverse-engineer how the assembly code works.

P.S. I understand the idea is to move by 4 byte at a time instead one, so it has to do less iterations, but how is it called and how does it work?

score 1 · Accepted Answer · answered Aug 07 '18 at 09:45

1

Just to be clear, that's the strlen, not your loop.

It's an optimizations based on this SWAR word-contains-zero-byte trick, found here among other places:

#define haszero(v) (((v) - 0x01010101UL) & ~(v) & 0x80808080UL)

Since strlen is an intrinsic function, this is probably not caused by any famous "named optimization", it's a specific trick for a specific function.

answered Aug 07 '18 at 09:45

harold

61,398
6
86
164

Thanks a lot! Exactly what I was searching for. – Zhani Baramidze Aug 07 '18 at 10:14

score 1 · Answer 2 · answered Aug 07 '18 at 10:15

1

The assembly code you show is a part of strlen() and indeed moves 4 bytes at a time, in this case to find the zero byte.

See this for an implementation example of the algorithm.

answered Aug 07 '18 at 10:15

Danny_ds

11,201
1
24
46

what is this loop optimization called and how does it work?

2 Answers2