3

When I was looking to code a faster strlen in C (than the one which check byte by byte) I found this macro:

#define DETECTNULL(X) (((X) - 0x01010101) & ~(X) & 0x80808080)

This macro reads 4 bytes and returns (1) when it finds at least one NUL byte. Otherwise it returns (0).

I wonder if it's possible to use the same technique to find any char of the ascii table (I prefer not to use a byte by byte loop).

I tried a lot of combinations and the best I could do is this:

// in this example I wanted to find a '#'

int32_t detectsharp(int32_t c) {
    c = ~(c - 0x24242424) & ~c;
    return ((c - 0x01010101) & ~c & 0x80808080);
}

But it doesn't work with 0x22222222 ("""") or things like 0x24212121 ($!!!).

chqrlie
  • 131,814
  • 10
  • 121
  • 189
Tatchay
  • 33
  • 5
  • 3
    It however should work, if you xor the int with `0x23232323` and then again try to detect a zero byte. That should detect a sharp. Analogous with all other chars. – Ctx Feb 23 '19 at 16:17
  • You should use `uint32_t` instead of `int32_t`. – chqrlie Feb 23 '19 at 18:58
  • note that SIMD would be a lot faster than tricks like this. See [Compare 16 byte strings with SSE](https://stackoverflow.com/q/31999284/995714), [Implementing strcmp, strlen, and strstr using SSE 4.2 instructions](https://www.strchr.com/strcmp_and_strlen_using_sse_4.2). You'd better off leave the job for standard libraries, which will use SIMD if possible – phuclv Mar 13 '19 at 12:53

1 Answers1

4

It works to detect any char if you previously xor it with your int.

#define DETECTCHAR(x,c) (DETECTNULL((x) ^ ((c)*0x01010101l) ))

The multiplication distributes the char in the 4 bytes of the int and the xor clears the byte where the char is present.

Alain Merigot
  • 10,667
  • 3
  • 18
  • 31
  • You might want to cast `c` as `(unsigned char)(c)` to avoid problems with negative characters on platforms where `char` is signed by default. Also the `l` suffix is optional and would be more readable as `L`. – chqrlie Feb 23 '19 at 18:56