Detect ascii char in one byte in 32 or 64 bits

Question

When I was looking to code a faster strlen in C (than the one which check byte by byte) I found this macro:

#define DETECTNULL(X) (((X) - 0x01010101) & ~(X) & 0x80808080)

This macro reads 4 bytes and returns (1) when it finds at least one NUL byte. Otherwise it returns (0).

I wonder if it's possible to use the same technique to find any char of the ascii table (I prefer not to use a byte by byte loop).

I tried a lot of combinations and the best I could do is this:

// in this example I wanted to find a '#'

int32_t detectsharp(int32_t c) {
    c = ~(c - 0x24242424) & ~c;
    return ((c - 0x01010101) & ~c & 0x80808080);
}

But it doesn't work with 0x22222222 ("""") or things like 0x24212121 ($!!!).

It however should work, if you xor the int with `0x23232323` and then again try to detect a zero byte. That should detect a sharp. Analogous with all other chars. — Ctx, Feb 23 '19 at 16:17
note that SIMD would be a lot faster than tricks like this. See [Compare 16 byte strings with SSE](https://stackoverflow.com/q/31999284/995714), [Implementing strcmp, strlen, and strstr using SSE 4.2 instructions](https://www.strchr.com/strcmp_and_strlen_using_sse_4.2). You'd better off leave the job for standard libraries, which will use SIMD if possible — phuclv, Mar 13 '19 at 12:53

score 4 · Accepted Answer · answered Feb 23 '19 at 16:26

4

It works to detect any char if you previously xor it with your int.

#define DETECTCHAR(x,c) (DETECTNULL((x) ^ ((c)*0x01010101l) ))

The multiplication distributes the char in the 4 bytes of the int and the xor clears the byte where the char is present.

answered Feb 23 '19 at 16:26

Alain Merigot

10,667
3
18
31

You might want to cast `c` as `(unsigned char)(c)` to avoid problems with negative characters on platforms where `char` is signed by default. Also the `l` suffix is optional and would be more readable as `L`. – chqrlie Feb 23 '19 at 18:56

Detect ascii char in one byte in 32 or 64 bits

1 Answers1