0

I'm making a strchr implementation and trying to optimize my code to use the 64bit properties of my machine. So, I' converting my strings to long integers, thus comparing 8 chars at a time.

Currently, I have :

int    has_char(unsigned long word, unsigned long c)
{
    if((word & 0xFF00000000000000) == (c << 56) ) return (1);
    if((word & 0x00FF000000000000) == (c << 48))  return (1);
    if((word & 0x0000FF0000000000) == (c << 40))  return (1);
    if((word & 0x000000FF00000000) == (c << 32))  return (1);
    if((word & 0x00000000FF000000) == (c << 24))  return (1);
    if((word & 0x0000000000FF0000) == (c << 16))  return (1);
    if((word & 0x000000000000FF00) == (c << 8))   return (1);
    if((word & 0x00000000000000FF) == c)          return (1);
    return (0); /* Not found, returning 0 */
}

char    strchr(const char *s, int c)
{
    const char      *curr;
    const long      *str;
    unsigned long    wd;

    str = (long *)s;
    while (1) {
        wd = *str;
        if (has_char(wd, (unsigned long)c)) {
            curr = (char *)str;
                while (*curr) {
                    if (*curr == (char)c)
                        return ((char *)curr);
                    curr++;
                }
        }
        if ((wd - 0x0101010101010101)
            & ~wd & 0x8080808080808080) /* End of string and character not found, exit */
            return (NULL);
    str++;
    }
}

It works well but my has_char is very inefficient, it tests 8 times for the character value. Is there a way to make a unique test (a mask ?) that will return 1 if the character is present in the word and 0 if it is not present ?

Thanks for your help !

achedeuzot
  • 4,164
  • 4
  • 41
  • 56
  • You've already got the per-byte null test at the end of the loop. Simply build a mask from c<<0|c<<8|c<<16..c<<56 and XOR the source word by that, then any matching characters will be zero and you may re-use the null test. – doynax Dec 15 '13 at 21:40
  • Look at the SSE instruction sets – James Dec 15 '13 at 21:43
  • Your code does not work correctly if the string length is not a multiple of 8. And there are platforms where a `long` must be aligned on a 8-byte boundary, so you cannot cast any `char *` to an `unsigned long *`. – Martin R Dec 15 '13 at 22:03
  • Hm, what happens if I get outside the 8-byte boundary ? Segmentation fault ? My computer is not complaining.. well, not yet ? – achedeuzot Dec 15 '13 at 22:06

2 Answers2

3

Very well, here is the precise code as requested:

// Return a non-zero mask if any of the bytes are zero/null, as per your original code
inline uint64_t any_zeroes(uint64_t value) {
    return (value - 0x0101010101010101) & ~value & 0x8080808080808080;
}

char *strchr(const char *s, int ch) {
    // Pre-generate a 64-bit comparison mask with the character at every byte position
    uint64_t mask = (unsigned char) ch * 0x0101010101010101;
    // Access the string 64-bits at a time.
    // Beware of alignment requirements on most platforms.
    const uint64_t *word_ptr = (const uint64_t *) s;

    // Search through the string in 8-byte chunks looking for either any character matches
    // or any null bytes
    uint64_t value;
    do {
        value = *word_ptr++:
        // The exclusive-or value ^ mask will give us 0 in any byte field matching the char
        value = any_zeroes(value) | any_zeroes(value ^ mask);
    } while(!value);

    // Wind-down and locate the final character. This may be done faster by looking at the
    // previously generated zero masks but doing so requires us to know the byte-order
    s = (const char *) --word_ptr;
    do {
        if(*s == (char) ch)
            return (char *) s;
    } while(*s++);
    return NULL;
}

Beware: Written off the top of my head.

doynax
  • 4,285
  • 3
  • 23
  • 19
  • Thanks a lot for the code example and the detailed comments of what is going on ! – achedeuzot Dec 15 '13 at 22:08
  • Note that this implementation may read *beyond* the end-of-string, and [it has been noted this is *technically* UB](http://stackoverflow.com/questions/20312340/why-does-this-implementation-of-strlen-work). It may or may not be "worse" in a 64-bit environment. – Jongware Dec 16 '13 at 11:25
1

First of all, build a new variable c8 which is a c in every position.

unsigned long c8= (c << 56L) | ( c << 48L ) | ... | ( c << 8 ) | c ;

Do this once outside the loop so you're not recalculating.

Then XOR c8 with word, and test each byte for zero. To do this in parallel there are a few choices:

If you want to get ugly, we can start doing some parallel collapsing. First lets fold all ones down into a single spot for each byte:

unsigned long ltmp ;
ltmp= word | (0xf0f0f0f0f0f0f0f0 & ( word >> 4 )) ;
ltmp &= 0x0f0f0f0f0f0f0f0f ;
ltmp |= ( ltmp >> 2 ) ;
ltmp |= ( ltmp >> 1 ) ;
ltmp &= 0x0101010101010101 ;

return ( ltmp != 0x0101010101010101 ) ;

Or, comments are the following test:

((wd - 0x0101010101010101) & ~wd & 0x8080808080808080)

Is equivalent to all the previous operations.

BTW, the form: if (a) return 1 ; return 0 ; can just be written return a ; or return a != 0 ;

woolstar
  • 5,063
  • 20
  • 31
  • So in any case, I will have to test my `word` 8 times, if my char is present in any of the 8 places ? – achedeuzot Dec 15 '13 at 21:43
  • @achedeuzot: No, you've already written a parallel zero test. Use it. – doynax Dec 15 '13 at 21:44
  • A faster way to do that is with `0x0101010101010101 * c` – randomusername Dec 15 '13 at 21:45
  • I'm sorry, but I'm quite new to this, so could someone show me what the code or pseudocode should look like. Or is there is a good bitwise online calculator (I couldn't find any, usually it only gives the result without the operations) that would show me what is happening at every step of the way. – achedeuzot Dec 15 '13 at 21:49
  • The test `if ((wd - 0x0101010101010101 & ~wd & 0x8080808080808080)`, which is already used in OP's code to test for the terminating zero, can also be used here. – Martin R Dec 15 '13 at 21:59
  • I'm not familiar with that particular pattern, but if it works, great. – woolstar Dec 15 '13 at 22:00
  • @MartinR That's approximately what I tried to implement, what I have now in has_char is: `srch = (c << 56) | ( c << 48 ) | ... | ( c << 8 ) | c; res = word ^ srch; if ((res - 0x0101010101010101) & ~res & 0x8080808080808080) {return (1); }else { return (0); }` – achedeuzot Dec 15 '13 at 22:02