What's the fastest way to interpret a bool string into a number in C?

Question

I googled it and all the results were about C++ and C# so I am asking a C specific question.

// Str to lower
if (!memcmp(Str, "true", 4) || !memcmp(Str, "1", 1) || ...) {
     return 1;
} else if (!memcmp(Str, "false", 5) || !memcmp(Str, "0", 1) || ...) {
     return 0;
}
return -1;

That's one way to do it. But I'm not quite sure that's the most efficient way to do it. What's the most efficient way to interpret a bool string (eg. "true") into the equivalent value 1?

If you can (if your inputs are limited) test just the 1st char: `return (*Str == 't') || (*Str == '1');`. This simplification does not work, for example, for `" true"` because of the leading space — pmg, Jul 20 '20 at 21:08
Don't use string for bool. Single byte (`unsigned char`) is enough to store 0 and 1. — i486, Jul 20 '20 at 21:15
@i486 This is for reading user input, eg. commands in a game. Otherwise I NEVER store bools in a string. — , Jul 20 '20 at 21:20
*This is for reading user input,* Why are you worried about saving nanoseconds at the cost of readability and robustness when the process is timed in seconds or even minutes? — Andrew Henle, Jul 20 '20 at 21:53
@user13783520 If it is for user input, why is the need for speed ("fastest way")? There is `strcmpi()` for strings. — i486, Jul 20 '20 at 21:57
So you would seriously deliver harder-to-maintain code that provides zero actual, perceptible, real-world improvement in the product being delivered? But risks bugs in the future? — Andrew Henle, Jul 20 '20 at 21:57
Convert to actual [bool type](https://en.cppreference.com/w/c/types/boolean) at the time of input instead of checking strings all the time. And don't worry about efficiency at input-time, the input operations and waiting for the user will be many order of magnitudes slower than doing the string-to-bool conversion. — Some programmer dude, Jul 21 '20 at 06:52
Are you writing this game? Just ask user to input `T` or `F`, everything else would result in `-1`. Can't beat this, even with the awesome hash solution below! — Vlad Feinstein, Jul 23 '20 at 18:24

score 4 · Answer 1 · answered Jul 20 '20 at 21:13

Since in your example it looks like you are returning -1 for invalid inputs, we can assume they aren't always valid, so you will have to check the entirety of the string no matter what you do.

However, whether a chain of memcmp calls (that start from the beginning but are usually very well optimized) or a decision tree is faster will depend on what the options are, how many there are, the target architecture and hardware, etc.

chux - Reinstate Monica · Accepted Answer · 2020-07-23T14:21:04.920

3

Perhaps a simple hash and test?

#define Ttrue  (((uint_least64_t)'t') << 32 | ((uint_least64_t)'r') << 24 | ((uint_least64_t)'u') << 16 | ((uint_least64_t)'e') << 8 | 0)
#define T1     (((uint_least64_t)'1') << 8 | 0)
#define Tfalse (((uint_least64_t)'f') << 40 | ((uint_least64_t)'a') << 32 | ((uint_least64_t)'l') << 24 | ((uint_least64_t)'s') << 16 | ((uint_least64_t)'e') << 8 | 0)
#define T0     (((uint_least64_t)'0') << 8 | 0)

int Bool_str_decode(const char *Str) {
  uint_least64_t sum = 0;
  do {
    sum <<= 8;
    sum |= *(unsigned char*) Str;
  } while (*Str++ && (sum & 0xFF0000000000) == 0);  // loop to \0 or 6 characters

  if (sum == T1 || sum == Ttrue) return 1;
  if (sum == T0 || sum == Tfalse) return 0;
  return -1;
}

edited Jul 23 '20 at 14:21

answered Jul 20 '20 at 23:02

chux - Reinstate Monica

143,097
13
135
256

How does this solution perform? How much faster is it compared to the code in the question? – wovano Jul 21 '20 at 06:27
@wovano Performance varies per platform and data set. For me, `while` loop about same as `memcmp()` and then the simple `if()` here faster than OP's remaining code. I could see another compiler well optimizing `memcmp(Str, "true", 4)` into only a `long long` compare and mask. Which did you find faster? – chux - Reinstate Monica Jul 21 '20 at 09:19
I didn't test it. But since you (and others as well) answer the question "What's the fastest way...", I would expect some form of prove why the answer is faster (at least on some platform). It's well known that performance is very difficult to guess and often different than expected, so only benchmarking will prove which implementation is the fastest. Personally, I doubt if this is faster than the original code. And you're other answer seems much faster (and nicer) to me. But I'd be interested in the facts. – wovano Jul 21 '20 at 10:36
Is that endian independant? – Jul 22 '20 at 02:46
@user13783520 Yes. endian independent as `sum` and `Tfalse` (and others) are formed the same way: BE. – chux - Reinstate Monica Jul 22 '20 at 02:57
@chux-ReinstateMonica Then why is [this](https://codereview.stackexchange.com/a/245789/227432) apparently endian-dependent? – Jul 22 '20 at 03:06
1

@user13783520, in [that](https://codereview.stackexchange.com/questions/245788/how-can-i-represent-a-given-memory-pattern-in-c-rather-than-a-mathematical-valu) code, you reinterpret 8-bit data as 64-bit data. This is often the case when using pointer casts, but in your case it's done by copying 8 bytes into one 64-bit (=8-byte) word using `memcpy()`. In that case it matters in what order your cpu reads/writes the bytes into the word (i.e. the endianness). In this answer no such conversions are performed, all byte operations are manually coded. – wovano Jul 22 '20 at 06:13
BTW: I think there's a bug in this code. The expression `(sum && 0xFFFFFF00000000)` is incorrect and should be `(sum & 0xFFFF0000000000)` instead. – wovano Jul 23 '20 at 07:40
@wovano Agreed. – chux - Reinstate Monica Jul 23 '20 at 14:23

Zan Lynx · Answer 3 · 2020-07-24T18:01:22.993

1

Try this one. I think it looks pretty good in assembly, especially clang: https://godbolt.org/z/KcYMf8

Update! I HAVE BENCHMARKED IT, along with most everyone else's here.

Results are at https://github.com/zlynx/truth-match-test

#include <stdio.h>

int tobool(const char *s) {
  char lower[16] = {(s[0] | 0x20), (s[1] | 0x20), (s[2] | 0x20),
                    (s[3] | 0x20), (s[4] | 0x20), s[5] | 0x20};
  int match_1 = ((lower[0] == ('1' | 0x20)) & (lower[1] == ('\0' | 0x20)));
  int match_0 = ((lower[0] == ('0' | 0x20)) & (lower[1] == ('\0' | 0x20)));
  int match_true = ((lower[0] == 't') & (lower[1] == 'r') & (lower[2] == 'u') &
                    (lower[3] == 'e') & (lower[4] == ('\0' | 0x20)));
  int match_false =
      ((lower[0] == 'f') & (lower[1] == 'a') & (lower[2] == 'l') &
       (lower[3] == 's') & (lower[4] == 'e') & (lower[5] == ('\0' | 0x20)));

  int is_true = (match_1 | match_true);
  int is_false = (match_0 | match_false);
  return is_true - !(is_true | is_false);
}

const char *outputs[3] = {"invalid", "false", "true"};

int main(int argc, char *argv[]) {
  if (argc < 2)
    return 1;
  int result = tobool(argv[1]);
  puts(outputs[result + 1]);
  return 0;
}

edited Jul 24 '20 at 18:01

answered Jul 20 '20 at 22:24

Zan Lynx

53,022
10
79
131

I expect that using `&&` instead of `&` would be faster. But you'd have to benchmark it to know for sure. – wovano Jul 21 '20 at 06:30
@wovano Actually I expect it would not be faster. Using `&&` introduces a branch, which is why I avoided it. This version of code is branchless and will never have a branch mispredict. – Zan Lynx Jul 21 '20 at 16:18
@wovano Also, using `&` and `|` is what allows clang to vectorize it. – Zan Lynx Jul 21 '20 at 16:19
Yes, I agree about the branches. Although this means it heavily depends on the cpu architecture which is faster (I mostly write embedded software, hence my point of view). I didn't know that clang would vectorize the code, thanks for clarifying that. NB: It would be interesting to see some real benchmark results. – wovano Jul 21 '20 at 17:48
@wovano Oh yeah, heavily depends on the CPU type. Intel and AMD desktop chips can go hundreds of instructions into the future if there's no mispredictions, while an embedded chip probably just executes instructions as they come along and a branch could be much faster if it meant skipping ten instructions or so. – Zan Lynx Jul 21 '20 at 17:55
1

It may be fast, but it doesn't match the requirements (there is no `return -1;` for invalid input) – Vlad Feinstein Jul 24 '20 at 17:44
@VladFeinstein My github link has an updated version. I suppose I could paste that one here too. – Zan Lynx Jul 24 '20 at 17:55

chux - Reinstate Monica · Answer 4 · 2020-07-21T00:57:13.740

fastest way to interpret a bool string into a number in C

How about taking advantage of ASCII and that '0', '1', 'f', 't' can be hashed to [0-3]?

     (hash & 4) ? ((hash >> 4)&3) : hash & 1
'0'  0
'1'  1
'f'  2
't'  3

int bool_str_decode(const char *s) {
  const char *tf[4] = { "0", "1", "false", "true"};
  unsigned hash = *s;
  hash = (hash & 4) ? ((hash >> 4)&3) : hash & 1;
  if (strcmp(tf[hash], s) == 0) return hash & 1;
  return 0;
}

score 1 · Answer 5 · answered Jul 23 '20 at 21:15

Comparison and benchmark results

Since a number of implementations have been posted here without any benchmarks, I took the liberty to compile them all and make a performance comparison.

Interestingly, most of the answers are actually slower than the code from the question (at least at my machine)!

Benchmarking of all implementations is performed in the same way, by executing them 500000000 times in a loop and measuring the CPU time. Tests are performed with all four mentioned valid values ("true", "false", "0" and "1") and an invalid value. Minimum, maximum and average execution time over all executions are determined.

I measured the time of the following implementations:

empty function (reference): just an empty function that always returns -1, used as reference (execution time includes benchmarking overhead)
code from question by user13783520: code from the question, slightly modified to make it work.
1st answer of chux
1st answer of user13783520 (NB: added strlen() because length of string is not known in advance)
answer of Zan Lynx
2nd answer of chux
2nd answer of user13783520 (NB: added strlen() because length of string is not known in advance, but code requires more changes to include boundary checking)
answer of wovano

Note that it's difficult to make a completely fair comparison between the implementations for at least the following reasons:

Two implementations are actually invalid (resulting in Undefined Behavior) because the boundary of the input string is not checked. One implementation resulted in a crash, so that I couldn't measure the time in the same way as I did for all other implementations.
Some implementations do not check for invalid values. They always return 0 or 1, never -1.
Some implementations require the length of the input string to be known in advance. If that's not the case, it should be determined using strlen() (which I added to the code), making the implementation slower off course.
Performance may vary depending on target platform, user input, etc.

Benchmark results

(tests performed on Intel Core i7-6500U, on Ubuntu for Windows, compiled with gcc -O3)

You inspired me. Check this out: https://github.com/zlynx/truth-match-test — Zan Lynx, Jul 24 '20 at 08:01
@ZanLynx, thanks for sharing your results. However, the `wovano_tobool` method is NOT my version, it looks like the version from the question. Also, I've just updated my implementation to match the newly added requirements. Could you update the code on your github to match this version? I'd be happy to see the results :-) — wovano, Jul 24 '20 at 08:08
You're right, your edit confused me. I have fixed it and added your new version too. — Zan Lynx, Jul 24 '20 at 08:21
@ZanLynx, thanks for the update! It's nice to see that my updated implementation is still one of the fastest (and the fastest that handles all specified user inputs). It's too bad that that the currently accepted answer is actually one of the slowest (while only handling a small subset of the possible inputs). NB: I like your solution too and I think it's a really good implementation for the original question. However, now that the function has to deal with 18 possible valid inputs, it doesn't scale well enough I guess. — wovano, Jul 24 '20 at 08:34
Yes the amount of work mine had to do doubled in order to detect invalid input. — Zan Lynx, Jul 24 '20 at 08:36

score 0 · Answer 6 · 2020-07-23T12:52:59.123

My personal solution:

#include <ctype.h>
signed char BoolFromStr(const char *const StrIn, register const unsigned char Len) {
    if (!Len || Len > 5 || !StrIn) {
        return -1;
    }
    switch (tolower(*StrIn)) {
        case '0':
            if (Len == 1) {
                return 0;
            }
            break;
        case 'f':
            if (Len == 1 || (Len == 5 && !memcmp(StrIn+1, (const char[]){'a', 'l', 's', 'e'}, 4))) {
                return 0;
            }
            break;
        case 'n':
            if (Len == 1 || (Len == 2 && StrIn[1] == 'o')) {
                return 0;
            }
            break;
        case '1':
            if (Len == 1) {
                return 1;
            }
            break;
        case 'y':
            if (Len == 1 || (Len == 3 && !memcmp(StrIn+1, (const char[]){'e', 's'}, 2))) {
                return 1;
            }
            break;
        case 't':
            if (Len == 1 || (Len == 4 && !memcmp(StrIn+1, (const char[]){'r', 'u', 'e'}, 3))) {
                return 1;
            }
            break;
    }
    return -1;
}

There's a `break` statement missing in your `case '1'`. – wovano Jul 23 '20 at 08:35 — wovano, Jul 23 '20 at 08:35

score 0 · Answer 7 · 2020-07-27T05:13:21.443

0

I also decided that you can, for short strings such as booleans, convert it into a number by copying the memory and then switching the result:

#include <stdint.h>
signed char BoolFromStrCandidate2(const char *const StrIn, register const unsigned char Len) {
    int64_t Word = 0;
    memcpy(&Word, StrIn, Len);
    switch (Word|32) {
        case '0':
        case 'f':
        case 0x65736c6166:
        case 'n':
        case 0x6f6e:
            return 0;
        case '1':
        case 't':
        case 0x65757274:
        case 'y':
        case 0x736579:
            return 1;
    }
    return -1;
}

edited Jul 27 '20 at 05:13

answered Jul 21 '20 at 02:41

1

`case 0x65736c6166:` relies on a certain endian for `int64_t` – chux - Reinstate Monica Jul 21 '20 at 05:10
@chux-ReinstateMonica https://codereview.stackexchange.com/questions/245788/how-can-i-represent-a-given-memory-pattern-in-c-rather-than-a-mathematical-valu – Jul 21 '20 at 05:11

wovano · Answer 8 · 2020-07-25T17:33:03.610

I want to start by saying that I agree with earlier comments that it's not really useful to optimize this function. We're talking about saving nanoseconds on user interaction that typically takes seconds or more. The processing time is probably less than the time it takes for the "enter" key to be released.

Having said that, here's my implementation. It's a pretty simple implementation, avoiding unnecessary calls to library functions, and giving the compiler enough freedom to optimize the code. On my machine (Intel Core i7-6500U, compiled with gcc -O3) this implementation is faster than all current answers.

int str_to_bool(const char *str)
{
    if ((str[0] & 0xFE) == 48) { // ch == '0' or '1'
        if (str[1] == '\0') {
            return str[0] - 48;
        }
    } else if (str[0] == 't') {
        if (str[1] == 'r' && str[2] == 'u' && str[3] == 'e' && str[4] == '\0') {
            return 1;
        }
    } else if (str[0] == 'f') {
        if (str[1] == 'a' && str[2] == 'l' && str[3] == 's' && str[4] == 'e' && str[5] == '\0') {
            return 0;
        }
    }
    return -1;
}

UPDATATED version

The following versions works with the updated requirements that were not mentioned in the question but in comments. This handles "true", "false", "yes", "no", "t", "f", "y", "n", "1" and "0" and the first letter may be uppercase as well. It's a bit more verbose but still very fast.

int str_to_bool(const char *str)
{
    if ((str[0] & 0xFE) == 48) { // ch == '0' or '1'
        if (str[1] == '\0') {
            return str[0] - 48;
        }
    } else if ((str[0] | 32) == 't') {
        if (str[1] == '\0') {
            return 1;
        }
        if (str[1] == 'r' && str[2] == 'u' && str[3] == 'e' && str[4] == '\0') {
            return 1;
        }
    } else if ((str[0] | 32) == 'f') {
        if (str[1] == '\0') {
            return 0;
        }
        if (str[1] == 'a' && str[2] == 'l' && str[3] == 's' && str[4] == 'e' && str[5] == '\0') {
            return 0;
        }
    } else if ((str[0] | 32) == 'y') {
        if (str[1] == '\0') {
            return 1;
        }
        if (str[1] == 'e' && str[2] == 's' && str[3] == '\0') {
            return 1;
        }
    } else if ((str[0] | 32) == 'n') {
        if (str[1] == '\0') {
            return 0;
        }
        if (str[1] == 'o' && str[2] == '\0') {
            return 0;
        }
    }
    return -1;
}

Q&A (explanation and background info)

Some additional information to answer questions that were asked in comments:

Q: Why is this faster than using memcmp()? I've been told to use library functions when possible.
A: In general it's a good practice to make use of standard library functions such as memcmp(). They're heavily optimized for their intended use and for the targeted platform. For example, on modern cpu architectures memory alignment heavily influences performance, so a memcmp() implementation for such platform will make efforts to read data using the optimal memory alignment. Consequently, the start and end of the memory buffer may need to be handled differently, because they are not guaranteed to be aligned. This causes some overhead, making the implementation slower for small buffers and faster for large buffers. In this case only 1-5 bytes are compared, so using memcmp is not really advantageous. Besides, using the function introduces some calling overhead as well. So, in this case, doing the comparison manually will be much more efficient.

Q: Isn't the use of a switch statement faster than an if-else ladder?
A: It could be, but there's no guarantee. First of all, it depends on the compiler how the switch statement is translated. A common method is to use a jump table. However, this is only feasible if the values used in the case statements are close too each other, otherwise the jump table would be too big to fit in memory. Also note that the jump table implementation is reasonably expensive to execute. My guess is that it's starting to be efficient to use if there at least five cases. Secondly, a good compiler is able to implement a jump table as separate if statements, but it's also able to implement an if-else ladder as a jump table if that would be more efficient. So it really shouldn't matter what you use in C, as long as you make sure that the compiler has enough information and freedom to make such optimizations. (For proof, compile this code for armv7-a using clang 10.0.0 and you'll see that it generates a jump table.)

Q: Isn't it bad to use strcmp() if you already know the length of the string?
A: Well, that depends...

If the length of the string is known in advance, using memcmp() would make more sense indeed, because it probably is slightly faster. However, this is not guaranteed, so you should really benchmark it to know for sure. I can think of a number of reasons why strcmp() could be faster in this case.
If the length of the string is not known, it must be determined (using strlen()) before you can use memcmp(), or access the data otherwise. However, calling strlen() is quite expensive. It could take more time than the above full function to execute.
Note that executing memcmp(Str, "false", 5) is illegal if the buffer is less than 5 bytes. According to the C standard, this results in Undefined Behavior, meaning that the application could crash or give other unexpected results.

Finally, note that my algorithm basically works like a tree. It first checks the first character. If that's a valid character, it will continue with the second character. As soon as a character is found that's not valid, the function returns -1. So it reads every character only once (if the compiler does it's job correctly), in contrast to some of the other implementations that read the input data multiple times.

But does this have the same functionality as the other answers? I'm suppose to return one if `Str` is `"1"`, `"t"`, or `"true"`, and false if `Str` is `"0"`, `"f"`, or `"false"`, and -1 otherwise. And the first character is not case sensitive. — , Jul 23 '20 at 22:39
The problem is, even though your answer is faster, the results aren't really fair because yours only tests for true and false, rather than true, false, t, f, 1, 0, y, yes, n, and no, like mine — , Jul 24 '20 at 00:00
In general, using `strcmp()` multiple times on the same string is bad. Because it's basically `memcmp(Str1, Str2 strlen(Str1))` or something like that. Instead of taking the `strlen` a bazillion times you should take it once, store it in a variable and remember it. Same goes for `memcpy`/`strcpy`. — , Jul 25 '20 at 16:32
@user13783520, not entirely true, `strcmp(str1, str2)` is not the same as `memcmp(str1, str2, strlen(str1))`. Of course you should never execute `strlen(str)` for the same string multiple times. I'm not sure why you would do that. There was a small mistake in my answer (I mentioned `strcmp` where I meant `strlen` in one place), which might have confused you. This is fixed now. I hope it's clear now. — wovano, Jul 25 '20 at 17:36

What's the fastest way to interpret a bool string into a number in C?

8 Answers8

Linked