183

I am using an int type to store a value. By the semantics of the program, the value always lies in a very small range (0 - 36), and int (not a char) is used only because of CPU efficiency.

It seems like many special arithmetical optimizations can be performed on such a small range of integers. Many function calls on those integers might be optimized into a small set of "magical" operations, and some functions may even be optimized into table look-ups.

So, is it possible to tell the compiler that this int is always in that small range, and is it possible for the compiler to do those optimizations?
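
For concreteness, here is a made-up sketch of the kind of rewrite I have in mind (hypothetical, not actual compiler output): a pure function of a value known to lie in [0, 36] could, in principle, be turned into a 37-entry table lookup:

int slow_square_mod(int x)      // x is known to be in [0, 36]
{
    return (x * x) % 37;
}

// What an optimizer could, in principle, emit instead:
int fast_square_mod(int x)
{
    static const int table[37] = {
        0, 1, 4, 9, 16, 25, 36, 12, 27, 7, 26, 10, 33, 21,
        11, 3, 34, 30, 28, 28, 30, 34, 3, 11, 21, 33, 10,
        26, 7, 27, 12, 36, 25, 16, 9, 4, 1
    };
    return table[x];            // valid only because 0 <= x <= 36
}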

rolevax
  • 5
    value range optimizations exist in many compilers, e.g. [llvm](http://llvm.org/docs/LangRef.html#range-metadata), but I'm not aware of any language hint to declare it. – Remus Rusanu Nov 06 '16 at 08:06
  • 2
    Note that if you never have negative numbers, you might see small gains from using `unsigned` types, since they are easier for the compiler to reason about. – user694733 Nov 07 '16 at 08:55
  • 5
    @RemusRusanu: Pascal allows you to define [subrange types](https://www.tutorialspoint.com/pascal/pascal_data_types.htm), e.g. `var value: 0..36;`. – Edgar Bonet Nov 07 '16 at 09:52
  • I highly suspect that if you use an enum the compiler will optimize that. – CoffeDeveloper Nov 07 '16 at 12:43
  • 7
    "*int (not a char) is used only because the CPU efficiency.*" This old piece of conventional wisdom is usually not very true. Narrow types sometimes need to be zero- or sign-extended to the full register width, esp. when used as array indices, but sometimes this happens for free. If you have an array of this type, the reduction in cache footprint usually outweighs anything else. – Peter Cordes Nov 07 '16 at 15:16
  • 1
    Forgot to say: `int` and `unsigned int` need to be sign- or zero-extended from 32 to 64-bit, too, on most systems with 64-bit pointers. Note that on x86-64, [operations on 32-bit registers zero-extend to 64-bit for free](http://stackoverflow.com/questions/11177137/why-do-most-x64-instructions-zero-the-upper-part-of-a-32-bit-register) (not sign extend, but signed overflow is undefined behaviour, so the compiler can just use 64-bit signed math if it wants). So you only see extra instructions to zero-extend 32-bit function args, not results of computation. You would for narrower unsigned types. – Peter Cordes Nov 07 '16 at 15:54
  • 1
    So to summarize: a good rule of thumb is to use narrow types in arrays/structs, but `int` or `unsigned` is usually a good choice for locals inside a function. So `unsigned tmp = array_of_uint8[i];` might give you better asm, if you're going to do some stuff like `tmp *= 2` before using it as an array index or using it in an expression with other wider types. – Peter Cordes Nov 08 '16 at 17:45

4 Answers

239

Yes, it is possible. For example, for gcc you can use __builtin_unreachable to tell the compiler about impossible conditions, like so:

if (value < 0 || value > 36) __builtin_unreachable();

We can wrap the condition above in a macro:

#define assume(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)

And use it like so:

assume(x >= 0 && x <= 10);

As you can see, gcc performs optimizations based on this information:

#define assume(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)

int func(int x){
    assume(x >=0 && x <= 10);

    if (x > 11){
        return 2;
    }
    else{
        return 17;
    }
}

Produces:

func(int):
    mov     eax, 17
    ret

One downside, however, is that if your code ever breaks such assumptions, you get undefined behavior.

It doesn't notify you when this happens, even in debug builds. To debug/test/catch bugs with assumptions more easily, you can use a hybrid assume/assert macro (credits to @David Z), like this one:

#if defined(NDEBUG)
#define assume(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#include <cassert>
#define assume(cond) assert(cond)
#endif

In debug builds (with NDEBUG not defined), it works like an ordinary assert, printing an error message and aborting the program; in release builds it makes use of the assumption, producing optimized code.

Note, however, that it is not a substitute for a regular assert - cond remains in release builds, so you should not do something like assume(VeryExpensiveComputation()).
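
For reference, here is a hedged sketch of a more portable assume, combining the compiler-specific hints mentioned in the comments below (MSVC's __assume, clang's __builtin_assume). The final fallback simply drops the hint, and icc is excluded because, as noted in the comments, it may handle this pattern poorly:

#if defined(NDEBUG)
#  if defined(_MSC_VER)
#    define assume(cond) __assume(cond)
#  elif defined(__clang__)
#    define assume(cond) __builtin_assume(cond)
#  elif defined(__GNUC__) && !defined(__INTEL_COMPILER)
#    define assume(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#  else
#    define assume(cond) ((void)0)  // no hint available; the condition is ignored
#  endif
#else
#  include <cassert>
#  define assume(cond) assert(cond)
#endif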

deniss
  • @Xofo probably. I think this can help too when put as a default case in switches. – asu Nov 06 '16 at 08:17
  • 5
    @Xofo, didn't get it; in my example this is already happening, as the `return 2` branch is eliminated from the code by the compiler. –  Nov 06 '16 at 08:17
  • 6
    However, it seems that gcc [cannot](https://godbolt.org/g/9rZqE2) optimize functions to magical operations or table lookup as OP expected. – jingyu9575 Nov 06 '16 at 10:45
  • what about `__builtin_expect()`? – user3528438 Nov 06 '16 at 13:09
  • 21
    @user3528438, `__builtin_expect` is a non-strict hint. `__builtin_expect(e, c)` should read as "`e` is most likely to evaluate to `c`", and can be useful to optimize branch prediction, but it doesn't restrict `e` to always be `c`, so doesn't allow optimizer to throw away other cases. [Look how branches are organized in assembly](https://godbolt.org/g/Kv5zdM). –  Nov 06 '16 at 13:38
  • 7
    In theory any code that unconditionally causes undefined behaviour could be used in place of `__builtin_unreachable()`. – CodesInChaos Nov 06 '16 at 16:48
  • 5
    @CodesInChaos, in theory - yes, in practice - stick to compiler approved ways of producing undefined behavior. gcc handles various cases of UB [very different](https://godbolt.org/g/DYd96v). –  Nov 06 '16 at 18:23
  • 5
    @CodesInChaos Any code that unconditionally causes undefined behaviour in the abstract machine isn't good enough, as a compiler extension may define the behaviour. For instance, signed integer overflow may, depending on compiler options, be defined as aborting the program, requiring special code to ensure the program really does abort. –  Nov 06 '16 at 18:45
  • Does anyone here know if Boost has something like that? (For portability.) Could not find it, but maybe I searched for the wrong stuff. – Baum mit Augen Nov 06 '16 at 18:50
  • 2
    Does a simple `assert` not do this? I would expect `assert` to do everything this answer's `assume` macro does, while also providing more useful output in debug mode. – user2357112 Nov 06 '16 at 20:47
  • 1
    @user2357112, [no](https://godbolt.org/g/Zxib3e). Assert is defined in 7.2.1 as `(void)0` (when NDEBUG is defined), or as a "macro..." that "...writes information about the particular call that failed..." "...then calls the abort function" (when NDEBUG is not defined). It doesn't involve undefined behavior, so it's not useful for optimization. –  Nov 06 '16 at 21:10
  • 1
    @jingyu9575 Looks like https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70547 – Marc Glisse Nov 06 '16 at 22:05
  • 2
    Why is the `assume` macro defined into a `do while` loop? – Délisson Junio Nov 07 '16 at 06:40
  • 4
    @wingleader, it is common practice to ensure that semicolon after macro invocation doesn't introduce bugs: http://stackoverflow.com/questions/154136/why-use-apparently-meaningless-do-while-and-if-else-statements-in-c-c-macros –  Nov 07 '16 at 07:02
  • 17
    Unless there's some quirk I don't know about that makes this a bad idea, it might make sense to combine this with `assert`, e.g. define `assume` as `assert` when `NDEBUG` is not defined, and as `__builtin_unreachable()` when `NDEBUG` is defined. That way you get the benefit of the assumption in production code, but in a debug build you still have an explicit check. Of course you then have to do enough tests to assure yourself that the assumption _will_ be satisfied in the wild. – David Z Nov 07 '16 at 07:55
  • 4
    @DavidZ, good point, but don't rush to replace your asserts, though. The biggest advantage of `assert` is that you can `assert(VeryExpensiveCheck())` and still not slow down your release build. Your proposed `assert/assume` thing is strictly better than plain `__builtin_unreachable`, but its usage is still limited to very specific cases. –  Nov 07 '16 at 09:31
  • 1
    @deniss Oh of course. I had this in mind only as a slightly improved version of `assume` for cases where you've already decided you want to use it, not as a general replacement for `assert` statements. – David Z Nov 07 '16 at 09:53
  • Great answer, I had no idea this existed!! BTW, prefer using non-shortened Godbolt links in answers (as opposed to comments) where the char limit isn't a problem, because [they can't rot](http://meta.stackoverflow.com/a/319594/224132). – Peter Cordes Nov 07 '16 at 15:49
  • 2
    The problem with the release version seems to be that the if checks may stay in the code and cause performance issues? – 2501 Nov 07 '16 at 20:31
  • 1
    @2501 Yes. While the compiler knows the result of the `if`, it still must execute the condition to get any side-effects. For example, say the compiler knows that `f` will return `false` in `if (f()) ...`. It still has to execute enough of `f` because `f` might be `printf("hi"); return false;`. So knowing that something produces a particular result doesn't mean you don't have to execute any of that thing. – David Schwartz Nov 07 '16 at 20:35
  • What I'd like to see would be a compiler-intrinsic "__checked_assume" which a compiler could, at its leisure, *either* process as an assert (trapping if a condition wasn't met, but allowing downstream code to know that it would be met if the trap didn't fire), or else ignore the directive altogether. Especially if there were other directives to grant flexibility as to precisely *when* the traps would fire, the cost of the traps could be far below the savings in downstream code. – supercat Nov 08 '16 at 07:24
  • 3
    This works great with gcc and clang, but ICC actually does the check, and branches to beyond the `ret` instruction of the function! (https://godbolt.org/g/cyX1oU). And then doesn't take advantage of the range information. So I guess the `#ifdef`s around the `assume()` macro had better check for and exclude ICC, not just `__GNUC__ >= whatever`. – Peter Cordes Nov 08 '16 at 18:01
  • @PeterCordes, wow. Thought icc is better than that. Even without `__builtin_unreachable`, icc [can't take advantage](https://godbolt.org/g/tvxfG7) of UB to drop an extra comparison. Interesting that if there is only one `if`, [gcc leaves comparison in place](https://godbolt.org/g/eB8Htm), but clang removes it. There is probably some legacy code which requires this feature... –  Nov 08 '16 at 18:46
  • 1
    icc is very good at auto-vectorizing, but sometimes worse than gcc or clang at other things. I've definitely seen braindead code from icc before for integer bit-fiddling and stuff like that (although mostly from icc13, since Matt Godbolt only recently updated his site with ICC16 and ICC17). But even ICC17 does a bad job with bsr/bsf/tzcnt [in this case](http://stackoverflow.com/questions/40456481/gcc-compiles-leading-zero-count-poorly-unless-haswell-specified) for `__builtin_ctzll()` – Peter Cordes Nov 08 '16 at 19:59
  • 2
    MSVC has [`__assume`](https://msdn.microsoft.com/en-us/library/1b3fsfxw.aspx) that can take the place of the `#define assume(x) ` in this answer. – ratchet freak Nov 09 '16 at 11:43
  • so does clang with [__builtin_assume](http://clang.llvm.org/docs/LanguageExtensions.html#builtin-assume) – ratchet freak Nov 09 '16 at 11:50
67

There is standard support for this. What you should do is include stdint.h (cstdint in C++) and then use the type uint_fast8_t.

This tells the compiler that you are only using numbers between 0 and 255, but that it is free to use a larger type if that gives faster code. Similarly, the compiler can assume that the variable will never have a value above 255 and then do optimizations accordingly.
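
For example (a minimal sketch; as the comments below discuss, the underlying type of uint_fast8_t, and therefore the range information the compiler actually gets, varies between platforms):

#include <stdint.h>

uint_fast8_t value;   /* semantically 0 - 36; at least 8 bits, possibly wider */

uint_fast8_t next(uint_fast8_t v)
{
    /* The implementation picks whatever width is fastest for the target:
       e.g. plain unsigned char on x86, a 32-bit type on early Alpha. */
    return (uint_fast8_t)((v + 1) % 37);
}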

Lundin
  • 3
    These types are not used nearly as much as they should be (I personally tend to forget that they exist). They give code which is both fast and portable, pretty brilliant. And they've been around since 1999. – Lundin Nov 07 '16 at 12:43
  • This is a good suggestion for the general case. deniss's answer shows a more malleable solution for specific scenarios. – Lightness Races in Orbit Nov 07 '16 at 12:44
  • 3
    The compiler only gets the 0-255 range info on systems where `uint_fast8_t` is actually an 8-bit type (e.g. `unsigned char`) like it is on x86/ARM/MIPS/PPC (https://godbolt.org/g/KNyc31). On [early DEC Alpha before 21164A](http://www.tldp.org/HOWTO/Alpha-HOWTO-8.html), byte loads/stores weren't supported, so any sane implementation would use `typedef uint32_t uint_fast8_t`. AFAIK, there's no mechanism for a type to have extra range-limits with most compilers (like gcc), so I'm pretty certain `uint_fast8_t` would behave exactly the same as `unsigned int` or whatever in that case. – Peter Cordes Nov 07 '16 at 15:02
  • (`bool` is special, and is range-limited to 0 or 1, but it's a built-in type, not defined by header files in terms of `char`, on gcc / clang. Like I said, I don't think most compilers have a mechanism that would make that possible.) – Peter Cordes Nov 07 '16 at 15:03
  • The problem with this exact suggestion is that unsigned types perform modular arithmetic. Overflow has well-defined wrap-around behavior. `uint_fast8_t` even has implementation-defined wrap-around behavior. Practically speaking, even if we somehow know that every `T` value is smaller than 36, a compiler must still consider `23*23==17 (modulo 256)` ! – MSalters Nov 07 '16 at 15:10
  • 1
    Anyway, `uint_fast8_t` is a good recommendation, since it will use an 8-bit type on platforms where that's as efficient as `unsigned int`. (I'm actually not sure what the `fast` types are supposed to be fast *for*, and whether the cache footprint tradeoff is supposed to be part of it.). x86 has extensive support for byte operations, even for doing byte add with a memory source, so you don't even have to do a separate zero-extending load (which is also very cheap). gcc makes `uint_fast16_t` a 64-bit type on x86, which is insane for most uses (vs. 32-bit). https://godbolt.org/g/Rmq5bv. – Peter Cordes Nov 07 '16 at 15:10
  • @LightnessRacesinOrbit: you can use both, since they're orthogonal. – Peter Cordes Nov 07 '16 at 15:12
  • @PeterCordes: Indeed. – Lightness Races in Orbit Nov 07 '16 at 15:30
  • Does any compiler actually do this optimization? (in the cases where `int` or `unsigned int` is chosen as the type) – M.M Nov 08 '16 at 05:57
  • @PeterCordes: What would have been useful from an optimization standpoint would be if there were a category of types that were guaranteed to hold at least the indicated range, but which could arbitrarily be capable of holding larger values at the compiler's leisure. As the Standard is now, on a machine with 32-bit `int`, a store to any 16-bit type, whether signed or unsigned, would need to yield a value within range of that type (for signed types, implementations could define a signal which would be raised if an out-of-range value is converted, but would then have to raise it consistently). – supercat Nov 08 '16 at 07:26
  • @supercat: or did you mean a type that isn't necessarily truncated while it's in registers, only when actually stored to memory. So you have a situation like float/double on x86 without `-ffloat-store`, where rounding/truncation depends on when/where the compiler spills? Yeah that would be interesting. All the cache footprint advantage of `uint16_t`, and little of the extra 16->64 zero extension cost. – Peter Cordes Nov 08 '16 at 14:28
  • @PeterCordes: That's exactly the idea. On the flip side, I'd like to see a family of explicitly-wrapping integer types which would never be implicitly promoted. Operations between a normal integer type and a wrapping type would never be required to yield anything other than that wrapping type. In cases where the wrapping type ranks "int" or above and the other type is higher ranking, compilers would be allowed to promote, but encouraged to instead refuse compilation (the allowance would mean that a 32-bit compiler could define wrap32_t and wrap64_t as synonyms for its... – supercat Nov 08 '16 at 14:52
  • ...corresponding unsigned types without being required to add new semantics for them, but programmers could be assured that code using `wrap32_t` would work the same on 64-bit platforms in cases where it compiled [ideally 32-bit platforms would refuse compilation in cases where 64-bit ones would do so, but allowing compilers to implement the types as synonyms for existing ones would greatly facilitate their adoption]. The basic principle would be that anything that would require promotion would be a constraint violation, though one compilers would sometimes be exempt from diagnosing. – supercat Nov 08 '16 at 14:55
  • @M.M it looks like the compiler doesn't make these assumptions about the fast int types. Here are my experiments: https://godbolt.org/z/r73sG9sc6 – Ben Aug 23 '21 at 17:34
11

The current answer is good for the case when you know for sure what the range is, but if you want correct behavior even when the value falls outside the expected range, it won't work.

For that case, I found this technique can work:

if (x == c)  // assume c is a constant
{
    foo(x);
}
else
{
    foo(x);
}

The idea is a code-data tradeoff: you're moving 1 bit of data (whether x == c) into control logic.
This hints to the optimizer that x is in fact a known constant c, encouraging it to inline and optimize the first invocation of foo separately from the rest, possibly quite heavily.

Make sure to actually factor the code into a single subroutine foo, though -- don't duplicate the code.

Example:

For this technique to work you need to be a little lucky -- there are cases where the compiler decides not to evaluate things statically, and they're kind of arbitrary. But when it works, it works well:

#include <math.h>
#include <stdio.h>

unsigned foo(unsigned x)
{
    return x * (x + 1);
}

unsigned bar(unsigned x) { return foo(x + 1) + foo(2 * x); }

int main()
{
    unsigned x;
    scanf("%u", &x);
    unsigned r;
    if (x == 1)
    {
        r = bar(bar(x));
    }
    else if (x == 0)
    {
        r = bar(bar(x));
    }
    else
    {
        r = bar(x + 1);
    }
    printf("%#x\n", r);
}

Just use -O3 and notice the pre-evaluated constants 0x20 and 0x30e in the assembler output; they are bar(bar(0)) == 32 and bar(bar(1)) == 782, computed entirely at compile time.

user541686
  • 1
    Wouldn't you want `if (x==c) foo(c) else foo(x)` ? If only to catch `constexpr` implementations of `foo`? – MSalters Nov 10 '16 at 11:37
  • 1
    @MSalters: I knew someone was gonna ask that!! I came up with this technique before `constexpr` was a thing and never bothered to "update" it afterward (though I haven't really ever bothered to worry about `constexpr` even afterward), but the reason I didn't do it initially was that I wanted to make it easier for the compiler to factor them out as common code and remove the branch if it decided to leave them as normal method calls and not optimize. I expected if I put in `c` it's really hard for the compiler to c (sorry, bad joke) that the two are the same code, though I never verified this. – user541686 Nov 10 '16 at 12:12
10

I am just pitching in to say that if you want a solution that is more standard C++, you can use the [[noreturn]] attribute to write your own unreachable.

So I'll re-purpose deniss' excellent example to demonstrate:

namespace detail {
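    // Note: the body is intentionally empty. Actually returning from a
    // function declared [[noreturn]] is undefined behavior, which is what
    // lets the optimizer treat any branch that calls it as unreachable.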
    [[noreturn]] void unreachable(){}
}

#define assume(cond) do { if (!(cond)) detail::unreachable(); } while (0)

int func(int x){
    assume(x >=0 && x <= 10);

    if (x > 11){
        return 2;
    }
    else{
        return 17;
    }
}

Which as you can see, results in nearly identical code:

detail::unreachable():
        rep ret
func(int):
        movl    $17, %eax
        ret

The downside, of course, is that you get a warning that a [[noreturn]] function does, indeed, return.

StoryTeller - Unslander Monica
  • It works with `clang`, [whereas my original solution doesn't](https://godbolt.org/g/EkqL4v), so nice trick and +1. But the whole thing is very compiler-dependent (as Peter Cordes showed us, in `icc` it can **worsen** performance), so it is still not universally applicable. Also, a minor note: the `unreachable` definition [must be available to the optimizer and inlined for this to work](https://godbolt.org/g/hlE7hX). –  Nov 11 '16 at 15:07
  • A concise solution, but warnings are generated. – zjyhjqs May 26 '22 at 13:49