Inspired by the idea of a "C++ HTML template engine that uses compile time HTML parsing", I am trying to write a sample class that checks whether the first char in a string is 'a'.

int dummy[0];
class Test
{
public:
    constexpr Test(const char *p):p_(p){}
    constexpr void check()const
    {
        if (p_[0]!='a')
            dummy[1]=0;
    }
    const char *p_;
};


constexpr Test operator"" _test(const char *pszText, size_t)
{
  Test t(pszText);
  t.check();
  return t;
}


int main()
{
    //dummy[1] = 0;
    constexpr Test t = "baa"_test;
}

It works well (GCC 7.1 on Ubuntu). If the first char is not 'a', it gives a compile error:

main.cpp:29:24: error: array subscript value ‘1’ is outside the bounds
of array ‘dummy’ of type ‘int [0]’
constexpr Test t = "baa"_test;

What confuses me is that the compile error goes away if I change main to:

int main()
{
    dummy[1] = 0; //no compile error anymore
}

I am wondering when C++ reports an array out of bounds error.

camino

1 Answer

Initializing a constexpr variable requires the compiler to evaluate the code at compile time. And because, in the context of the C++ abstract machine, dummy[1] is undefined behavior, the compiler must emit a diagnostic. (This can be either a warning or an error. However, both g++ and clang++ choose to make this an error.)
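The difference between the two contexts can be sketched as follows. The names table, element_at, and runtime_lookup are hypothetical and not part of the question's code; this is only a minimal illustration, assuming a C++11 or later compiler.

```cpp
#include <cstddef>

// Hypothetical three-element table used only for this illustration.
constexpr int table[3] = {10, 20, 30};

// Usable both in constant expressions and in ordinary run-time code.
constexpr int element_at(std::size_t i) { return table[i]; }

// Here the compiler has to evaluate the call itself, so it sees every index.
static_assert(element_at(2) == 30, "in-bounds access is a valid constant expression");

// Uncommenting the next line should make compilation fail, because evaluating
// table[3] would be undefined behavior inside a constant expression:
// constexpr int bad = element_at(3);

// At run time the same function is an ordinary call. An out-of-bounds index
// here would be undefined behavior, and the compiler need not diagnose it.
int runtime_lookup(std::size_t i) { return element_at(i); }
```

The commented-out line is the analogue of `constexpr Test t = "baa"_test;` in the question, while runtime_lookup corresponds to the plain `dummy[1] = 0;` in main.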

However, just because dummy[1] is undefined behavior in the context of the C++ abstract machine doesn't mean that something else isn't defining the behavior. Outside a constant expression, where the compiler does not have to evaluate the access itself, the code therefore need not be rejected. It simply is undefined behavior, not an error.

For instance, compiler writers will use out-of-bounds array accesses, and because they're writing the compiler, they can ensure that the behavior that occurs is the one desired. A good example is array memory management, where the pointer returned is typically one word past the beginning of the allocated memory (because the array size also has to be stored). So when the compiler writer needs to read the size, he or she may do ((size_t const*)ptr)[-1].
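For comparison, a similar "size stored in front of the data" layout can be written in user code without indexing out of bounds. The sketch below is an assumption-laden illustration, not how any particular compiler or runtime does it: the names kHeader, allocate_counted, counted_size, and free_counted are hypothetical, and the payload is kept as raw bytes to stay clear of object-lifetime questions.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical header size: one maximally aligned slot in front of the payload,
// so the payload itself stays suitably aligned for any fundamental type.
constexpr std::size_t kHeader = alignof(std::max_align_t);
static_assert(sizeof(std::size_t) <= kHeader, "header must be able to hold the byte count");

// Allocates n payload bytes and records n in the header in front of them.
unsigned char* allocate_counted(std::size_t n)
{
    void* raw = std::malloc(kHeader + n);
    if (raw == nullptr) throw std::bad_alloc();
    std::memcpy(raw, &n, sizeof n);                 // store the count at the very start
    return static_cast<unsigned char*>(raw) + kHeader;
}

// Reads the count back. The pointer arithmetic stays inside the one allocation,
// and memcpy avoids any aliasing concerns.
std::size_t counted_size(const unsigned char* payload)
{
    std::size_t n;
    std::memcpy(&n, payload - kHeader, sizeof n);
    return n;
}

// Releases a block obtained from allocate_counted.
void free_counted(unsigned char* payload)
{
    if (payload != nullptr) std::free(payload - kHeader);
}
```

Stepping back by kHeader bytes plays the role of the `[-1]` access above, but it never leaves the block that malloc returned.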

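That being said, a better way to trigger a compiler error in this case is to simply use a static assert: static_assert(p_[0]=='a', "String does not begin with 'a'."). Note that the condition must be a constant expression at the point of the assertion, so it has to be applied to a constexpr object rather than to the p_ member inside check().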
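One way this could look in practice is sketched below. It is a variation on the question's Test class, not the asker's code: starts_with_a and the throw-based literal operator are assumptions introduced for illustration, relying on the rule that reaching a throw during constant evaluation makes the expression non-constant.

```cpp
#include <cstddef>
#include <stdexcept>

class Test
{
public:
    constexpr Test(const char* p) : p_(p) {}
    // Hypothetical helper: reports whether the string begins with 'a'.
    constexpr bool starts_with_a() const { return p_[0] == 'a'; }
    const char* p_;
};

// In a constant expression, reaching the throw makes the evaluation fail,
// so the compiler rejects the initialization. At run time it simply throws.
constexpr Test operator""_test(const char* text, std::size_t)
{
    return text[0] == 'a'
        ? Test(text)
        : throw std::invalid_argument("String does not begin with 'a'.");
}

constexpr Test ok = "abc"_test;

// static_assert works here because ok is a constexpr object.
static_assert(ok.starts_with_a(), "String does not begin with 'a'.");

// Uncommenting the next line should fail to compile:
// constexpr Test bad = "baa"_test;
```

Both the throw and the static_assert keep the check within defined behavior, so the diagnostic does not depend on how a compiler treats an out-of-bounds write.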

OmnipotentEntity
  • Note that you can do this with perfectly defined behavior; you can allocate a buffer of `char`, ensure that you have sufficient alignment and construct your array with room for a properly aligned `size_t` before it. Then you can take the eventual array pointer, convert back to a char pointer, offset by sizeof size_t, and cast back to the size_t stored there. An optimizer is likely to make the machine code identical to what the naive UB interpretation would be. (The only part I'm uncertain of is new'ing an array with placement new and dynamic size) – Yakk - Adam Nevraumont Aug 03 '17 at 17:37
  • I only mentioned it because I remembered seeing library code that contained something very similar in the past, and it was a good example of what I wanted to demonstrate. But thank you for explaining how it can be done without UB. I was sort of mulling it over, but the intricacies of when casting is or is not UB isn't something I know off of the top of my head. – OmnipotentEntity Aug 03 '17 at 17:44
  • "cannot be rejected"? Where did that come from? – T.C. Aug 03 '17 at 18:15
  • It turns out that I was wrong about that. I just checked and according to 3.27 a program that contains UB can be rejected from translation with a diagnostic message. I have modified the answer to be more accurate. – OmnipotentEntity Aug 03 '17 at 18:24
  • Note [UB in a constant expression is ill-formed and requires a diagnostic](https://stackoverflow.com/questions/34276373/self-initialization-of-a-static-constexpr-variable-is-it-well-formed#comment56295043_34276373) which can be an error but could also be a warning. – Shafik Yaghmour Aug 03 '17 at 19:36
  • Thank you, the answer has been updated. Hopefully, it should be completely accurate now. – OmnipotentEntity Aug 03 '17 at 21:09