71

To my astonishment, this compiles:

const char* c_str()
{
    static const char nullchar = '\0';
    return nullchar;
}

and it introduced a bug in my code. Thankfully, I caught it.

Is this intentional by C++, or a compiler bug? Is there a reason why the data type is actively ignored?
It worked in Visual C++ 2010 and GCC, but I don't understand why it should work, given the obvious data type mismatch. (The static isn't necessary, either.)

Thomas Eding
user541686
  • Are you compiling according to C++03? – obataku Aug 19 '12 at 06:42
  • @veer: I believe the GCC link I gave says version 4.3, so it should be no later than C++03? I don't know what VC++ does, though... something between C++03 and C++11 I guess. – user541686 Aug 19 '12 at 06:43
  • Wild stab: some sort of promotion? E.g. You'd expect assigning a `char` to a `long long` would work just fine. Sign issues aside, if the standard allows smaller types to be assigned to larger types then I think you'd expect a `char` to be assignable to a `char *` – ta.speot.is Aug 19 '12 at 06:43
  • @ta.speot.is: I don't think so. Change the data type from `const char` to `const long long` and it still works (in VC++ at least), but `long long` is definitely not 'promoted' to `char*`, right? (32-bit here) – user541686 Aug 19 '12 at 06:44
  • @Mehrdad Maybe if the pointer is 64-bit? I'm not that fluent with the standard, I'm sure someone who knows it will come by soon and give us the answer. – ta.speot.is Aug 19 '12 at 06:45
  • @Mehrdad try specifying a standard to compile according to explicitly. – obataku Aug 19 '12 at 06:45
  • @veer: Works on GCC with `-std=c++98`. – user541686 Aug 19 '12 at 06:47
  • 1
    @ta.speot.is: I don't think it has anything to do with the CPU architecture... – user541686 Aug 19 '12 at 06:47
  • @Mehrdad I suspect `nullchar` is a compile-time constant expression. – obataku Aug 19 '12 at 06:51
  • @veer: Obviously that's how it's being treated :) but I don't have `constexpr` so it's kinda baffling me as to why... C++98 doesn't have the notion of "compile-time constant *expressions*", does it? – user541686 Aug 19 '12 at 06:52
  • 3
    Certainly C++98 does have the notion of compile-time constant expressions. – Managu Aug 19 '12 at 06:55
  • 1
    @Mehrdad well `constexpr` is C++11-specific anyways... but the `const` variable is a constant expression by §5.19.1 of the C++03 standard... __An *integral constant-expression* can involve only literals (2.13), enumerators, `const` variables or static data members of integral or enumeration types initialized with constant expressions (8.5), non-type template parameters of integral or enumeration types, and `sizeof` expressions.__ – obataku Aug 19 '12 at 06:55
  • @Mehrdad similarly §4.10.1 states... __A *null pointer constant* is an integral constant expression (5.19) rvalue of integer type that evaluates to zero. A null pointer constant can be converted to a pointer type; the result is the *null pointer* value of that type and is distinguishable from every other value of pointer to object or pointer to function type. Two null pointer values of the same type shall compare equal.__ – obataku Aug 19 '12 at 06:57
  • 2
    @ta.speot.is "_Maybe if the pointer is 64-bit?_" Size is not an issue. – curiousguy Aug 19 '12 at 07:04

8 Answers

69

As you've defined it, nullchar is an integer constant expression with the value 0.

The C++03 standard defines a null pointer constant as: "A null pointer constant is an integral constant expression (5.19) rvalue of integer type that evaluates to zero." To make a long story short, your nullchar is a null pointer constant, meaning it can be implicitly converted and assigned to essentially any pointer.

Note that all those elements are required for that implicit conversion to work though. For example, if you had used '\1' instead of '\0', or if you had not specified the const qualifier for nullchar, you wouldn't get the implicit conversion -- your assignment would have failed.
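To make the "all those elements are required" point concrete, here is a minimal sketch (assuming a C++11 or later compiler for `static_assert`, `<type_traits>` and `nullptr`). It shows that the conversion is a property of one specific kind of constant expression, not of the type: a `char` or `int` value in general does not convert to a pointer. The helper name `literal_zero_is_null` is made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// An arbitrary value of integral type does NOT implicitly convert to a pointer.
static_assert(!std::is_convertible<char, const char*>::value,
              "a char value is not implicitly convertible to const char*");
static_assert(!std::is_convertible<int, const char*>::value,
              "an int value is not implicitly convertible to const char*");

// std::nullptr_t (C++11) does convert to any pointer type.
static_assert(std::is_convertible<std::nullptr_t, const char*>::value,
              "nullptr converts to const char*");

// Hypothetical helper: the literal 0 is a null pointer constant,
// so this initialization yields a null pointer.
inline bool literal_zero_is_null() {
    const char* p = 0;
    return p == nullptr;
}

inline void demo_null_pointer_constant() {
    assert(literal_zero_is_null());
}
```

So the type is not being ignored across the board; only a zero-valued constant expression gets the special treatment.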

Inclusion of this conversion is intentional but widely regarded as undesirable. 0 as a null pointer constant was inherited from C. I'm fairly sure Bjarne and most of the rest of the C++ standard committee (and most of the C++ community in general) would dearly love to remove this particular implicit conversion, but doing so would destroy compatibility with a lot of C code (probably close to all of it).

Guilherme Bernal
Jerry Coffin
  • "_doing so would destroy compatibility with a lot of C code (probably close to all of it)._" add: a lot of **C++ code** (probably close to all of it except very recent code) – curiousguy Aug 19 '12 at 07:07
  • +1, great info. I'm kind of curious, though -- why would it destroy C compatibility? I mean, as long as they keep the data type of constant variables correct, then how could it break so much code? (Could you give an example of typical code that would break, perhaps?) Thanks! – user541686 Aug 19 '12 at 07:09
  • 4
    It is probably worth adding that in C a `const` variable with value `0` does not qualify as a null-pointer constant, which means that the OP's code is not valid in C (and as such would not be "broken" by any changes, since it is broken already from the C point of view). – AnT stands with Russia Aug 19 '12 at 07:12
  • @AndreyT: Quite true -- in C, a `const` qualifier isn't enough to produce a constant expression. – Jerry Coffin Aug 19 '12 at 07:16
  • @AndreyT by what standard? Compiles fine with C89 (`gcc -std=c89`), though it does produce a warning. – obataku Aug 19 '12 at 07:16
  • @Mehrdad: Quite a bit of C code depends on the fact that a null pointer is equivalent to 0. For example, it's pretty common to see `if (!(ptr=malloc(some_size)))` or `if (!(file=fopen(name, "r")))`. – Jerry Coffin Aug 19 '12 at 07:19
  • 1
    @JerryCoffin: Er, but that only requires the equivalence of a ***literal*** zero to a null pointer, not an arbitrary constant expression. Why ignore the type *after* it's assigned to something that's explicitly typed as something completely unrelated? In other words, what's the point of substituting the entire *expression*, stripping the type, rather than just the *value*, with the correct type? That wouldn't break much sensible code, as far as I can tell... – user541686 Aug 19 '12 at 07:20
  • @veer: By any standard, beginning with the very first one: C89/90. Assigning integer values to pointers has always been illegal in C. As for GCC compiling it... GCC has always been very liberal with erroneous code. GCC becomes a "real" C compiler with `-pedantic-errors` only. – AnT stands with Russia Aug 19 '12 at 07:21
  • @AndreyT oh, I see now :-P I forgot to use `-pedantic-errors`, only `-pedantic` – obataku Aug 19 '12 at 07:22
  • @Mehrdad: There's no literal 0 in either of those `if` statements, only expressions implicitly equivalent to a comparison to the value 0. – Jerry Coffin Aug 19 '12 at 07:23
  • @JerryCoffin: Oh right, I misread your examples. But then that only requires an implicit conversion from pointers to `int` (or `bool`, if that was available), but nothing else... – user541686 Aug 19 '12 at 07:24
  • 1
    @Mehrdad: As-is, they depend on an implicit conversion from one int value (0) to pointer. To make it work in the other direction, you'd have to allow implicit conversion from *any* pointer to int -- which I'm pretty sure would be worse. – Jerry Coffin Aug 19 '12 at 07:26
  • 2
    So maybe like this: Why allow integer constant expressions to be implicitly converted to null pointers (instead of allowing only literals)? (Note that this is a hypothetical question). – Managu Aug 19 '12 at 07:27
  • 1
    @JerryCoffin: I don't get it, which part of `if (!(ptr=malloc(some_size)))` or `if (!(file=fopen(name, "r")))` depends on *"implicit conversion from 0 to pointer"*? – user541686 Aug 19 '12 at 07:27
  • 2
    @Mehrdad: There's no such conversion either in C or C++. In C++ it works through implicit conversion to `bool`. In C it works through implicit comparison to literal `0`, i.e. `if (p)` is equivalent to `if (p != 0)`. The latter also doesn't use conversion to `int`. – AnT stands with Russia Aug 19 '12 at 07:28
  • 1
    @Managu: C allows it at least partly because it's often an expression not just a literal. To be specific, NULL is often defined as `(void *)0`, so you can't accidentally assign it to an integer type, only a pointer. – Jerry Coffin Aug 19 '12 at 07:28
  • 1
    @Mehrdad: For `T *p`, in C `if (p)` is interpreted as `if (p != 0)`, which in turn means `if (p != (T *) 0)`. This is where it depends on implicit conversion from literal `0` to pointer. – AnT stands with Russia Aug 19 '12 at 07:29
  • 1
    @AndreyT: That doesn't make sense... it would mean `if (p != 0)` is then interpreted as `if ((p != 0) != 0)`, so when does it stop? – user541686 Aug 19 '12 at 07:30
  • @Mehrdad: Why? I don't understand your logic. The definition of the `if` statement in C simply states that the argument (as supplied) is compared to `0` and the branching is performed depending on the result of that comparison. There's no provision that would allow infinite recursion. – AnT stands with Russia Aug 19 '12 at 07:34
  • And the second half: why allow const variables to be integer constant expressions? As veer points out, C89 doesn't. But to that, I suspect I know the answer: doing so makes templates more useful. – Managu Aug 19 '12 at 07:34
  • @AndreyT: Ooh interesting... yeah I guess I took your comment too literally. I didn't know it compares to a *literal* zero; I thought it would be the 'zero' of the same data type... wow. Thanks for the info... – user541686 Aug 19 '12 at 07:36
  • @Managu: It also allows things like `const int size = 10; char array[size];` (which predates templates by quite a bit). Oh, and for what it's worth that should be C89, not 98. – Jerry Coffin Aug 19 '12 at 07:36
  • 1
    @Managu: In C language the only ways to produce *named constants* are: 1) macros (i.e `#define`), 2) enums. Macros suffer from well-known serious problems. Enums can't have any other type besides `int`. This problem had to be solved. To solve it C++ extended the notion of *constant* to include `const` objects. Const objects are typed and scoped. – AnT stands with Russia Aug 19 '12 at 07:36
  • 1
    @Mehrdad: If you wish, you can replace `if (p)` with `if (p != 0)` or with `if ((p != 0) != 0)` or with `if (((p != 0) != 0) != 0)` and so on. All these variants are equivalent. But the compiler is not required to follow that path. – AnT stands with Russia Aug 19 '12 at 07:39
28

This is old history: it goes back to C.

There is no null keyword in C. A null pointer constant in C is either:

  • an integral constant expression with value 0, like 0, 0L, '\0' (remember that char is an integral type), (2-4/2)
  • such expression cast to void*, like (void*)0, (void*)0L, (void*)'\0', (void*)(2-4/2)

The NULL macro (not a keyword!) expands to such a null pointer constant.

In the first C++ design, only the integral constant expression was allowed as a null pointer constant. C++11 added std::nullptr_t, the type of the nullptr keyword.
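As a small illustration of why a dedicated null pointer type was worth adding, here is a sketch (assuming C++11 or later; the overload set `which` is made up for this example). The literal `0` is an `int` first and only secondarily a null pointer constant, so overload resolution prefers the integer overload, whereas `nullptr` can only become a pointer.

```cpp
#include <cassert>

// Hypothetical overload set, for illustration only.
inline int which(int)         { return 1; }  // selected for integer arguments
inline int which(const char*) { return 2; }  // selected for pointer arguments

inline void demo_overloads() {
    assert(which(0) == 1);        // 0 is an exact match for int
    assert(which(nullptr) == 2);  // nullptr converts to a pointer, never to int
}
```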

In C++, but not in C, a const variable of integral type initialized with an integral constant expression is an integral constant expression:

const int c = 3;
int i = 0; // initialized so that reading it below is well-defined

switch(i) {
case c: // valid C++
// but invalid C!
    break;
}

So a const char initialized with the expression '\0' is a null pointer constant:

int zero() { return 0; }

void foo() {
    const char k0 = '\0',
               k1 = 1,
               c = zero();
    int *pi;

    pi = k0; // OK (constant expression, value 0)
    pi = k1; // error (value 1)
    pi = c; // error (not a constant expression)
}

And you think this is not sound language design?


Updated to include relevant parts of C99 standard... According to §6.6.6...

An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, and floating constants that are the immediate operands of casts. Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof operator.

Some clarifications for C++-only programmers:

  • C uses the term "constant" for what C++ programmers know as a "literal".
  • In C++, sizeof is always a compile time constant; but C has variable length arrays, so sizeof is sometimes not a compile time constant.

Then, we see §6.3.2.3.3 states...

An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant. If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.



Destructor
curiousguy
  • 3
    +1 great info, thanks. Regarding your question: No, I don't think it's sound language design, since it obviously introduced a needless bug in my code. :P – user541686 Aug 19 '12 at 07:11
14

nullchar is a (compile-time-)constant expression, with value 0. So it's fair game for implicit conversion to a null pointer.

In more detail: I'm quoting from a 1996 draft standard here.

char is an integral type. nullchar is const, so it is a (compile-time) integral constant expression, as per section 5.19.1:

5.19 Constant expressions [expr.const]

1 In several places, C++ requires expressions that evaluate to an integral or enumeration constant ... An integral constant-expression can involve ... const variables ...

Moreover, nullchar evaluates to 0, allowing it to be implicitly converted to a pointer, as per section 4.10.1:

4.10 Pointer conversions [conv.ptr]

1 An integral constant expression (expr.const) rvalue of integer type that evaluates to zero (called a null pointer constant) can be converted to a pointer type.

Perhaps an intuitive reason "why" this might be allowed (just off the top of my head) is that pointer width isn't specified, and so conversion from any size integral constant expression to a null pointer is allowed.


Updated with the relevant parts of the (newer) C++03 standard... According to §5.19.1...

An integral constant-expression can involve only literals (2.13), enumerators, const variables or static data members of integral or enumeration types initialized with constant expressions (8.5), non-type template parameters of integral or enumeration types, and sizeof expressions.

Then, we look to §4.10.1...

A null pointer constant is an integral constant expression (5.19) rvalue of integer type that evaluates to zero. A null pointer constant can be converted to a pointer type; the result is the null pointer value of that type and is distinguishable from every other value of pointer to object or pointer to function type. Two null pointer values of the same type shall compare equal.

Managu
  • Well, that's obviously what's *happening*, but (why) is it allowed by C++? That's the question. – user541686 Aug 19 '12 at 06:35
  • In GCC at least, the only time the compiler doesn't generate an error is when the value is 0. If you write `static const char nullchar = '\x30'` or any other value, compilation does fail. So Managu is right: **0 is a special case**. If you want warnings in gcc, use `-Wconversion` on the command line, which warns on all conversions where you don't explicitly use a cast. Not sure about MSVC. – Mr Lister Aug 19 '12 at 06:47
  • 2
    I am very curious as well. I'm wondering if that is really allowed by the C++ standard, or a side effect of constant replacement done too early. – sylvain.joyeux Aug 19 '12 at 06:50
  • @sylvain.joyeux try `g++ -std=c++98 -pedantic -W -Wall`... you'll see it's a well-defined part of the standard :-) – obataku Aug 19 '12 at 06:52
  • @veer: Just to mention it: I actually quoted the earlier (draft) standard on purpose, to point out that this functionality isn't new at all. – Managu Aug 19 '12 at 07:05
  • @Managu "_that this functionality isn't new at all._" goes back to the origins! – curiousguy Aug 19 '12 at 07:05
  • If you want to trace origins, see K&R1, page 98: "In general, integers cannot meaningfully be assigned to pointers; zero is a special case." – Jerry Coffin Aug 19 '12 at 07:34
  • @veer: even gcc, at times, does not follow the standard – sylvain.joyeux Aug 19 '12 at 08:06
  • @Managu: thanks for the quotes from the standard ! This is definitely the information I was looking for. – sylvain.joyeux Aug 19 '12 at 08:11
  • @Mehrdad: "... but (why) is it allowed by C++?" Because C++ does lots of things implicitly to save you the pain of typing a few extra characters. This sometimes leads to nasty and hard-to-discover bugs, as you have just experienced. – Giorgio Aug 19 '12 at 18:23
11

It compiles for the very same reason this compiles:

const char *p = 0; // OK

const int i = 0;
double *q = i; // OK

const short s = 0;
long *r = s; // OK

Expressions on the right have type int and short, while the object being initialized is a pointer. Does this surprise you?

In the C++ language (as well as in C), integral constant expressions (ICEs) with value 0 have special status (although ICEs are defined differently in C and C++). They qualify as null-pointer constants. When they are used in pointer contexts, they are implicitly converted to null pointers of the appropriate type.

Type char is an integral type, not much different from int in this context, so a const char object initialized by 0 is also a null-pointer constant in C++ (but not in C).

BTW, type bool in C++ is also an integral type, which means that a const bool object initialized by false is also a null-pointer constant:

const bool b = false;
float *t = b; // OK

A later defect report against C++11 changed the definition of a null-pointer constant. After the correction, a null pointer constant can only be "an integer literal with value zero or a prvalue of type std::nullptr_t". The above pointer initializations from const objects (i, s and b) are no longer well-formed in C++11 after the correction; only the initialization from the literal 0 remains valid.
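A minimal sketch of the two forms that the corrected wording still allows, assuming C++11 or later (the function names are made up for this illustration). The commented-out lines show the kind of initialization that, per the rule quoted above, a conforming compiler is expected to reject; how strictly a given compiler version enforces it is not something this sketch verifies.

```cpp
#include <cassert>

// Integer literal with value zero: still a null pointer constant.
inline const char* from_literal_zero() { return 0; }

// Prvalue of type std::nullptr_t: the preferred spelling since C++11.
inline const char* from_nullptr() { return nullptr; }

// Expected to be rejected under the corrected rule:
//   const bool b = false;
//   float* t = b;

inline void demo_corrected_rule() {
    // Both spellings produce the same null pointer value.
    assert(from_literal_zero() == from_nullptr());
}
```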

AnT stands with Russia
6

It is not ignoring the data type. It's not a bug. It's taking advantage of the const you put in there and seeing that its value is actually an integer 0 (char is an integer type).

Integer 0 is a valid (by definition) null pointer constant, which can be converted to a pointer type (becomes the null pointer).

The reason you'd want a null pointer is to have a pointer value which "points to nowhere" and can be checked for (i.e. you can compare a pointer to the null pointer constant 0, and you will get true if it is a null pointer).

If you drop the const, you will get an error. If you put double in there, you will get an error with or without the const (as with many other non-integer types; I guess the only exceptions are types that can be converted to const char* through overloaded conversion operators). And so forth.

The whole thing is that, in this case, your implementation sees that you're returning a null pointer constant, which can be converted to a pointer type.

5

It seems that a lot of the real answer to this question has ended up in the comments. To summarize:

  • The C++ standard allows const variables of integral type to be considered "integral constant expressions." Why? Quite possibly to bypass the issue that C only allows macros and enums to hold the place of integral constant expression.

  • Going (at least) as far back as C89, an integral constant expression with value 0 is implicitly convertible to (any type of) null pointer. And this is used often in C code, where NULL is quite often #define'd as (void*)0.

  • Going back to K&R, the literal value 0 has been used to represent null pointers. This convention is used all over the place, with such code as:

    if ((ptr = malloc(...))) {...} else {/* error */}
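For reference, a self-contained version of that idiom written as C++ might look like the sketch below. The helper `copy_length` is hypothetical, and the `static_cast` is needed only because C++, unlike C, does not implicitly convert `void*` to other pointer types. The `!` test is where the pointer is checked against null.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: copies text into a heap buffer and returns the
// length of the copy, or -1 if the allocation failed.
inline int copy_length(const char* text) {
    const std::size_t n = std::strlen(text) + 1;
    char* buf;
    if (!(buf = static_cast<char*>(std::malloc(n)))) {
        return -1;  // malloc returned a null pointer
    }
    std::memcpy(buf, text, n);  // includes the terminating '\0'
    const int copied = static_cast<int>(std::strlen(buf));
    std::free(buf);
    return copied;
}

inline void demo_malloc_idiom() {
    assert(copy_length("abc") == 3);
}
```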
    
Managu
  • Actually, if I recollect right, that's also for historic reasons; at some point so much code was already referring to (defined by literals) char strings as `char *`, that it was considered reasonable to allow the assignment in question, when `const` was introduced. I may even have read this in K&R itself, not sure. – mlvljr Sep 12 '12 at 23:05
2

There is an auto cast. If you run this program:

#include <stdio.h>
const char* c_str()
{
    static const char nullchar = '\0';
    return nullchar;
}

int main()
{
    /* sizeof yields a size_t, so cast it to match the %d format */
    printf("%d\n", (int)sizeof(c_str()));
    return 0;
}

The output will be 4 on my computer, i.e. the size of a pointer.

The compiler casts automatically. Note that at least gcc gives a warning (I don't know about VS).
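A sketch of what the function was presumably meant to do, assuming the intent was to return a pointer to an empty string (that intent is a guess, and the name `c_str_fixed` is made up). It also shows that `sizeof` only reflects the declared return type, so it says nothing about which conversion happened inside the function. Requires C++11 for `static_assert` and `nullptr`.

```cpp
#include <cassert>

// Assumed intent: return a pointer to an empty, NUL-terminated string.
inline const char* c_str_fixed() {
    static const char nullchar = '\0';
    return &nullchar;  // the address of the character, not its value
}

// sizeof does not evaluate its operand; it only uses the expression's type,
// which here is the declared return type const char*.
static_assert(sizeof(c_str_fixed()) == sizeof(const char*),
              "sizeof reports the size of the returned pointer type");

inline void demo_c_str_fixed() {
    assert(c_str_fixed() != nullptr);  // a real address, not a null pointer
    assert(c_str_fixed()[0] == '\0');  // pointing at an empty string
}
```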

elyashiv
  • 1
    "_there is a auto cast_" there is no such thing as an "auto cast". Either it is an implicit conversion, or it's a cast (an explicit conversion). – curiousguy Aug 19 '12 at 06:46
2

I think it might be the fact that the null character is common to both types. What you are doing is setting a null pointer when you return the null character. This would fail if any other character were used, because you are not passing the address of the character to the pointer, but the value of the character. Null is a valid pointer value and a valid character value, so a null character can be set as a pointer.

In short, null can be used by any type to set an empty value, regardless of whether it is an array, a pointer, or a variable.

Lost Sorcerer