The C99 standard requires that "A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string." (5.2.1.2) It then goes on to list 99 other characters that must be in the execution set. Can a character set be used in which the null character is one of these 99 characters? In particular, is it allowed that '0' == '\0'?

Edit: Everyone is pointing out that in ASCII, '0' is 0x30. This is true, but the standard doesn't mandate the use of ASCII.

joelw

5 Answers

Whether you use ASCII, EBCDIC or something "self-crafted", '0' must be distinct from '\0', for the reason you mention yourself:

A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string. (5.2.1.2)

If the null character terminates a character string, it cannot be contained in that string. It is the only character which cannot be contained in a string; all other characters can be used and thus must be distinct from 0.

glglgl

I don't think the standard states that each of the characters that it lists (including the null character) has a distinct value, other than that the digits do. But a "character set" containing a value 0 that allegedly represents 91 of the 100 required characters is clearly not really a character set containing the required 100 characters. So this is either:

  • part of the English-language definition of "a character set",
  • obvious from context,
  • a very minor flaw in the text of the standard, that it should spell it out to prevent wilful misinterpretation by a faithless implementer.

Take your pick.

Steve Jessop

If '0' == '\0', you would not be able to distinguish the end of a string from a '0' character.

Thus a string such as "0_any_string" could not be used at all: it starts with '0', so it would end before its first character.

Alex
  • The standard doesn't require that the character set be ASCII. EBCDIC has been used, but any character set that fulfills the requirements of section 5.2.1 is allowed. I am asking if a hypothetical set with `'0'` having a value of `0` is compliant with the standard. – joelw Mar 25 '13 at 08:34
  • The standard doesn't seem to require that strings are able to contain all characters without terminating. So long as you meet the requirement: "A string is a contiguous sequence of characters terminated by and including the first null character," you have a string, regardless of how well you can use it in practice. – joelw Mar 25 '13 at 08:57
  • Your first answer wasn't an answer to the question at all. – Jens Gustedt Mar 25 '13 at 09:07
  • @JensGustedt Edited. Now it is a bit closer? – Alex Mar 25 '13 at 09:12

No, it can't. A character set must be described by an injective function, i.e. a function that maps each character to its own distinct binary value. Mapping two characters to the same value would make the encoding ambiguous: the computer could not decode the data to a single matching character, since more than one would fit.

The C99 standard imposes a further restriction by requiring the null character to map to a specific binary value (all bits zero). Given the above, no other character can have the same value as the null character.

SomeWittyUsername
  • Certainly we would want the map to be injective, and it would be a flaw in the C standard if it failed to require this. But that does not mean the standard does contain such a requirement. Can you cite a part or parts of the C standard that require this? – Eric Postpischil Mar 25 '13 at 11:01
  • @EricPostpischil that's not related to the C standard but rather to the definition of a character set (any character set). The C standard is a user of character sets in this respect. Note the quote in the question "*A byte with all bits set to 0, called the null character, shall exist in the basic execution character set*" - this means that C isn't supposed to support any character set that fails to comply. But being injective is a broader requirement and it goes beyond the scope of C. – SomeWittyUsername Mar 25 '13 at 12:15

The integer constant literal 0 has different meanings depending upon the context in which it's used. In all cases, it is still an integer constant with the value 0, it is just described in different ways.

If a pointer is compared to the constant 0, this is a check for whether the pointer is a null pointer, and in that context the 0 is called a null pointer constant. The C standard defines an integer constant expression with the value 0, or such an expression cast to type void *, as a null pointer constant; converting a null pointer constant to a pointer type yields a null pointer.

What is the difference between NULL, '\0' and 0

inquam