2

I am trying to find documentation either confirming or contradicting the statement that

char test[5]="";

results in a buffer initialized to all null characters identical to

memset(test,'\0',sizeof(test));

but have not been able to find (or understand / decipher) anything. I am specifically looking for details in the old specification; a C99 reference will work too. Thank you.

Keith Thompson
PhilC
  • 2
    Don't know what C95 is, but yes, it is equivalent. The documentation is to follow... – Eugene Sh. Mar 04 '20 at 22:12
  • 2
    Do you mean C99? – Barmar Mar 04 '20 at 22:12
  • Here it is http://port70.net/~nsz/c/c11/n1570.html#6.7.9p21 – Eugene Sh. Mar 04 '20 at 22:13
  • `char test[5] = {0};` should also work. – selbie Mar 04 '20 at 22:14
  • @EugeneSh. Post the relevant quote as an answer. – Barmar Mar 04 '20 at 22:16
  • 1
    @Barmar I am sure it is a duplicate, looking for one – Eugene Sh. Mar 04 '20 at 22:16
  • 1
    Does this answer your question? [C char array initialization](https://stackoverflow.com/questions/18688971/c-char-array-initialization) – Eugene Sh. Mar 04 '20 at 22:17
  • @EugeneSh. your text is from C11 but this question is about C95 – M.M Mar 04 '20 at 22:30
  • 1
    @M.M I am not aware of C95 standard revision, so assuming it is a typo/mistake.... actually found it, it is an *extension* of C90. I don't have a link to a free copy/draft of it. Still not sure the OP is meaning it. FWIW C89 has a very similar wording in http://port70.net/~nsz/c/c89/c89-draft.html#3.5.7 – Eugene Sh. Mar 04 '20 at 22:31
  • I am looking for the (very) old specification, I understand c11 defines it as described. – PhilC Mar 04 '20 at 22:49
  • 1
    @EugeneSh. The original ISO C standard was issued in 1990. An amendment was published in 1995; it added digraphs, `__STDC_VERSION__`, and if I recall correctly ``. Most compilers don't bother to distinguish between C90 and C95. I don't believe there have been any changes in the 1995 amendment or in the 1999, 2011, or any later editions that would affect the answer to the OP's question. On the other hand, C90/C95 may have made some assumptions that later editions made explicit. (I have copies of the 1990 standard and 1995 amendment which I can check later.) – Keith Thompson Mar 04 '20 at 22:49
  • 1
    I've edited "NULL characters" to "null characters". `NULL` is (a macro that expands to) a null *pointer* constant, and should not (at least in the context of C) be used to refer to the character value. – Keith Thompson Mar 04 '20 at 22:51
  • a c99 reference would do too, thank you – PhilC Mar 04 '20 at 22:52
  • @PhilC C99 is pretty much the same as C11 for this matter: http://port70.net/~nsz/c/c99/n1256.html#6.7.8p21 – Eugene Sh. Mar 04 '20 at 22:53
  • `gcc -std=iso9899:199409` tells gcc to use C94/C95, and sets `__STDC_VERSION__` to `199409L`. BTW, I updated the title to refer to ISO, which has issued all the C standard editions since 1990 (ANSI then adopts the ISO standard). – Keith Thompson Mar 04 '20 at 22:56
  • I wrote: "*I don't believe there have been any changes in the 1995 amendment or in the 1999, 2011, or any later editions that would affect the answer to the OP's question.*". Based on further discussion, it's quite likely I was mistaken, and that the C90 standard is at least less explicit on this point. – Keith Thompson Mar 04 '20 at 23:01

4 Answers

5

This is not a complete answer to your question but serves to clear up misinformation in other answers.

In ANSI C89 the relevant Standard text was (section 3.5.7):

If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.

An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the members of the array.

It only specifies the initialization for array elements corresponding to the string literal. So the trailing array elements are not explicitly initialized and thus have indeterminate value.

There was also a paragraph:

If there are fewer initializers in a list than there are members of an aggregate, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.

but that does not apply since we are not initializing from a list ("list" means a brace-enclosed list, not a string literal).


In C90 (which I'm not sure if I can legally link to), the sections were renumbered so that the section containing these paragraphs became 6.5.7. The wording of the latter paragraph was also changed:

If there are fewer initializers in a brace-enclosed list than there are members of an aggregate, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.


In C90 TC1 (HTML, PDF), the above is unaltered.

However, in Defect Report 60, the crucial question was asked:

When an array of char (or wchar_t) is initialized with a string literal that contains fewer characters than the array, are the remaining elements of the array initialized?

Subclause 6.5.7 Initialization, page 72, only says (emphasis mine):

If there are fewer initializers in a brace-enclosed list than there are members of an aggregate, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.

Correction

In subclause 6.5.7, page 72, the penultimate paragraph of Semantics (before Examples), add after the comma:

or fewer characters in a string literal or wide string literal used to initialize an array of known size, and elements of character or wchar_t type

It appears that the zero initialisation in the suggested fix must indeed have been the standard writers' intention, since in C90 TC2, we see that same crucial change made:

Page 72

In subclause 6.5.7, page 72, the penultimate paragraph of Semantics (before Examples), add after the comma:

or fewer characters in a string literal or wide string literal used to initialize an array of known size, and elements of character or wchar_t type

giving us:

If there are fewer initializers in a brace-enclosed list than there are members of an aggregate, or fewer characters in a string literal or wide string literal used to initialize an array of known size, and elements of character or wchar_t type the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.

Note that TC1 is dated 1994, although it was published in 1995. TC2 is dated 1996. Perplexingly, DR60 is dated July 16 1993, and hence predates TC1. Perhaps work on TC1 was at too advanced a stage by that point to deal with new defect reports and a backlog had accumulated? In any case, TC2 was mostly just a set of corrections in response to defect reports, suggesting that the change was first seen there and not in C95, and that the zero-initialisation of characters after the null terminator was what the C89 standard writers had intended.


In ISO C99 (the original version, without the technical corrigenda), that paragraph was now renumbered to 6.7.8/21 and had changed again. The mentions of "wide string literal" and "elements of character or wchar_t type" were removed:

If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.

This means the trailing array elements are initialized to null characters.

(Note: The original C99 may be copyrighted material, so I can't post a link to it in the above. That is what it says, though. Here is a link to the last freely available working draft. There were two more drafts after that, but the WG14 website has taken them down. Nevertheless, that wording is in the N843 working draft, and is still present in the later C99 incorporating TC3.)


I have been unable to find any free copies of C95 (ISO/IEC 9899:1990/AMD1:1995), so I cannot say at exactly which point between C89 and C99 the "wide string literal" and "wchar_t" changes were made. Also, the subject is not mentioned in the C99 Rationale document.

Of course it is possible that the C99 behaviour was the intent of the C89 authors and the missing text was an oversight, but in the absence of any sort of documentation to that effect we can't draw any conclusion, and there may be compilers from that time that do not initialize the trailing elements.

Hopefully someone else out there who has those documents (or feels inclined to buy them from the ISO store!) can provide an accurate answer.

AJM
M.M
  • 1
    I'll check my copies later. It's possible that the authors of the C90 standard implicitly assumed that the array would be zero-filled, and that later editions just made that assumption explicit. – Keith Thompson Mar 04 '20 at 22:53
  • 2
    @KeithThompson OK, that would be great. Surely we can't know their intent unless they documented it though, so we have to take the letter of the text as the specified behaviour (and more importantly, compiler vendors may do the same). I do recall having discussed this issue before in the c.l.c days, maybe even before C99 was published, perhaps due to people observing compilers not initializing the trailing elements. Possibly there is something relevant in those archives. – M.M Mar 04 '20 at 22:57
  • 1
    @KeithThompson I made the answer community-wiki so feel free to edit once you have access to your documents – M.M Mar 04 '20 at 23:09
  • Thanks. It seemed easier to post my own answer. – Keith Thompson Mar 08 '20 at 02:41
  • About C95 - I can't find the full version online, but http://www.lysator.liu.se/c/na1.html has what looks like an official summary of it posted, and it doesn't mention array initialization using string literals. – AJM Apr 12 '21 at 11:56
  • Surprisingly, the summary's author (Clive Feather) doesn't host it at his own website, although he has written other content relating to the C language and standard which he hosts there. – AJM Apr 12 '21 at 12:03
  • The text in section 3.5.7 of C89 is in section 6.5.7 of C90, so there's some section renumbering. There is also a change to the key paragraph, which now reads *"If there are fewer initializers in a brace-enclosed list than there are members of an aggregate, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration."* – AJM Apr 12 '21 at 12:39
  • Edited the above into the answer. Also added material from C90 TC2, which contains most of the relevant change. There are still changes between C90 TC2 and C99 TC2 that haven't been sourced, though. – AJM Apr 12 '21 at 14:38
  • @AJM-Reinstate-Monica I think it is a bad idea to link to copyrighted material; if a DMCA strike is issued, the result will be that this entire thread is permanently removed. Perhaps you could edit to summarize without the link. – M.M Apr 12 '21 at 19:50
  • @M.M I think this is the pre-corrigenda C99 you're referring to? I've removed the link. If it's something else, please let me know ASAP. – AJM Apr 13 '21 at 08:53
3

Quick summary

C99 and later guarantee that the remaining characters are initialized to zero. C89/C90/C95 do not make this guarantee and do not specify the values of the remaining characters. This was likely an unintentional oversight, and I speculate that most or all pre-C99 compilers would have zero-initialized the remaining characters anyway. If you're using a conforming C99 or later compiler, zero-initialization is guaranteed.

The gory details

char test[5]="";

Due to a defect in the C89/C90 standard, this was only guaranteed to initialize test[0] to '\0'. The other elements of test were left unspecified.

The C95 amendment did not address this.

The C99 standard corrected the defect, requiring test to be initialized to all zeros.

Another example:

char foo[5] = "foo";

In C89/C90/C95, the language guaranteed foo[0]=='f', foo[1]=='o', foo[2]=='o', foo[3]=='\0', but said nothing about the value of foo[4]. In C99 and later, it's guaranteed to be initialized as if you had written:

char foo[5] = { 'f', 'o', 'o', '\0' };

which, in all editions of the C standard, guarantees foo[4]=='\0'.

Citations

The 1989 ANSI C standard and the 1990 ISO C standard are equivalent, differing only in non-normative introductory material and a renumbering of sections. The 1995 amendment updated the standard but did not affect array initialization.

The 1990 ISO C standard, section 6.5.7, says:

An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.

and later in the same section:

If there are fewer initializers in a brace-enclosed list than there are members of an aggregate, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.

It specifies that trailing members are initialized to zero for a brace-enclosed list, but does not make the same statement for a string literal initializer. (I speculate that this was an unintentional oversight, and that most compilers would have zero-filled the remaining elements anyway since they already had to do so in some cases.)

Each edition of the C standard has a collection of defect reports associated with it:

C90 Defect Report #060, submitted in 1993 by P.J. Plauger and/or Larry Jones, raised this issue:

When an array of char (or wchar_t) is initialized with a string literal that contains fewer characters than the array, are the remaining elements of the array initialized?
Subclause 6.5.7 Initialization, page 72, only says (emphasis mine):

If there are fewer initializers in a brace-enclosed list than there are members of an aggregate, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.

The response to this defect report resulted in the revised wording in the C99 standard, section 6.7.8 paragraph 21 (emphasis added):

If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.

Keith Thompson
1

From the C11 Standard (6.7.9 Initialization)

10 If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static or thread storage duration is not initialized explicitly, then:

— if it has pointer type, it is initialized to a null pointer;

— if it has arithmetic type, it is initialized to (positive or unsigned) zero;

...

and

21 If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.

This means that in this declaration

char test[5] = "";

all five elements of the array are zero-initialized. The first element is initialized explicitly by the terminating zero of the string literal, and all the others are initialized implicitly, the same way as objects with static storage duration.

At least it is valid starting from the C99 Standard.

Below is a demonstration program that shows different ways of initializing character arrays with zeros.

#include <stdio.h>

int main(void) 
{
    enum { N = 5 };

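    char s1[N] = "";              /* string literal shorter than the array */
    char s2[N] = { "" };          /* same literal, optionally enclosed in braces */
    char s3[N] = { 0 };           /* brace-enclosed list with one initializer */
    char s4[N] = { [0] = 0 };     /* C99 designated initializer for the first element */
    char s5[N] = { [N-1] = 0 };   /* C99 designated initializer for the last element */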

    char * s[] = { s1, s2, s3, s4, s5 };

    for ( size_t i = 0; i < sizeof( s ) / sizeof( *s ); i++ )
    {
        for ( size_t j = 0; j < N; j++ )
        {
            printf( "%d", s[i][j] );
        }
        putchar( '\n' );
    }
    return 0;
}

The program output is

00000
00000
00000
00000
00000
Vlad from Moscow
0

From a code clarity point of view, if the purpose of the array is to hold strings, then

char test[5] = "";

initializes the array with a zero-length string, and the rest of the bytes shouldn't matter. If they do matter, then the array isn't really a string, and you should use

char test[5] = {0};

to clarify that.

Lee Daniel Crocker