16

Similar questions have been asked about the data type of string literals in C++.

Many people have cited the standard:

A narrow string literal has type “array of n const char”, where n is the size of the string as defined below, and has static storage duration (3.7)

I've written the following statement in the main function:

  char cstring[]= "hellohellohellohellohellohello";

But I can't find any string literal stored as static data in the assembly. In fact, the assembly shows that the string is decomposed and "stored" directly in the instructions.

    movl    $1819043176, -48(%rbp)
    movl    $1818585199, -44(%rbp)
    movl    $1701343084, -40(%rbp)
    movl    $1752132716, -36(%rbp)
    movl    $1869376613, -32(%rbp)
    movl    $1819043176, -28(%rbp)
    movl    $1818585199, -24(%rbp)
    movw    $28524, -20(%rbp)
    movb    $0, -18(%rbp)

A similar statement at global scope, however, results in the string being stored as static data.

char cstring1[] = "hellohellohellohellohellohello";

The resulting assembly:

cstring1:
    .string "hellohellohellohellohellohello"

The above example is available online here.

So this does not seem to conform to the cited standard. Are there exceptions to what is cited here?

Gab是好人

3 Answers

20

It does conform to the standard, under the "as-if" rule.

Since the only thing that the string literal is ever used for is to initialize cstring, there is no need for any object representation for it. The compiler has eliminated it in favour of initializing cstring by an alternative means that has equivalent results, but that the compiler decides is better in some respect (speed or code size).

Steve Jessop
  • While this is true, I think it's also important to note that the language does define real differences between the cases mentioned in the question. – Ben Voigt Jul 25 '16 at 15:00
10

Expressions have types. A string literal has a type when it is used as an expression. Yours isn't.

Consider the following code:

#include <stdio.h>

#define STR "HelloHelloHello"

char global[] = STR;

int main(void)
{
    char local[] = STR;
    puts(STR);
}

There are three string literals in this program formed using the same tokens, but they are not treated the same.

The first, the initializer for global, is part of static initialization of an object with static lifetime. By section 3.6.2, static initialization doesn't have to take place at runtime; the compiler can arrange for the result to be pre-formatted in the binary image so that the process starts execution with the data already in place, and it has done so here. It would also be legal to initialize this object in the same fashion as local[], as long as it was performed before the beginning of dynamic initialization of globals.

The second, the initializer for local, is a string literal, but it isn't really an expression. It is handled under the special rules of 8.5.2, which state that the characters within the string literal are independently used to initialize the array elements; the string literal is not used as a unit. This object has dynamic initialization, resulting in loading the value at runtime.

The third, an argument to the puts() call, actually does use the string literal as an expression, and it will have type const char[N], which decays to const char* for the call. If you really want to study object code used to handle the runtime type of a string literal, you should be using the literal in an expression, like this function call does.

Ben Voigt
  • Are you saying that string literals don't have a type if they're not used in an expression? I believe they do. That type may or may not have any relevance to the generated code. In fact, types generally don't exist in generated code. – Keith Thompson Jul 25 '16 at 16:38
  • @Keith: *Static type* is a property of an expression (see 1.3), so things that aren't expressions don't have it. In any case, the unit (string literal expression having type blah) is not used in these initializations, as the language calls for access to the individual characters and never the string. – Ben Voigt Jul 25 '16 at 16:50
  • Yes. How is a string literal not an expression? – Keith Thompson Jul 25 '16 at 16:51
  • @KeithThompson: The expression which might exist, consisting of that string literal, is not used here. Only the individual characters within the string literal are used. Direct quote: "Successive characters of the value of the string literal initialize the elements of the array." And the remainder of the section talks about *multiple initializers*. Not one initializer expression that is a string literal having `const char[N]` type. – Ben Voigt Jul 25 '16 at 16:54
  • @Keith: it's a matter of the grammar of the language. Either the thing on the RHS of this particular initialization is an expression, or it isn't, and instead matches some other grammar production specified in the standard to handle this case. And for example with aggregate initialization I believe it isn't, so what Ben's saying makes sense assuming only that it's true, which I'm too lazy to check but I don't have any reason to doubt. – Steve Jessop Jul 25 '16 at 17:10
  • @SteveJessop: Yes, it is. Look at the grammar of an *initializer* in section 8.5 of the C++11 standard. *initializer* -> *brace-or-equal-initializer* -> *`=` initializer-clause*, *initializer-clause* -> *assignment-expression*. In `char cstring1[] = "hellohellohellohellohellohello";`, the string literal is an *assignment-expression*. I'm not aware of any context in which a string literal can appear in which it's not an expression of some kind -- regardless of how it's used. I can write `42;` as a statement, and the expression `42` is not used -- but it's still an expression with a type. – Keith Thompson Jul 25 '16 at 18:01
  • @KeithThompson: The grammar describes a DFT that accepts or rejects the token sequence; it does not define semantics. 8.5.2 gives this case different semantics, although the RHS meets the grammatic requirement to be treated as an expression, it is not one. Just as `f(a, b)` doesn't prevent you from talking about the parenthesized comma-operator expression `(a, b)` -- but it would be useless to do so, because the semantics here are a two-argument function call, not a comma-operator. The grammar would be much larger and more complex if all the semantic cases were enumerated individually. – Ben Voigt Jul 25 '16 at 18:04
  • The `(a, b)` in `f(a, b)` isn't a parenthesized comma-operator expression because of the grammatical context in which it appears. The grammar in 8.5 says that the string literal in the example is an *assignment-expression*. The description in 8.5.2 doesn't change that, it merely defines the semantics of that particular expression in that particular context. – Keith Thompson Jul 25 '16 at 18:08
  • @KeithThompson: Except that the string literal CAN'T be an initializer, singular, in the form of an assignment-expression (which is what matches the grammar rules), when 8.5.2 talks about the cardinality of initializers, plural (in reference to this exact case, no begging out with "one is a number too"). The grammar could have been *initializer-clause* -> *assignment-expression* | *string-literal* but the BNF for *assignment-expression* already makes *string-literal* legal in that position. In C++, grammar does not define semantics. – Ben Voigt Jul 25 '16 at 18:12
  • Apparently I'm missing something here. As you say, the grammar *initializer-clause* -> *assignment-expression* already permits a *string-literal* in that location. So the *string-literal*, in that context, is an *assignment-expression*. The value and type of that expression may or may not be relevant, depending on how it's used -- but it's still an expression, and it still has type `const char[N]`. Why would a string literal not have a type because of its context? In the statement `42;`, is the constant `42` not of type `int`? – Keith Thompson Jul 25 '16 at 18:48
  • @KeithThompson: The grammar is just a quick reference to determine whether something is legal there. It is. Then you have to check the semantics to see how those tokens are interpreted. In this case, the string literal is treated as a sequence of characters, not as an *assignment-expression*. You can talk about the type an assignment-expression made from a string literal would have, but it's completely irrelevant to the actual code, because the actual semantics in effect here do not involve an assignment-expression, they treat each character in the string literal as a separate initializer. – Ben Voigt Jul 25 '16 at 19:06
  • @Keith: You might as well ask "Why would a braced-list not have a type because of its context?" Because in `extern int x,y; auto a = { x, y };` there is a value of type `std::initializer_list`. But in `extern int x,y; int a[2] = { x, y };` there is no `initializer_list`, you can't talk about the type of `{ x, y }` at all -- because of the context. – Ben Voigt Jul 25 '16 at 19:08
  • @BenVoigt: Because `{ x, y }` is not grammatically an expression in that context. A string literal is. – Keith Thompson Jul 25 '16 at 19:44
  • @KeithThompson: Please point out the grammar rule that makes `{ x, y }` define an expression of type `std::initializer_list` in `extern int x, y; auto a = { x, y };` and not in `extern int x, y; int a[2] = { x, y };` The initializer *grammar* makes no special case for `auto`, while the semantic rules do. – Ben Voigt Jul 25 '16 at 19:50
  • @BenVoigt: `0` is a null pointer constant. 4.10 explicitly acknowledges that is of integer type (in this case `int`). It's *converted* from `int` to some pointer type. – Keith Thompson Jul 25 '16 at 20:02
  • @Keith: Right, it's the conversion rules which depend on it being a literal, not the initialization rules. It still breaks the type system's neat layering and goes to the source token stream, not the production identified by grammar. The grammar helps the semantic ruleset, it doesn't control it. – Ben Voigt Jul 25 '16 at 20:05
  • **2.14.5/8 String Literals [lex.string]:** ... a narrow string literal has type "`array of n const char`" ... and has static storage duration ... Also, 8.5.2 talks about "appropriately-typed string literal[s]". –  Jul 26 '16 at 00:28
0

I think the definition you cite has to be interpreted as referring to string literals whose storage location is not explicitly declared, such as the format string in a printf() call. In order for such code to work, those string literals have to be stored somewhere; the definition specifies where they are stored if that cannot be inferred from context.

On a side note: The string literal in your main() doesn't appear as static data because variables declared in functions are 'automatic' by default. If you had instead written static char cstring[]=... then you would have seen it in the same place as cstring1[].

And another thing: Storage location IS NOT part of the data type!

Ivan Starostin
PMar