79

C++03 5.1 Primary expressions §2 says:

A literal is a primary expression. Its type depends on its form (2.13). A string literal is an lvalue; all other literals are rvalues.

Similarly, C99 6.5.1 §4 says:

A string literal is a primary expression. It is an lvalue with type as detailed in 6.4.5.

What is the rationale behind this?

As I understand it, string literals are objects, while all other literals are not, and an lvalue always refers to an object.

But the question then is: why are string literals objects while all other literals are not? This rationale seems to me like a chicken-and-egg problem.

I understand the answer to this may be related to hardware architecture rather than to C/C++ as programming languages; nevertheless, I would like to hear it.

Mark Amery
Alok Save
  • 3
    Lvalues are not objects. Lvalues are values which can appear on the left-hand side of an assignment, such as variables, members of structures, and array element lookups. (L = Left.) –  Apr 04 '12 at 03:26
  • 9
    @duskwuff: The Committee begs to differ. Per 6.3.2.1, "An lvalue is an expression with an object type or an incomplete type other than void; if an lvalue does not designate an object when it is evaluated, the behavior is undefined." Per the footnote (53) referenced in that citation, an lvalue should be thought of as an "object locator value". – R.. GitHub STOP HELPING ICE Apr 04 '12 at 03:31
  • 1
    @JohnCalsbeek C++11 'fixed' that, e.g. `alias {}` is possible now. `U {}.arr` is also an rvalue of array type if `arr` is declared as such in the class definition for `U`. – Luc Danton Apr 04 '12 at 03:50
  • 3
    BTW, a better approximation of lvalue is "syntactically valid operand of the `&` operator". I suspect that definition is actually equivalent to the standard's definition, unless I'm missing something... – R.. GitHub STOP HELPING ICE Apr 04 '12 at 04:10
  • 3
    Update: It is only approximate. Register-storage-class objects are not valid as operands of `&`, but are lvalues. Also, I'm rather unclear on why it's (presumably) invalid to apply `&` to the return value of a function, which is specified to have object type... – R.. GitHub STOP HELPING ICE Apr 04 '12 at 04:56
  • @r.. and in C, function designators are not lvalues. – Johannes Schaub - litb Apr 04 '12 at 09:35
  • @R.. Bit field members are objects but don't have their own address. – curiousguy Sep 28 '19 at 23:10
  • @curiousguy: Indeed, nor do `register` class. However I think (it's been a long time) I was trying to get at that with "*syntactically* valid". – R.. GitHub STOP HELPING ICE Sep 28 '19 at 23:28
  • In three sentences: strings occupy memory, you can take their address. `&"hello"` is valid C++. Hence, they should be l-values. – Ofek Shilon Feb 24 '20 at 11:36
  • @duskwuff-inactive- Aren't the names originally from assembly language? location value and register value? – Tammi Feb 01 '21 at 19:50

5 Answers

42

A string literal is a literal with array type, and in C there is no way for an array type to exist in an expression except as an lvalue. String literals could have been specified to have pointer type (rather than array type that usually decays to a pointer) pointing to the string "contents", but this would make them rather less useful; in particular, applying the sizeof operator to them would yield the size of a pointer rather than the size of the string.

Note that C99 introduced compound literals, which are also lvalues, so having a literal be an lvalue is no longer a special exception; it's closer to being the norm.

R.. GitHub STOP HELPING ICE
  • Isn't `puts("hello")` an example of an expression with an array type that could be an rvalue? – Pubby Apr 04 '12 at 03:38
  • 2
    `puts("hello")` is an expression with type `int`. – R.. GitHub STOP HELPING ICE Apr 04 '12 at 03:40
  • I meant where `"hello"` is an rvalue. – Pubby Apr 04 '12 at 03:41
  • 4
    `"hello"` is not an rvalue. It's an lvalue array which decays to an expression of type pointer-to-`char`. – R.. GitHub STOP HELPING ICE Apr 04 '12 at 03:43
  • Yes, but you said *"no way for an array type to exist in an expression except as an lvalue."*. Wouldn't that code work if the literal was an rvalue? – Pubby Apr 04 '12 at 03:43
  • In that case it might, but what if you passed a string literal to a function that wanted to store it? You might expect the literal to go "out of scope" after the function call, so a copy would be required. – John Calsbeek Apr 04 '12 at 03:47
  • 12
    The literal can't have array type without being an lvalue, because of the way array decay to pointers works. If it did not have object type, there would be no address of its initial element for it to decay to. As my (slightly revised) answer states, the language *could have been designed* such that string literals are *originally* of pointer type, without any decay, and then they would not need to be lvalues. But that would be a lot less useful in practice. – R.. GitHub STOP HELPING ICE Apr 04 '12 at 03:48
  • 2
    It is possible to have rvalue array types - for example if you have `struct x { int a[2]; }; struct x foo(void);` then `foo().a` is an rvalue array. Also, given `struct x bar, quux;` then `(1 ? bar : quux).a` is an rvalue array. – caf Apr 04 '12 at 04:21
  • @caf: C does not define "rvalue", which is probably a good thing, because it's always unclear whether the intended meaning is "non-lvalue" or just "any expression value". Your examples are definitely lvalues per the definition of an lvalue ("an expression with an object type...") and 6.5.2.2, which reads [starting new comment]: – R.. GitHub STOP HELPING ICE Apr 04 '12 at 04:48
  • 2
    @R.. Could you comment on my answer below? There seems to be a strong view that I'm incorrect, but I think this may be a place where C and C++ differ. I'd like to check before I delete the answer :) – Timothy Jones Apr 04 '12 at 04:48
  • "If the expression that denotes the called function has type pointer to function returning an object type, the function call expression has the same type as that object type, and has the value determined as specified in 6.8.6.4. Otherwise, the function call has type void. If an attempt is made to modify the result of a function call or to access it after the next sequence point, the behavior is undefined." – R.. GitHub STOP HELPING ICE Apr 04 '12 at 04:49
  • 1
    @R.: That definition does not seem complete, because for example the expression `+1` has object type (`int`) but is not ordinarily considered an lvalue. Note that Example 1 in C99 §6.5.2.3 specifically calls out `f().x` as being *"a valid postfix expression but is not an lvalue"*. – caf Apr 04 '12 at 06:24
  • 1
    The (C) standard could have defined string literals as rvalues, and then added a number of special rules to make them work as they do. Defining them as lvalues eliminates the need for most of the special rules. (In C, there's still the special rule that they don't have a const type, but you're not allowed to modify them. In C++, the special rule is that they have a `const` type, but there is an implicit conversion which will remove the const. In both cases, these special rules only apply to string literals.) – James Kanze Apr 04 '12 at 07:42
  • 2
    @caf is right that there are array "rvalues" (or just plain values), due to `struct` return values. The standard is pretty weak in terms of describing what one can do with them, though. The big issue in implementations is that they may (or may not) be stored in registers (for sufficiently small structures) or similar "ephemeral" storage, and array manipulation—even something as simple as subscripting to extract one element—can overwrite this storage; but "normal" array access requires a fairly durable pointer to the base of the array. How long is that pointer valid? Who knows! – torek Apr 04 '12 at 07:58
  • @torek: If this is correct, then I believe subscripting them is illegal unless there's a special case allowing it. Even if there is, I see no reason the array would need to exist temporarily in memory... – R.. GitHub STOP HELPING ICE Apr 04 '12 at 11:41
  • The conclusions we drew, way back when, were that the only "truly safe" thing to do with a `struct`-valued function was either: `struct_instance = f(args);` or `(void) f(args);`. C99 tries to make it clear that you can also select a `struct` element and (subsequently) an array element, but not grab hold of a pointer to the entire array. This works right in gcc, but it's probably a good test for other compilers. (I'd guess the Plum-Hall test suite has a test like this by now.) – torek Apr 04 '12 at 18:04
  • Can you provide a citation where C99 tries to make it clear that this is allowed? – R.. GitHub STOP HELPING ICE Apr 04 '12 at 23:56
  • Also if it is not array type, template deduction of size of string literal is not possible – K.K Apr 02 '13 at 05:50
19

String literals are arrays - objects of inherently unpredictable size (i.e. of user-defined and possibly large size). In the general case, there's simply no other way to represent such literals except as objects in memory, i.e. as lvalues. In C99 this also applies to compound literals, which are also lvalues.

Any attempts to artificially hide the fact that string literals are lvalues at the language level would produce a considerable number of completely unnecessary difficulties, since the ability to point to a string literal with a pointer as well as the ability to access it as an array relies critically on its lvalue-ness being visible at the language level.

Meanwhile, literals of scalar types have a fixed compile-time size. At the same time, such literals are very likely to be embedded directly into the machine instructions on the given hardware architecture. For example, when you write something like i = i * 5 + 2, the literal values 5 and 2 become explicit (or even implicit) parts of the generated machine code. They don't exist and don't need to exist as standalone locations in data storage. There's simply no point in storing values 5 and 2 in the data memory.

It is also worth noting that on many (if not most, or all) hardware architectures floating-point literals are actually implemented as "hidden" lvalues (even though the language does not expose them as such). On platforms like x86, floating-point instructions do not support embedded immediate operands. This means that virtually every floating-point literal has to be stored in (and read from) data memory by the compiler. E.g. when you write something like i = i * 5.5 + 2.1, it is translated into something like

const double unnamed_double_5_5 = 5.5;
const double unnamed_double_2_1 = 2.1;
i = i * unnamed_double_5_5 + unnamed_double_2_1;

In other words, floating-point literals often end up becoming "unofficial" lvalues internally. However, it makes perfect sense that the language specification did not make any attempt to expose this implementation detail. At the language level, arithmetic literals make more sense as rvalues.

Thomas Flinkow
AnT stands with Russia
  • 1
    So expressions like `'x'` or `5` in the source code are "swallowed" in the executable during the compilation and "become part of it", whereas memory is reserved for `"x"` and `5.5` at runtime, so that they are created by the executable, stored in memory, but are not part of the executable file itself. Have I completely missed the point? – Enlico Nov 27 '18 at 17:05
  • 3
    Fun fact: `x * 2.0` will usually compile as `x+x`. That really emphasizes that the "hidden lvalue" thing is truly just an asm implementation detail, and not fundamental or even related to language rules. More of a fun fact, but yeah interesting to point out. (Although the as-if rule does even allow the compiler to modify string literals, e.g. turn `printf("hello\n")` into `puts("hello")`.) – Peter Cordes Feb 13 '19 at 14:13
  • @Enlico The following thread could be useful: https://stackoverflow.com/questions/2589949/string-literals-where-do-they-go Usually, string literals go into the read-only section of the object file. – Hari Apr 14 '23 at 08:27
12

I'd guess that the original motive was mainly a pragmatic one: a string literal must reside in memory and have an address. The type of a string literal is an array type (char[] in C, char const[] in C++), and array types convert to pointers in most contexts. The language could have found other ways to define this (e.g. a string literal could have pointer type to begin with, with special rules concerning what it pointed to), but just making the literal an lvalue is probably the easiest way of defining what is concretely needed.

James Kanze
  • Why the down vote for what is almost certainly the correct answer? – James Kanze Apr 04 '12 at 07:47
  • Not my downvote. So if I understand your answer correctly, the committee just accepted what was probably suggested without delving into whether it was the best possible approach, only that it seemed the more flexible choice at the time? – Alok Save Apr 04 '12 at 08:25
  • For whatever it's worth, the C99 standard just took the text from the C89 standard, and in the C89 standardization process, as I recall (from reading minutes, I was never at any actual meetings) there was some minor argument about this but it never went anywhere. The big fiery arguments were about making string literals `const`. – torek Apr 04 '12 at 08:30
  • 1
    @Als Even before the committee, the specification of C has been strongly motivated by pragmatic considerations, rather than language theory or more abstract considerations. Esthetically, it would be more elegant if all of the literal types were rvalues. Pragmatically, string literals have an array type, array types work differently than other types, and making them lvalues sorts things out with the least number of other special rules. – James Kanze Apr 04 '12 at 08:51
  • @torek IIRC, the distinction was already present in K&R C (1st edition), although my copy isn't handy to check with. Pragmatically, it's easier to say that they're lvalues than it is to write several paragraphs of special rules so that they can be rvalues, but still work as they do. Pragmatically, too, it's easier to say that they are non-const (but cannot be modified), than it is to define special conversion rules (a la C++) to avoid breaking code. K&R and the C committee have always been very pragmatic about things. – James Kanze Apr 04 '12 at 08:57
  • @JamesKanze: Alas, I lost my original-edition White Book some number of moves ago, so I can't check. The C89 committee had a lot of implementors on it though, hence `noalias`; Ritchie's "noalias must go" response was grounded in both pragmatics *and* theory (he demonstrated that "noalias" was self-inconsistent). – torek Apr 04 '12 at 09:04
  • @torek Ritchie is one of those exceptional people who could master both, and understood when each was appropriate. Such people are all too rare. – James Kanze Apr 04 '12 at 09:32
  • @JamesKanze: alas, "was". dmr migrated to the great 11/45-in-the-sky in October 2011. – torek Apr 04 '12 at 09:36
12

An lvalue in C++ does not always refer to an object. It can refer to a function too. Moreover, objects do not have to be referred to by lvalues. They may be referred to by rvalues, including for arrays (in C++ and C). However, in old C89, the array-to-pointer conversion did not apply to rvalue arrays.

Now, an rvalue denotes something with no lifetime, a limited lifetime, or a lifetime that is about to expire. A string literal, however, lives for the entire program.

So string literals being lvalues is exactly right.

Chen Li
Johannes Schaub - litb
1

There is a lot of valuable information in the answers and the comments. A few points are worth highlighting.

Arrays can be rvalues. More information can be found here and here. For example, the following code involves an rvalue array:

template <typename T>
using alias = T;

int main() {
    return alias<int[]>{23, 37, 53}[1];
}

Thus, the fact that string literals are arrays is not, by itself, a sound reason for them to be lvalues.

It is good to remember that string literals last for the lifetime of the program. Although value category and lifetime are distinct notions, the lifetime of string literals helps explain why they are lvalues.

Just like many discussions about value categories, string literals being lvalues is very much driven by pragmatic considerations about what has happened in the language development so far and what is the best that can be done from where we stand at that moment in time.

Hari