30

Neither GCC nor Clang complains if I assign a string literal to a char*, even with plenty of pedantic options (-Wall -W -pedantic -std=c99):

char *foo = "bar";

while they (of course) do complain if I assign a const char* to a char*.

Does this mean that string literals are considered to be of char* type? Shouldn't they be const char*? Modifying them is undefined behavior, after all!

And (an unrelated question) what about command line parameters (i.e. argv): is it considered to be an array of string literals?

moinudin
peoro

8 Answers

26

They are of type char[N] where N is the number of characters including the terminating \0. So yes you can assign them to char*, but you still cannot write to them (the effect will be undefined).
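A minimal sketch of what the array type means in practice. The function name `literal_type_demo` is made up for illustration; the checks rely only on `sizeof` and on initialising an array from a literal.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: each assert spells out one consequence of
   "bar" having type char[4]. */
void literal_type_demo(void)
{
    /* 'b', 'a', 'r' plus the terminating '\0'. */
    assert(sizeof "bar" == 4);

    /* Initialising a pointer converts the array to a pointer to its
       first element. Reading through it is fine; writing would be
       undefined, which is why it is declared const here. */
    const char *p = "bar";
    assert(p[3] == '\0');

    /* Initialising an array copies the literal into storage we own,
       so this write is well-defined. */
    char copy[] = "bar";
    copy[0] = 'c';
    assert(strcmp(copy, "car") == 0);
}
```

Calling `literal_type_demo()` from `main` should pass silently on a conforming compiler.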

Wrt argv: It points to an array of pointers to strings. Those strings are explicitly modifiable. You can change them and they are required to hold the last stored value.

Johannes Schaub - litb
  • 8
    +1 -- note that that is their type, but you must treat them as if they were `const` (or the undefined behavior/access violation fairy will pay you a visit) – Billy ONeal Dec 20 '10 at 19:34
  • 3
    @Billy: don't you see this as somehow inconsistent? Otherwise the OP wouldn't need to ask the question at all. – Vlad Dec 20 '10 at 19:36
  • @Billy: IMHO the compatibility reasons don't justify such a "feature". People keep getting crashes when code is compiled on a different platform, without a visible warning. – Vlad Dec 20 '10 at 19:41
  • 3
    @Vlad: I didn't write the standard(s) :P – Billy ONeal Dec 20 '10 at 19:42
  • @Billy: I've got a faint hope that the C++ standard committee members read StackOverflow. – Vlad Dec 20 '10 at 19:45
  • 1
    @Vlad the c++ committee already banned the string literal to `char*` conversion. It's becoming ill-formed to do such a thing in C++0x. – Johannes Schaub - litb Dec 20 '10 at 19:47
  • @Vlad: You'd have to convince the C standards committee to do it too (After all, the change made in C++ was mirrored when C99 added `const` to C) – Billy ONeal Dec 20 '10 at 19:53
  • 6
    C had `const` since C89. It wasn't added to C99. – Johannes Schaub - litb Dec 20 '10 at 20:13
  • They appear to be somewhere in between `char [N]` and `const char[N]`. See the experiments in the answer I've just added here. (Perhaps this is something that has changed in C++11?) – Aaron McDaid Oct 29 '13 at 21:10
  • (Sorry, I've only just noticed this question is tagged as `c`, not `c++`. Maybe my answer isn't so relevant to this question after all!) – Aaron McDaid Oct 29 '13 at 21:24
10

For completeness' sake, the C99 draft standard (C89 and C11 have similar wording), in section 6.4.5 String literals, paragraph 5, says:

[...]a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence;[...]

So this says a string literal has static storage duration (it lasts for the lifetime of the program), its type is char[] (not char *), and its length is the size of the string literal plus the appended zero. Paragraph 6 says:

If the program attempts to modify such an array, the behavior is undefined.

So attempting to modify a string literal is undefined behavior, even though its type is not const-qualified.
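The two properties quoted above (static storage duration, undefined behavior on modification) suggest a simple habit: hand literals around through `const char *`, and copy them before changing anything. A rough sketch, with the hypothetical names `default_name` and `copy_default_name`:

```c
#include <assert.h>
#include <string.h>

/* Returning a pointer to a literal is valid: the underlying array has
   static storage duration, so it outlives the call. The const in the
   return type lets the compiler reject accidental writes. */
const char *default_name(void)
{
    return "anonymous";
}

/* To obtain something modifiable, copy the literal into a buffer the
   caller owns. buflen must exceed the length of the name. */
void copy_default_name(char *buf, size_t buflen)
{
    assert(buflen > strlen(default_name()));
    strcpy(buf, default_name());
}
```

After `copy_default_name(buf, sizeof buf)`, writing to `buf` is fine because it is an ordinary array, not the literal itself.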

With respect to argv, section 5.1.2.2.1 Program startup, paragraph 2, says:

If they are declared, the parameters to the main function shall obey the following constraints:

[...]

- The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.

So argv is not considered an array of string literals and it is ok to modify the contents of argv.
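As an illustration of that guarantee, here is a small sketch that rewrites each argument in place. `upcase_args` is a made-up name; it only overwrites existing characters and never writes past a string's terminator, which the quoted paragraph does not license.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: uppercase every argument in place. When given
   main's argc and argv this is well-defined, because those strings
   are required to be modifiable. */
void upcase_args(int argc, char **argv)
{
    assert(argv != NULL);
    for (int i = 0; i < argc; i++)
        for (char *p = argv[i]; *p != '\0'; p++)
            *p = (char)toupper((unsigned char)*p);
}
```

A program would typically call it as `upcase_args(argc, argv);` at the top of `main(int argc, char **argv)`.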

Toby Speight
Shafik Yaghmour
6

Using the -Wwrite-strings option you will get:

warning: initialization discards qualifiers from pointer target type

Irrespective of that option, GCC will put literals into a read-only memory section, unless told otherwise by using -fwritable-strings (however, this option has been removed from recent GCC versions).

Command line parameters are not const; they typically live on the stack.

msc
Jester
5

(Sorry, I've only just noticed this question is tagged as c, not c++. Maybe my answer isn't so relevant to this question after all!)

String literals are not quite const or non-const; there is a special, strange rule for literals.

(Summary: Literals can be taken by reference-to-array as foo( const char (&)[N]) and cannot be taken as the non-const array. They prefer to decay to const char *. So far, that makes it seem like they are const. But there is a special legacy rule which allows literals to decay to char *. See experiments below.)

(Following experiments done on clang3.3 with -std=gnu++0x. Perhaps this is a C++11 issue? Or specific to clang? Either way, there is something strange going on.)

At first, literals appear to be const:

#include <iostream>

// Two overloads, so the output shows which pointer type each argument decays to.
void foo( const char  * ) { std::cout << "const char *" << std::endl; }
void foo(       char  * ) { std::cout << "      char *" << std::endl; }

int main() {
        const char arr_cc[3] = "hi";
        char arr_c[3] = "hi";

        foo(arr_cc); // const char *
        foo(arr_c);  //       char *
        foo("hi");   // const char *
}

The two arrays behave as expected, demonstrating that foo is able to tell us whether the pointer is const or not. Then "hi" selects the const version of foo. So it seems like that settles it: literals are const ... aren't they?

But, if you remove void foo( const char * ) then it gets strange. First, the call to foo(arr_c) fails with an error at compile time. That is expected. But the literal call (foo("hi")) works via the non-const call.

So, literals are "more const" than arr_c (because they prefer to decay to const char *, unlike arr_c). But literals are "less const" than arr_cc, because they are willing to decay to char * if needed.

(Clang gives a warning when it decays to char *).

But what about the decaying? Let's avoid it for simplicity.

Let's take the arrays by reference into foo instead. This gives us more 'intuitive' results:

void foo( const char  (&)[3] ) { std::cout << "const char (&)[3]" << std::endl; }
void foo(       char  (&)[3] ) { std::cout << "      char (&)[3]" << std::endl; }

As before, the literal and the const array (arr_cc) use the const version, and the non-const version is used by arr_c. And if we delete foo( const char (&)[3] ), then we get errors with both foo(arr_cc); and foo("hi");. In short, if we avoid the pointer-decay and use reference-to-array instead, literals behave as if they are const.

Templates?

In templates, the system will deduce const char * instead of char * and you're "stuck" with that.

template<typename T>
void bar(T *t) { // will deduce   const char   when a literal is supplied
    foo(t);
}

So basically, a literal behaves as const at all times, except in the particular case where you directly initialize a char * with a literal.

Aaron McDaid
2

In both C89 and C99, string literals are of type char * (for historical reasons, as I understand it). You are correct that trying to modify one results in undefined behavior. GCC has a specific warning flag, -Wwrite-strings (which is not part of -Wall), that will warn you if you try to do so.

As for argv, the arguments are copied into your program's address space, and can safely be modified in your main() function.

EDIT: Whoops, had -Wno-write-strings copied by accident. Updated with the correct (positive) form of the warning flag.

Justin Spahr-Summers
  • 3
    String literals are array types (`char [N]`), not pointer types (`char *`); refer to **6.4.5 String Literals** in n1256. – John Bode Dec 20 '10 at 22:16
2

Johannes' answer is correct concerning the type and contents. But in addition to that, yes, it is undefined behavior to modify contents of a string literal.

Concerning your question about argv:

The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.

Jens Gustedt
2

String literals have formal type char [] but semantic type const char []. The purists hate it but this is generally useful and harmless, except for bringing lots of newbies to SO with "WHY IS MY PROGRAM CRASHING?!?!" questions.

R.. GitHub STOP HELPING ICE
  • 1
    Makes me curious, do you have an example for this being useful? I mean beyond this backwards compatibility idea, which I guess is less and less relevant? – Jens Gustedt Dec 20 '10 at 22:42
  • 2
    Sometimes you want a `char *` instead of a `const char *` for iterating over a string, especially if you'll be passing a pointer to it to a function like `strtol`. It's less a matter of backwards compatibility and more a matter of `const` sometimes being a pain. Of course most of the time in cases like this I have to deal with non-literal strings too and just cast away the `const`... – R.. GitHub STOP HELPING ICE Dec 21 '10 at 02:52
  • @R.. I don't see the issue with iterating over a `const char *` or even passing it to `strtol()`. It's not a `const char const *` after all. – Morty Sep 14 '17 at 14:03
  • @morty: `strtol` takes `char **`, not `const char **`, for its `endptr` argument. Likewise for related functions. If you have a `const char *p`, you can't pass `&p` to `strtol`; instead you have to copy the value to a `char *tmp` (using a cast to remove the `const` from the pointed-to type), pass `&tmp`, and then copy the result back. – R.. GitHub STOP HELPING ICE Sep 14 '17 at 18:31
  • In particular note that you **cannot** pass `(char **)&p` here; doing so results in undefined behavior (aliasing violation). – R.. GitHub STOP HELPING ICE Sep 14 '17 at 18:34
  • @R..: Point taken. But I'm not sure about the undefined behavior. Before we get too off topic, I'll just add some references: https://stackoverflow.com/questions/34767233/how-do-you-implement-strtol-under-const-correctness https://stackoverflow.com/questions/993700/are-strtol-strtod-unsafe – Morty Sep 19 '17 at 06:46
  • Having string literal non-constant is helpful when calling functions where a programmer didn't declare a parameter const when it could have been. I expect most libraries have corrected this issue by now, but the language definition won't change for fear of breaking legacy code. C++ does make string literals constant. – Preston Crow Mar 09 '18 at 18:01
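The `strtol` friction discussed in the comments above can be sketched roughly as follows. This is a variation on the temporary-pointer approach rather than the commenter's exact code: because `strtol` takes its input as `const char *` and only writes to `*endptr`, the temporary can be left uninitialised and no cast is needed. `parse_long` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: parse one long from *pp and advance *pp past
   the characters strtol consumed. */
long parse_long(const char **pp)
{
    assert(pp != NULL && *pp != NULL);

    char *end;                           /* strtol wants a char ** here */
    long value = strtol(*pp, &end, 10);  /* input parameter is const char * */

    *pp = end;   /* char * converts implicitly to const char * */
    return value;
}
```

Passing `(char **)pp` directly to `strtol` is the shortcut the comments warn against; the local `end` avoids it at the cost of one extra assignment.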
1

They are const char*, but there is a specific exclusion for assigning them to char* for legacy code that existed before const did. And the command line arguments are definitely not literal, they are created at run-time.

Puppy
  • Not sure why you got downvotes. I've done some experiments to confirm that they are not quite const or non-const. I'll post my own answer. – Aaron McDaid Oct 29 '13 at 20:36
  • This is incorrect. They are not const in C, but are in C++. They aren't generally writable as noted by others, but they don't behave as constants. There are some weird things you can't do, like 'case "abc"[2]:' in C, but you can in C++ where string literals are constant. This can make a big difference with some macros. – Preston Crow Mar 09 '18 at 17:57