20

As far as I can tell, before C++11, string literals were handled in almost exactly the same way between C and C++.

Now, I acknowledge that there are differences between C and C++ in the handling of wide string literals.

The only differences that I have been able to find are in the initialization of an array by string literal.

char str[3] = "abc"; /* OK in C but not in C++ */
char str[4] = "abc"; /* OK in C and in C++. Terminating zero at str[3] */
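The C++ side of the two declarations above can be checked with a small sketch. The names `str4`, `str_auto`, `ends_with_terminator`, and `check_array_initialization` are made up for illustration, and the three-element form is left commented out because a C++ compiler rejects it:

```cpp
#include <cassert>

// Sized for 'a', 'b', 'c' plus the terminating '\0': valid in C and in C++.
char str4[4] = "abc";

// Size deduced from the initializer, terminator included.
char str_auto[] = "abc";

// char str3[3] = "abc";  // accepted by C (no terminator is stored), ill-formed in C++

static_assert(sizeof(str4) == 4, "three characters plus the terminator");
static_assert(sizeof(str_auto) == 4, "the deduced size includes the terminator");

// Returns true when the last element of each array is the terminating zero.
bool ends_with_terminator() {
    return str4[3] == '\0' && str_auto[3] == '\0';
}

// Runtime check of the same property; call it from main() if desired.
void check_array_initialization() {
    assert(ends_with_terminator());
}
```

Letting the compiler deduce the size, as in `str_auto`, sidesteps the difference entirely because the terminator is always counted.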

There is also a technical difference in the literal's type: in C++ "abc" is const char [4], while in C it is char [4]. To retain C compatibility, C++98 and C++03 had a special (deprecated) rule that allowed a string literal to convert to char *; C++11 removed that rule.
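A minimal sketch of what that means for C++11 code, assuming you want a modifiable string: point at the literal through `const char *`, or copy it into an array you own. The names `literal_ptr`, `modified_copy`, and `check_modified_copy` are hypothetical:

```cpp
#include <cassert>
#include <string>

// Binding a literal to a pointer-to-const is fine in every C++ standard.
const char* literal_ptr = "abc";

// char* p = "abc";  // deprecated in C++98/03, ill-formed from C++11 on

// Copies the literal into a writable array and changes the first character.
std::string modified_copy() {
    char buffer[] = "abc";  // the array is a copy, not the literal itself
    buffer[0] = 'X';        // modifying the copy is well defined
    return std::string(buffer);
}

// Runtime check; call it from main() if desired.
void check_modified_copy() {
    assert(modified_copy() == "Xbc");
    assert(literal_ptr[0] == 'a');  // the literal itself is untouched
}
```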

There is also a difference in the minimum literal length that an implementation must support. However, as a practical matter, any compiler that compiles both C and C++ code will not enforce the lower C limit.

I have some interesting links that apply:

Are there any other differences?

Zan Lynx
  • 2
    In the context of our other discussion, `const char[N]` vs `char[N]` is a huge difference. The rule that forbids modification of string literals in C++ is the rule that forbids modification of `const` objects. You won't find any C++ special case, like the C rule specifically forbidding writing into memory where string literals are stored. – Ben Voigt Apr 18 '14 at 01:03
  • @BenVoigt: That's all you have? From my point of view the C special rule that char [4] isn't really writable and the C++ rule that const char [4] converts to char* but really isn't writable have *the same result* and aren't a difference at all. – Zan Lynx Apr 18 '14 at 01:06
  • 2
    The code `char str[4] = "abc";` is not assignment, it's initialization. – Yu Hao Apr 18 '14 at 01:06
  • C and C++ use the same rules for string literals, but C adds two extras for backwards-compatibility: The type implicitly decays to `char*` even though the object is a constant literal, and One can initialise an array which can hold all but the terminator with a string literal. – Deduplicator Apr 18 '14 at 01:07
  • 1
    Weirdly, C++ does have the same rule. 2.14.5p12. But it's redundant with the `const` type. – Ben Voigt Apr 18 '14 at 01:07
  • @YuHao: Fixed that with an edit. – Zan Lynx Apr 18 '14 at 01:07
  • 1
    @Deduplicator: In C99, the literal type is `char [N]` as Zan said in the question. – Ben Voigt Apr 18 '14 at 01:07
  • In standard C++, string literals are constant and have type `const char[]` – FoggyDay Apr 18 '14 at 01:10
  • @BenVoigt: As I said, a constant literal whose type does not reflect the const. (Being a const literal and having no identity is important for constant pooling) – Deduplicator Apr 18 '14 at 01:18
  • 1
    @Zan: Being `const` in C++ has a lot more effects than just an alternate way to state the C rule that they can't be modified. At least, I think it's supposed to have implications for integral constant expressions, and usability with `constexpr` initialization. – Ben Voigt Apr 18 '14 at 01:56
  • C++(11) also provides raw string literals (but you probably knew that already). – user657267 Apr 18 '14 at 01:58
  • Also universal character names are considered an escape sequence in C, but in C++ they are regular c-chars. Pedantic but that's all I could find. – user657267 Apr 18 '14 at 02:05
  • @BenVoigt: `const` does make a difference within C++ which I think I acknowledge, but between C and C++ I don't think it does, because while the terminology may be a little different the results are exactly the same. – Zan Lynx Apr 18 '14 at 02:15
  • I don't have the C standards available, but I rather doubt that C can support user defined literals. Probably not even Unicode literals (u and U prefixes, I think C does support L prefix). – Cheers and hth. - Alf Apr 18 '14 at 02:59
  • May be irrelevant, but string concatenation for wide string literals in C++11 is different from C. For example, `L"Hello, " "world"` is invalid in C but valid in C++. – Mohit Jain May 14 '14 at 10:54
  • @Mohit Jain: It's not the same case as the OP's shown. It's invalid in C because there is no rule to convert from string literal to wide string literal automatically, not because you can't concatenate. – xryl669 May 21 '14 at 11:33
  • Also, the "const" part means that the array of bytes for the string is in a read-only / shared data section of the final binary on the usual OS. If you try to write to that part, you'll get a SIGBUS/SIGSEGV on a POSIX system, and an access violation on the Win32 platform. – xryl669 May 21 '14 at 11:35
  • 5
    When you write `char x[4] = "abc";` you are in fact making a copy from a (read-only section) const array to a (stack-based) non-const array. It's semantically equivalent to `memcpy(x, "abc", 4);`. Hopefully the compiler checks the size at compile time and prevents an overflow. – xryl669 May 21 '14 at 11:45
  • 4
    This seems much more of a question about initializing `char` arrays, less of a question about string literals – tenfour May 21 '14 at 13:35
  • @BenVoigt. Yeah, sorry. Missed. :D – Shoe May 22 '14 at 06:59

2 Answers

10

Raw strings

A noticeable difference is that C++'s string literals are a superset of C ones. Specifically, C++11 added raw string literals (not supported in C), defined in §2.14.5, which are handy for text such as HTML and XML where " is often encountered.

Raw strings allow you to specify your own delimiter (up to 16 characters) in the form:

R"delimiter(char sequence)delimiter"

This is particularly useful to avoid unnecessary escaping characters by providing your own string delimiter. The following two examples show how you can avoid escaping of " and ( respectively:

std::cout << R"(a"b"c")";      // empty delimiter
std::cout << '\n';
std::cout << R"aa(a("b"))aa";  // aa delimiter
// a"b"c"
// a("b")
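As a rough check of the equivalence, the sketch below compares a raw string with its escaped spelling and shows a case where a delimiter is actually required, because the text itself contains `)"`. The function names and the delimiter `x` are arbitrary choices for illustration:

```cpp
#include <cassert>
#include <string>

// The same text written twice: once with escapes, once as a raw string.
std::string escaped() { return "a\"b\"c\""; }
std::string raw()     { return R"(a"b"c")"; }

// The text a)"b contains )" and would end an undelimited raw string early,
// so a delimiter (here x) is needed.
std::string raw_with_delimiter() { return R"x(a)"b)x"; }

// Runtime check; call it from main() if desired.
void check_raw_strings() {
    assert(escaped() == raw());
    assert(raw_with_delimiter() == "a)\"b");
}
```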



char vs const char

Another difference, pointed out in the comments, is that string literals have type char [n] in C, as specified at §6.4.5/6:

For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence.

while in C++ they have type const char [n], as defined in §2.14.5/8:

Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow string literal has type “array of n const char”, where n is the size of the string as defined below, and has static storage duration (3.7).

This doesn't change the fact that in both standards (§6.4.5/7 for C and §2.14.5/13 for C++), attempting to modify a string literal results in undefined behavior.
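The C++ half of this can be verified at compile time. The sketch below is illustrative only (the names `literal_size` and `check_literal_type` are made up); it relies on a string literal being an lvalue, so `decltype` yields a reference to the array type:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// In C++ the literal "abc" is an lvalue of type "array of 4 const char".
static_assert(std::is_same<decltype("abc"), const char (&)[4]>::value,
              "the literal has type const char[4]");

// Array-to-pointer decay therefore produces a pointer-to-const.
static_assert(std::is_same<std::decay<decltype("abc")>::type, const char*>::value,
              "the literal decays to const char*");

// Number of elements in the literal, terminator included.
constexpr std::size_t literal_size() { return sizeof("abc"); }

// Runtime check; call it from main() if desired.
void check_literal_type() {
    assert(literal_size() == 4);
}
```

A C compiler has no equivalent of these traits, but there `sizeof("abc")` is also 4; only the element type differs.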


Unspecified vs Implementation defined (ref)

Another subtle difference is that in C, whether the character arrays of string literals are distinct is unspecified, as per §6.4.5/7:

It is unspecified whether these arrays are distinct provided their elements have the appropriate values.

while in C++ this is implementation defined, as per §2.14.5/13:

Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined.
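A small sketch of what "distinct" means in practice: two identical literals may or may not share an address, so portable code can only rely on their contents. The names below are hypothetical, and the result of `literals_share_storage` is deliberately not asserted because it depends on the compiler and its options:

```cpp
#include <cassert>
#include <cstring>

// Two spellings of the same literal; they may or may not share storage.
const char* first  = "abc";
const char* second = "abc";

// True when the implementation pooled both literals into one object.
bool literals_share_storage() { return first == second; }

// The contents compare equal either way.
bool literals_have_equal_contents() { return std::strcmp(first, second) == 0; }

// Runtime check of the portable property only; call it from main() if desired.
void check_literal_contents() {
    assert(literals_have_equal_contents());
}
```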

Shoe
  • Could you elaborate on practical difference between _unspecified_ and _implementation-defined_? Sure there is some, or there wouldn't be two different wordings... – rodrigo May 22 '14 at 07:31
  • 1
    @rodrigo, The technical difference, if I understand the wording correctly, is that when it's *unspecified*, the implementation can choose any of the possibilities and is not required to document it; while when it's *implementation-defined* it is required to provide documentation regarding the choice it made. Anyway, right next to the third title there's a reference link to one of the questions here on SO regarding that difference. :) – Shoe May 22 '14 at 07:42
-1

The best way to answer your question is to rewrite it as a Program that compiles identically when using a "C" or "C++" Compiler, I will assume you are using GCC but other (correctly written) Compiler Toolchains should provide similar results.

First I will address each point you posed then I will give a Program that provides the answer (and Proof).

  • As far as I can tell, before C++11, string literals were handled in almost exactly the same way between C and C++.

They still can be handled the same way using various Command Line Parameters, in this example I will use "-fpermissive" (a Cheat). You are better off finding out why you are getting Warnings and writing NEW Code to avoid ANY Warning; only use CLP 'cheats' to compile OLD Code.

Write new Code correctly (no cheats and no Warnings, that there be no Errors goes without saying).

  • Now, I acknowledge that there are differences between C and C++ in the handling of wide string literals.

There does not have to be (many differences) since you can cheat most or all of them away depending on the circumstances. Cheating is wrong, learn to program correctly and follow modern Standards not the mistakes (or awkwardness) of the past. Things are done a certain way to be helpful both to you, and to the Compiler in some cases (remember YOU are not the only one who 'sees' your Code).

In this case the Compiler wants enough space allocated to terminate the String with a '\0' (zero byte). That permits the use of a print (and some other) Function without specifying the length of the String.

IF you are simply trying to compile an existing Program you obtained from somewhere and do not want to re-write it, you simply want to compile it and run it, then use the cheats (if you must) to get past the Warnings and force the compilation to an executable.

  • The rest of what you wrote ...

No.

.

See this example Program. I slightly modified your question to make it into a Program. The result of compiling this Program with a "C" or "C++" Compiler is identical.

Copy-and-Paste the example Program text below to a File called "test.c", then follow the instructions at the start. Simply 'cat' the File so you can backscroll it (and see it) without opening a Text Editor, then Copy-and-Paste each Line beginning with the Compiler Commands (the next three).

Note that, as pointed out in the Comments, running this Line "g++ -S -o test_c++.s test.c" produces an Error (using a modern g++ Compiler) since the container is not long enough to hold the String.

You should be able to read this Program and not actually need to compile it to see the Answer but it will compile and produce the Output for you to examine should you desire to do so.

As you can see, the Variable "str1" is not long enough to hold the String when it is null terminated; that produces an Error on a modern (and correctly written) g++ Compiler.


/* Answer for: http://stackoverflow.com/questions/23145793/string-literal-differences-between-c-and-c
 *
 * cat test.c
 * gcc -S -o test_c.s test.c
 * g++ -S -o test_c++.s test.c
 * g++ -S -fpermissive -o test_c++.s test.c
 *
 */

char str1[3] = "1ab";
char str2[4] = "2ab";
char str3[]  = "3ab";

int main(void){return 0;}


/* Comment: Executing "g++ -S -o test_c++.s test.c" produces this Error:
 *
 * test.c:10:16: error: initializer-string for array of chars is too long [-fpermissive]
 * char str1[3] = "1ab";
 *                ^
 *
 */


/* Resulting Assembly Language Output */

/*      .file   "test.c"
 *      .globl  _str1
 *      .data
 * _str1:
 *      .ascii "1ab"
 *      .globl  _str2
 * _str2:
 *      .ascii "2ab\0"
 *      .globl  _str3
 * _str3:
 *      .ascii "3ab\0"
 *      .def    ___main;    .scl    2;  .type   32; .endef
 *      .text
 *      .globl  _main
 *      .def    _main;  .scl    2;  .type   32; .endef
 * _main:
 * LFB0:
 *      .cfi_startproc
 *      pushl   %ebp
 *      .cfi_def_cfa_offset 8
 *      .cfi_offset 5, -8
 *      movl    %esp, %ebp
 *      .cfi_def_cfa_register 5
 *      andl    $-16, %esp
 *      call    ___main
 *      movl    $0, %eax
 *      leave
 *      .cfi_restore 5
 *      .cfi_def_cfa 4, 4
 *      ret
 *      .cfi_endproc
 * LFE0:
 *      .ident  "GCC: (GNU) 4.8.2"
 *
 */
Rob
  • 4
    Nice explanation. But sadly you are missing the topic. – dhein May 22 '14 at 06:55
  • I disagree, I did answer the question exactly. – Rob May 22 '14 at 07:07
  • 2
    The OP asked for additional differences beyond those he stated. You are just explaining how he can prove what he already knows, as he said. So if you ask me, that's missing the topic. – dhein May 22 '14 at 07:30
  • The OP limited the Scope of his question to "string literals" and did not expand his question to every possible usage of a String (IE: new Functions or changes to old Functions that cause Strings to be handled differently OR require the Literals to be different than I described). So I did say "no", perhaps not "literally" enough for you, save for what I described. Thank you for taking the time to explain why you disagreed with my answer. I used to provide longer answers but found they were being edited for brevity so I tried to avoid excess verbosity since then. – Rob May 24 '14 at 17:57