14

I was recently bitten by a subtle bug.

const char * int2str[] = {
   "zero", // 0
   "one",  // 1
   "two"   // 2
   "three",// 3
   nullptr };

assert( int2str[1] == std::string("one") ); // passes
assert( int2str[2] == std::string("two") ); // fails

If you have godlike code review powers, you'll notice I forgot the , after "two".

After the considerable effort it took to find that bug, I have to ask: why would anyone ever want this behavior?

I can see how this might be useful for macro magic, but then why is this a "feature" in a modern language like Python?
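For reference, the same slip seems to reproduce in Python. A minimal sketch, assuming a list named `int2str` that mirrors the array above; `EXPECTED_ENTRIES` and the length check are hypothetical additions showing one possible guard, not something from the original code:

```python
# The same missing comma, transplanted to Python.
int2str = [
    "zero",   # 0
    "one",    # 1
    "two"     # 2  <- missing comma here
    "three",  # 3
]

# Adjacent literals are merged, so the list silently has three entries.
print(int2str)        # ['zero', 'one', 'twothree']
print(len(int2str))   # 3

# One possible guard (an assumption, not part of the original code):
# state the expected size explicitly so the slip is caught early.
EXPECTED_ENTRIES = 4
size_matches = len(int2str) == EXPECTED_ENTRIES
print(size_matches)   # False
```

No error or warning is raised for the merged literals, which is what makes the bug hard to spot.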

Have you ever used string literal concatenation in production code?

Peter Mortensen
deft_code

10 Answers

23

Sure, it's the easy way to make your code look good:

char *someGlobalString = "very long "
                         "so broken "
                         "onto multiple "
                         "lines";

The best reason, though, is for weird printf formats, like type forcing:

uint64_t num = 5;
// PRIX64 comes from <inttypes.h>; the spaces around it are required in C++11 and later.
printf("Here is a number:  %" PRIX64 ", what do you think of that?", num);

There are a bunch of those defined in <inttypes.h>, and they can come in handy if you have type size requirements. Check them all out at this link. A few examples:

PRIo8 PRIoLEAST16 PRIoFAST32 PRIoMAX PRIoPTR
Carl Norum
17

It's a great feature that allows you to combine preprocessor strings with your strings.

// Here we define the correct printf modifier for time_t
#ifdef TIME_T_LONG
    #define TIME_T_MOD "l"
#elif defined(TIME_T_LONG_LONG)
    #define TIME_T_MOD "ll"
#else
    #define TIME_T_MOD ""
#endif

// And here we merge the modifier into the rest of our format string
printf("time is %" TIME_T_MOD "u\n", time(0));
R Samuel Klatchko
  • +1, that's the best technical reason. The system defines several of those types of things as well - my answer has an example. – Carl Norum Mar 24 '10 at 00:13
  • The best technical reason, if you ignore the fact that you really shouldn't be using the preprocessor to do this sort of thing in the first place... –  Mar 24 '10 at 00:15
  • 1
    @STingRaySC, what about `PRIx32` or `PRIuLEAST32` and friends? http://www.opengroup.org/onlinepubs/9699919799/basedefs/inttypes.h.html – Carl Norum Mar 24 '10 at 00:19
  • 2
    @STingRaySC - while I agree that there are better ways to do this in C++, his question is also tagged as C (where this is very useful). – R Samuel Klatchko Mar 24 '10 at 00:20
  • @Carl: Of course, if you're going to use those. But that doesn't mean that because the library author decided to use the preprocessor, it's a good decision. @R Samuel: In this case, it is not necessary even in C. There is no need for those string-literals to be compile-time constants. –  Mar 24 '10 at 00:26
  • @STingRaySC - yes, there is; look at how printf is implemented, particularly on tiny embedded targets. – Charles Duffy Mar 24 '10 at 00:31
  • @STingRaySC, what do you mean "library author"? It's part of the standard. Well, C99 anyway. Section 7.8.1. – Carl Norum Mar 24 '10 at 00:32
  • @Carl: Cripes... part of the standard **library**, no? Someone *authored* it, no? –  Mar 24 '10 at 00:34
  • 2
    There is a *very* good reason for `printf` (and friends') format strings to be compile-time constants - the compiler can tell you if your argument types don't match the format strings. – caf Mar 24 '10 at 00:36
  • @Charles: I don't understand your comment. `printf` requires compile-time constant arguments on some platforms? I doubt it. –  Mar 24 '10 at 00:36
  • 1
    @STingRaySC: it might not be necessary, but it's a common use. I'd like to see a pointer to a simple example of an alternative solution to this problem that doesn't use the preprocessor for comparison. – Michael Burr Mar 24 '10 at 00:42
  • @STingRaySC - no, it's not necessary, but it's much easier to write this way than to merge at runtime. – R Samuel Klatchko Mar 24 '10 at 00:45
  • @R Samuel: Much easier? `printf("time is %" + time_t_mod + "u\n", time(0));` -- you're right... that was tough! –  Mar 24 '10 at 00:47
  • @STingRaySC: 2 problems with what you suggested: 1) it won't work in C, and 2) you won't get compile time checking that some compilers provide (as mentioned by caf). Not to mention, the preprocessor version is really just as readable and maintainable - there's very little difference. – Michael Burr Mar 24 '10 at 01:02
  • @Michael: You're right. I forgot you can't do that in C. I concede. But, since the question is aimed at "modern" languages in general, I strongly disagree that this is a good answer, as it addresses an esoteric, archaic usage of string literal concatenation. I think the ability to split string literals across lines is the most relevant answer... –  Mar 24 '10 at 01:10
  • 2
    @STingRaySC - rather than being implemented as a single library call, printf can get optimized down to a series of calls specific to the formats included in that string -- which is why it takes only a constant string for its first argument! If you're compiling for a tiny embedded platform, not needing to have a do-it-all print-everything function with tons of code you'll never use linked in can be a huge win (and do remember that embedded space is one of the markets C still dominates, so there are lots of folks this is important to). – Charles Duffy Mar 24 '10 at 01:21
5

I see several C and C++ answers, but none of them really answers why, or what the rationale for this feature was. In C++ this feature is inherited from C, and we can find the rationale in the Rationale for International Standard—Programming Languages—C, section 6.4.5 String literals, which says (emphasis mine):

A string can be continued across multiple lines by using the backslash–newline line continuation, but this requires that the continuation of the string start in the first position of the next line. To permit more flexible layout, and to solve some preprocessing problems (see §6.10.3), the C89 Committee introduced string literal concatenation. Two string literals in a row are pasted together, with no null character in the middle, to make one combined string literal. This addition to the C language allows a programmer to extend a string literal beyond the end of a physical line without having to use the backslash–newline mechanism and thereby destroying the indentation scheme of the program. An explicit concatenation operator was not introduced because the concatenation is a lexical construct rather than a run-time operation.

Python seems to have the same reason: this reduces the need for an ugly \ to continue long string literals. It is covered in section 2.4.2, String literal concatenation, of The Python Language Reference.
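The indentation problem described in the rationale can be seen in Python as well. A minimal sketch, where the function names `with_backslash` and `with_juxtaposition` are made up for illustration:

```python
def with_backslash():
    # Backslash-newline inside the literal: the next line's indentation
    # becomes part of the string.
    return "very long \
    so broken"


def with_juxtaposition():
    # Adjacent literals: whitespace between them is not part of the string.
    return ("very long "
            "so broken")


print(repr(with_backslash()))
print(repr(with_juxtaposition()))
```

The first version carries the four spaces of indentation into the string unless the continuation starts in column one; the second keeps the code indented without affecting the value.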

Shafik Yaghmour
  • This seems to be the real reason _An explicit concatenation operator was not introduced because the concatenation is a lexical construct rather than a run-time operation_. Everyone else just followed suit. – deft_code Jul 10 '14 at 20:08
  • This is extremely useful if you ever need to represent large amounts of text in C literals. For example, a lengthy "usage" message for CLI programs or if you're unfortunate enough to be writing CGI programs in C. – Brian McFarland Oct 13 '15 at 17:08
5

Cases where this can be useful:

  • Generating strings including components defined by the preprocessor (this is perhaps the largest use case in C, and it's one I see very, very frequently).
  • Splitting string constants over multiple lines

To provide a more concrete example for the former:

// in version.h
#define MYPROG_NAME "FOO"
#define MYPROG_VERSION "0.1.2"

// in main.c
puts("Welcome to " MYPROG_NAME " version " MYPROG_VERSION ".");
Charles Duffy
3

I'm not sure about other programming languages, but for example C# doesn't allow you to do this (and I think this is a good thing). As far as I can tell, most of the examples that show why this is useful in C++ would still work if you could use some special operator for string concatenation:

string someGlobalString = "very long " +
                          "so broken " +
                          "onto multiple " +
                          "lines"; 

This may not be as comfortable, but it is certainly safer. In your motivating example, the code would be invalid unless you added either , to separate elements or + to concatenate strings...

Tomas Petricek
  • That would not be valid. At least one of those strings would have to have a cast to std::string before that would compile. Also, the question is tagged with C. – Billy ONeal Mar 24 '10 at 01:14
  • @BillyONeal: The question is tagged with Python/C++ and it asks why "modern languages such as Python" allow this, so I thought I would post one counter-example. And I wanted to show that you don't need the feature (in general) to support things like line-breaks and macro expansion. – Tomas Petricek Mar 24 '10 at 01:50
  • This is a useful answer - it shows why it's (IMO) really a misfeature in Python that causes problems and isn't actually necessary. – Ken Williams Jun 08 '21 at 17:31
  • Why on earth would you bother doing this in c# instead of @"very long so broken onto multiple lines"; or take it a step further to allow interpolation and do $@"very long so broken onto multiple lines"? My way allows users to copy their SQL directly from SSMS into an empty string and forget about it. No going through adding line by line, etc. – Krausladen Jul 13 '22 at 14:26
  • Took me some time to understand what this 12 year old question was about :) Of course you would probably not do this in C# today (unless you wanted to avoid having newlines in your strings?) but the question was, why can you do this in C++ without explicit `+` - which is something C# has eliminated. – Tomas Petricek Jul 14 '22 at 15:39
3

From the Python lexical analysis reference, section 2.4.2:

This feature can be used to reduce the number of backslashes needed, to split long strings conveniently across long lines, or even to add comments to parts of strings
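The "comments on parts of strings" use is probably easiest to see with a regular expression. A short sketch along the lines of the example in the Python reference; the variable name `identifier` and the test inputs are assumptions chosen for illustration:

```python
import re

# Each fragment carries its own comment; the pieces are joined at compile time.
identifier = re.compile(
    "[A-Za-z_]"       # letter or underscore
    "[A-Za-z0-9_]*"   # letter, digit or underscore
)

print(identifier.pattern)                            # [A-Za-z_][A-Za-z0-9_]*
print(identifier.fullmatch("int2str") is not None)   # True
print(identifier.fullmatch("2fast") is not None)     # False
```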

Peter Mortensen
2

For rationale, expanding and simplifying Shafik Yaghmour’s answer: string literal concatenation originated in C (hence inherited by C++), as did the term, for two reasons (references are from Rationale for the ANSI C Programming Language):

  • For formatting: to allow long string literals to span multiple lines with proper indentation – in contrast to line continuation, which destroys the indentation scheme (3.1.4 String literals); and
  • For macro magic: to allow the construction of string literals by macros (via stringizing) (3.8.3.2 The # operator).

It is included in the modern languages Python and D because they copied it from C, though in both of these it has been proposed for deprecation, as it is bug-prone (as you note) and unnecessary (since one can just have a concatenation operator and constant folding for compile-time evaluation; you can’t do this in C because strings are pointers, and so you can’t add them).
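The constant-folding point can be checked informally in Python. A rough sketch, assuming CPython (folding of `+` on literals is an implementation detail, not a language guarantee); the names `implicit` and `explicit` are just labels for the two compiled snippets:

```python
# Compile two spellings of the same constant and inspect the result.
implicit = compile("s = 'foo' 'bar'", "<implicit>", "exec")
explicit = compile("s = 'foo' + 'bar'", "<explicit>", "exec")

# In CPython the joined string typically shows up as a single constant
# in both code objects, so no concatenation is left for run time.
print(implicit.co_consts)
print(explicit.co_consts)
```

If both tuples contain `'foobar'`, the explicit operator costs nothing at run time, which is the argument for it being a workable replacement.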

It’s not simple to remove, because that breaks compatibility and can change how an expression parses: implicit concatenation happens during lexing, before any operators are applied, so replacing it with an operator requires care about precedence. Hence it’s still present.
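The precedence difference is easy to demonstrate in Python. A small sketch with made-up variable names, comparing juxtaposition against an explicit `+`:

```python
# Implicit concatenation is resolved before any operator is applied,
# so the two literals behave as one operand.
implicit_repeat = "ab" "cd" * 2      # ("ab" "cd") * 2
explicit_repeat = "ab" + "cd" * 2    # "ab" + ("cd" * 2)

implicit_upper = "foo" "bar".upper()    # ("foo" "bar").upper()
explicit_upper = "foo" + "bar".upper()  # "foo" + ("bar".upper())

print(implicit_repeat)  # abcdabcd
print(explicit_repeat)  # abcdcd
print(implicit_upper)   # FOOBAR
print(explicit_upper)   # fooBAR
```

A mechanical rewrite from juxtaposition to `+` would therefore need parentheses in cases like these to preserve the result.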

Yes, it is used in production code. The Google Python Style Guide, under Line length, specifies:

When a literal string won't fit on a single line, use parentheses for implicit line joining.

x = ('This will build a very long long '
     'long long long long long long string')

See “String literal concatenation” at Wikipedia for more details and references.

Nils von Barth
  • Great point that constant folding removes all of the advantage. This could even work in C/C++ if limited to only string literals. – deft_code Jul 10 '14 at 20:14
  • Thanks! True that string literals could be special-cased, though that would also be a hack, and confusing in its own way: why does `"foo" + "bar"` work but `string s = "bar"; "foo" + s` not? I think the reasoning is that string literals are decidedly of type `char []` (C)/`const char []` (C++). However, it *does* work in C++14 with the new standard string literal (with `s` suffix): `"foo"s + "bar"s` is legit, and subject to folding. – Nils von Barth Nov 11 '14 at 06:23
2

So that you can split long string literals across lines.

And yes, I've seen it in production code.

1

While people have taken the words out of my mouth about the practical uses of the feature, nobody has so far tried to defend the choice of syntax.

For all I know, the typo that can slip through as a result was probably just overlooked. After all, it seems robustness against typos wasn't at the front of Dennis's mind, as shown further by:

if (a = b);
{
    printf("%d", a);
}

Furthermore, there's the possible view that it wasn't worth using up an extra symbol for concatenation of string literals—after all, there isn't much else that can be done with two of them, and having a symbol there might create temptation to try to use it for runtime string concatenation, which is above the level of C's built-in features.

Some modern, higher-level languages based on C syntax have discarded this notation presumably because it is typo-prone. But these languages have an operator for string concatenation, such as + (JavaScript and C#), . (Perl and PHP), ~ (D, though this has also kept C's juxtaposition syntax), and constant folding (in compiled languages, anyway) means that there isn't any runtime performance overhead.

Peter Mortensen
Stewart
  • [VB.NET](https://en.wikipedia.org/wiki/Visual_Basic_.NET)'s (and others) operator for string concatenation is `&`. – Peter Mortensen May 21 '23 at 11:13
  • @PeterMortensen VB.NET isn't a language based on C syntax, but true. I believe VB also supports `+` for this. Excel also uses `&`. – Stewart May 22 '23 at 15:06
-2

Another sneaky error I've seen in the wild is people presuming that two single quotes are a way to escape the quote (as it is commonly used for double quotes in CSV files, for example), so they'll write things like the following in Python:

print('Beggars can''t be choosers')

which outputs Beggars cant be choosers instead of the Beggars can't be choosers the coder desired.
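A quick sketch of what happens and of two ways to get the intended apostrophe; the variable names are made up for illustration:

```python
# 'Beggars can' immediately followed by 't be choosers' is two adjacent
# literals, not an escaped quote, so they are simply merged.
merged = 'Beggars can''t be choosers'
print(merged)   # Beggars cant be choosers

# Two ways to get the apostrophe that was intended.
escaped = 'Beggars can\'t be choosers'
double_quoted = "Beggars can't be choosers"
print(escaped == double_quoted)   # True
```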

As for the original "why" question (why is this a "feature" in a modern language like Python?), I concur with the OP: it shouldn't be.

Peter Mortensen
Ken Williams