Is there a way to warn about or prohibit string literal concatenation such as:

const char *a = "foo" " bar";

I spent hours finding a bug in a big static array that had

const char * a[] = {"foo" "bar"};

instead of

const char * a[] = {"foo", "bar"};
InSync
gozag
  • I sympathize with your pain. We've probably all been there. But you wouldn't want to disallow string concatenation outright, because much code depends on it deliberately. – Steve Summit May 19 '23 at 11:44
  • I use this when I have a very long string, so a warning of this kind would be annoying for me. Since this feature has valid uses, I do not think there is an easy way to automate detection of this typo. You should ensure the code is well tested. – Marek R May 19 '23 at 11:47
  • @SteveSummit I'd use some macro to be more explicit with the string concatenation. – gozag May 19 '23 at 11:53
  • I was hoping `using namespace std::string_view_literals;` could address this issue, but I'm surprised that it also compiles without warnings: https://godbolt.org/z/fW6ee9eqP – Marek R May 19 '23 at 11:59
  • How about adding `static_assert(::std::size(items) == 2);`? If some literals get glued then assertion will fire. – user7860670 May 19 '23 at 12:00
  • @user7860670 usually such arrays are quite long and counting them would be painful and this assert unmaintainable. – Marek R May 19 '23 at 12:03
  • This is a C++ feature (or, some might say, a bug, depending on context). –  May 19 '23 at 12:11
  • You're probably better off just checking your sources with grep. For example, `grep '"\s*"'` will catch any such concatenations, as long as they aren't split across multiple lines. – Tom Karzes May 19 '23 at 12:12
  • @Anya This is especially dangerous in C++ with all the overloads. – gozag May 19 '23 at 12:13
  • @TomKarzes false positive: `"foo \" "`. – Marek R May 19 '23 at 12:14
  • @MarekR Yes, it's not foolproof, but it doesn't have to be. It's more important not to miss any. In practice, your example is probably very rare. Another thing to check is lines that end with `"`, possibly with trailing whitespace. Those can be caught with `grep '"\s*$'` if desired. – Tom Karzes May 19 '23 at 12:17
  • String literal concatenation happens in translation [phase 6](https://en.cppreference.com/w/cpp/language/translation_phases#Phase_6), before preprocessing tokens are converted into tokens for the compiler proper (phase 7). Probably nothing you can do about the problem. – Richard Critten May 19 '23 at 12:17
  • One technique I've used in some situations is something like (since C++11) `static_assert (sizeof(a) == expected_size * sizeof(*a))`. The downside of that is that it needs `expected_size` to be known and evaluated at compile time and (a key limitation) that it requires *me* to somehow know what the expected array size is when writing the code. – Peter May 19 '23 at 12:56
  • @Jason But the target is for C only, and specifically for GCC. The warning flag in the answer there doesn't even work for C++. I wouldn't object to the C tag being removed from this question, or maybe even the direction of the targets being reversed, but closing this one as a duplicate of the other is completely incorrect. – cigien May 21 '23 at 11:17
  • Some candidates (and/or starting points): *[Is there a GCC flag to detect string literal concatenation?](https://stackoverflow.com/questions/28744208/)* (2015. *"I recently fixed a bug ... someone forgot a `,` after `string3`"*) and *[Why allow concatenation of string literals?](https://stackoverflow.com/questions/2504536)* (2010. *"I was recently bitten by a subtle bug ... I forgot the `,` after `two`"*) – Peter Mortensen May 21 '23 at 11:29
  • @MarekR Indeed. It's also a great way to spread strings across multiple lines when the string is long. Which does happen. This would be very annoying and it would also be bad since the standard allows this. Changing this would introduce bugs and other kinds of problems not to mention be a nuisance for compiler writers. – Pryftan May 21 '23 at 16:32
  • This is what we call INABIAF: it's not a bug, it's a feature. It's very important that this stays the way it is. The bug in your code (or the code you were fixing) might be annoying, but the same goes for other single-character typos. It's part of programming. I make use of this feature a lot. Many other people do. It's great for spreading strings across multiple lines (not that that's the only way to do it). Prohibiting it would cause bugs and make **perfectly valid code** seem wrong, which might invite 'fixing' it, which would actually break it. – Pryftan May 21 '23 at 16:37
  • @TomKarzes If you're saying that lines should not end with `"`, I don't even know what to say. There are plenty of valid cases where they should. I use it a great deal and many others do. Otherwise you'd have really long lines for longer strings. To suggest that lines ending with `"` are bad is going too far. Of course, if that is the rule for a certain project, so be it, but it's certainly not objectionable. – Pryftan May 21 '23 at 16:41
  • @Pryftan This is just a way to catch cases where compile-time string concatenation may occur, nothing more. It doesn't mean that all such matches should be changed. The point is that it would let the user verify that there are no problem cases. – Tom Karzes May 22 '23 at 05:55
  • When is this kind of feature used _on the same line_ though? – Phil May 24 '23 at 01:16
  • @Phil You need this feature used on the same line with preprocessor macros – gozag May 25 '23 at 08:04
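
The `static_assert` suggestion from the comments above can be sketched roughly as follows. This is only an illustration: `items` and `kExpectedItems` are hypothetical names, and, as noted in the comments, the expected count has to be maintained by hand, which may be impractical for long arrays.

```cpp
#include <cstddef>

// Hypothetical table. kExpectedItems is an assumed, hand-maintained count.
constexpr std::size_t kExpectedItems = 3;

const char *const items[] = {
    "foo",
    "bar",
    "baz",
};

// If a missing comma glues two literals together, the array ends up with
// fewer elements than expected and this assertion fails at compile time.
// (In C++17 and later, std::size(items) from <iterator> is an alternative.)
static_assert(sizeof(items) / sizeof(items[0]) == kExpectedItems,
              "items: unexpected element count; missing comma?");
```

Removing the comma after `"foo"` would leave two elements, so the build would stop at the `static_assert` instead of producing a silently shorter array.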

4 Answers


Clang has a warning -Wstring-concatenation that is explicitly designed to catch such bugs:

warning: suspicious concatenation of string literals in an array initialization; did you mean to separate the elements with a comma? [-Wstring-concatenation]
char const *a[]  = { "ok", "foo" "bar", "ok"};
                                 ^
                                ,

This won't work for the toy example you showed, because the warning only fires when the array has several initializers and a comma is missing in just a couple of places. None of these are diagnosed:

// no warning
char const *b[]  = {"foo" "bar"};
// no warning
char const *c[]  = {"ok", "foo" "bar"};
// no warning
char const *d[]  = {"foo" "bar", "ok"};

But when you have a large number of initializers in an array and only make a typo in a couple of places, this seems ideal.

Here's a demo.

GCC doesn't appear to have an equivalent warning, but there is a request for it to be added.

Note that this only works for array initialization. Your example of

const char *x = "foo" " bar";

won't be detected by this warning (or any other that I'm aware of).

Also note that enabling this warning may yield a lot of false positives, but you can use it sparingly when trying to catch bugs.
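
As a rough sketch of how this could be tried locally, the snippet below mirrors the pattern from the warning output above. The file name `demo.cpp` and the names `names` and `kNameCount` are made up for illustration, and whether the diagnostic actually fires depends on your Clang version and its heuristics.

```cpp
// Assumed invocation (demo.cpp is a hypothetical file name):
//   clang++ -Wstring-concatenation -c demo.cpp
#include <cstddef>

// Four elements were intended, but the comma after "foo" is missing,
// so "foo" and "bar" are merged into the single element "foobar".
const char *const names[] = {"ok", "foo" "bar", "ok"};

// The array silently ends up with three elements instead of four.
constexpr std::size_t kNameCount = sizeof(names) / sizeof(names[0]);
```

Without the warning enabled, this compiles cleanly and the only visible symptom is the smaller element count.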

Peter Mortensen
cigien
  • Also found one for clang-tidy! https://clang.llvm.org/extra/clang-tidy/checks/bugprone/suspicious-missing-comma.html – gozag May 19 '23 at 12:35
  • I wonder why it doesn't work here: https://godbolt.org/z/W4cevbenj – Marek R May 19 '23 at 12:58
  • GCC will generally warn about string concatenation with `-Wtraditional` (C only), but enabling that option is probably not recommendable. – nielsen May 19 '23 at 14:32
  • [Is there a GCC flag to detect string literal concatenation?](https://stackoverflow.com/q/28744208/995714) – phuclv May 21 '23 at 10:50

Not really. String literal concatenation is an indispensable part of the C/C++ grammar and has many use cases. So catching the mistake takes some deliberate effort on your part, and that may defeat the goal of catching a blunder.

However, string concatenation applies only to two string literals that follow each other with nothing but white space in between, so putting anything else between them causes an error. E.g., in this case you could have written:

const char *a[] = {("foo") ("bar")};  // Error

which would cause an error while the intended statement would not:

const char *a[] = {("foo"), ("bar")};  // OK

In short, there is no way to tell the compiler that two particular string literals may be concatenated and have it reject every other concatenation. You have to do the opposite and mark explicitly, as with the parentheses above, where a string literal must not be concatenated.
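
A slightly fuller sketch of the parenthesized style, assuming a hypothetical table named `words` (the name and the element count are illustrative only):

```cpp
#include <cstddef>

// Each element is wrapped in parentheses, so a forgotten comma becomes
// a compile error instead of a silent concatenation.
const char *const words[] = {
    ("foo"),
    ("bar"),
    ("baz"),
};

// Uncommenting the next line should fail to compile: without the comma,
// ("foo") ("bar") is parsed as a call expression on a string literal.
// const char *const broken[] = {("foo") ("bar")};

constexpr std::size_t kWordCount = sizeof(words) / sizeof(words[0]);
static_assert(kWordCount == 3, "expected three separate elements");
```

The cost is extra syntactic noise on every element, so this is a trade-off rather than a free fix.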

nielsen
  • _"...tell the compiler explicitly when a string literal may not be concatenated..."_ how about putting a `,` between them? I feel we have gone full circle. – Richard Critten May 19 '23 at 12:19
  • @RichardCritten Yes, the main point is that I think the solution which the OP is looking for does not exist within the C/C++ compiler services. – nielsen May 19 '23 at 12:24
  • I think most of the "indispensable" uses involve concatenating string literals with macros, e.g. the macros used in printf format strings. It's rarer that you really need to concatenate just string literals. – Barmar May 20 '23 at 23:04
  • I concatenate bare string literals all the time without any macros being involved, simply because I like to wrap lines of code. The proposed solution of wrapping each individual string/element in parentheses is intriguing. Despite quite extensive experience writing and reading C and C++ code, I would not have known immediately whether that syntax was valid. It certainly makes sense that it is, but it also strikes me as fishy. I'd be inclined to flag this in a code review. While interesting and a possible workaround to the issue mentioned, it does, as Richard suggests, bring us full circle. – Cody Gray - on strike May 21 '23 at 07:10
  • @CodyGray I do it a lot too. It's an important **feature** of C. – Pryftan May 21 '23 at 16:35
  • @CodyGray: I found the parens were surprising initially, but it didn't take me long to think through why it's valid: a string literal is an array of `const char` (plain `char` in C) that decays to a pointer in an expression. Parens can appear in expressions, and evaluate to the wrapped sub-expression. An initializer-list wants a list of `const char*` expressions. I did have to think about it for a couple seconds, but if a code-base used this everywhere, I'd be used to it. So the bigger question is whether this extra syntactical noise for reading the code is worse than the possible problem. – Peter Cordes May 22 '23 at 04:33
  • The parentheses are an interesting idea. Kind of an insurance, akin to the rule to *always* have curly braces around if/then branches or loop bodies, even if they are only a single line, to prevent things like [Apple's SSL bug](https://www.codecentric.de/wissens-hub/blog/curly-braces) (even if the blog criticizes the curly brace fix as superficial and not sufficient). – Peter - Reinstate Monica May 22 '23 at 08:39
  • @CodyGray I know it's often done to make code more readable, but not sure I would call that an "indispensable" use. – Barmar May 22 '23 at 14:49

Either of the macros below makes it impossible to accidentally concatenate two adjacent elements by forgetting a comma.

CPP (C preprocessor) macros are awesome in general. It is also legal to have a trailing comma at the end of a list of elements, so the macro can append a comma to every element, including the last.

You can do something like this:

#define STRINGCOMMA(a) a,

const char *x[] = {
    STRINGCOMMA("foo")
    STRINGCOMMA("bar")
};

Or even:

#define QUOTESTRINGCOMMA(a) #a,

const char *x[] = {
    QUOTESTRINGCOMMA(foo)
    QUOTESTRINGCOMMA(bar)
};

The comma is added for you, and adding one yourself would be a compile error, because the expansion would then contain two commas in a row.

If you are interested, it is also possible to take this concept further to allow creation of parallel lists with the same arguments, but different processing:

X Macro

#define VARLIST \
    DO(foo) \
    DO(bar)

#define DO(a) #a,
const char *x[] = {
    VARLIST
};
#undef DO

This would be useful if you wanted to create a list of enums and a list of strings, from the same list of names.
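
A minimal sketch of that enum-plus-strings idea, using the same `DO` convention. The names `COLOR_LIST`, `Color`, `color_count`, `color_names` and `color_name_is` are hypothetical and chosen only for this illustration.

```cpp
#include <cstddef>
#include <cstring>

// Single source of truth for the names (hypothetical example list).
#define COLOR_LIST \
    DO(red)        \
    DO(green)      \
    DO(blue)

// First expansion: each name becomes an enumerator.
#define DO(a) a,
enum Color { COLOR_LIST color_count };
#undef DO

// Second expansion: each name becomes a string, in the same order.
#define DO(a) #a,
const char *const color_names[] = {COLOR_LIST};
#undef DO

// Both lists come from COLOR_LIST, so their lengths always match.
static_assert(sizeof(color_names) / sizeof(color_names[0]) ==
                  static_cast<std::size_t>(color_count),
              "enum and name table out of sync");

// Small helper for checking the name attached to an enumerator.
inline bool color_name_is(Color c, const char *expected) {
    return std::strcmp(color_names[c], expected) == 0;
}
```

Because every element goes through `DO`, there is no hand-written comma to forget, and adding a name to `COLOR_LIST` updates both the enum and the string table.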

Peter Mortensen
Juan

I spent hours finding a bug in a big static array ...

Well, you can put the comma at the start of each line, where a missing one stands out:

char const * a [] = 
    { "foo"
    , "bar"
    , "baz"
    , "asdf"
    , "ghjk"
    };
KevinZ
  • there's a similar question here [Can clang format format C/C++ functions to break argument lists before the comma?](https://stackoverflow.com/q/53718346/995714) – phuclv Jun 12 '23 at 12:16