18

I've always used string constants in C as one of the following

char *filename = "foo.txt";
const char *s = "bar";    /* preferably this or the next one */
const char * const s3 = "baz";

But, after reading this, now I'm wondering, should I be declaring my string constants as

const char s4[] = "bux";

?

Please note that the linked question suggested as a duplicate is different, because this one is specifically about constant strings. I know how the types differ and how they are stored. The array version in that question is not const-qualified. This was a simple question about whether I should use a const array for constant strings instead of the pointer version I had been using. The answers here have answered my question, when two days of searching on SO and Google did not yield an exact answer. Thanks to these answers, I've learned that the compiler can do special things when the array is marked const, and there is at least one case where I will now be using the array version.

RastaJedi
  • 1
    Interesting.. The author is correct - "arrays" *ARE* different from "pointers". And I would have *THOUGHT* that the pointer syntax (e.g. `const char *s = "bar";`) was generally "preferred". I'm surprised with his conclusions that "array syntax" is actually more efficient - with different compilers, and on different platforms. – paulsm4 Mar 04 '19 at 01:52
  • Indeed. I always used `const char *s = "bar";` where I could. I never really thought to use a const-char array. I've been trying to find a solid answer for 2 days so I figured I had to ask here :P. I realize that I can use `sizeof` with the array version, but that isn't really too important at least in my current case. I'm wondering, with this author's considerations, what the *general* approach should be. – RastaJedi Mar 04 '19 at 01:54
  • 1
    Seems like a micro-optimization to me. – dbush Mar 04 '19 at 01:58
  • I had another pro-array version link I was reading, that was talking about the overhead of using the pointer version. I can't seem to find it right now but I'll post back if I do. And @dbush, I'm sure, but I don't know if I have OCD or what, but I get real picky about real mundane stuff haha. – RastaJedi Mar 04 '19 at 02:01
  • 1
    In Apple LLVM 10.0.0 with clang-1000.11.45.5, the difference vanishes if you insert `const` after `*` in `const char *ptr = "Lorum ipsum";`. The fact the compiler had to load `ptr` arose entirely from the fact it could be changed in some other module not visible to the compiler. Making the pointer `const` eliminates that, and the compiler can prepare the address of the string directly, without loading the pointer. – Eric Postpischil Mar 04 '19 at 02:10
  • Also, I was reading something about the contents of `s3` being mutable, despite the `const`. Is there any merit to that? (In [this](https://stackoverflow.com/a/11974752/1701799) answer). – RastaJedi Mar 04 '19 at 02:17
  • Possible duplicate of [What is the difference between char s\[\] and char \*s?](https://stackoverflow.com/questions/1704407/what-is-the-difference-between-char-s-and-char-s) – Fabio says Reinstate Monica Mar 04 '19 at 10:41
  • It's a bit unclear what point the linked article is trying to make when the function is declared without a prototype (`void bogus();`). This should always be treated as an error in C code written in 2019 (or in 2017, when the article was written). – user694733 Mar 04 '19 at 11:39
  • This is pretty much a non-issue since we wouldn't usually write functions `void do_stuff (void)` with no parameters, that work on global variables. When properly passing the variable through parameter, there is no difference between the two cases. So the link pretty much boils down to pre-mature optimization of badly written code. A more interesting question would be to look at the code on the caller side, if there's a difference in performance when passing on a _local_ pointer versus a _local_ array. – Lundin Mar 04 '19 at 12:37
  • @FabioTurati I found that question through Google. But it only explains the technical differences, of which I already knew. So even after finding that I had to post this question. Also the array version in that linked question is not `const` qualified. It's merely talking about an editable string where I'm specifically asking about constants. – RastaJedi Mar 06 '19 at 00:44
  • @RastaJedi I had missed that this question is specifically about const strings and the other one isn't. Sorry! I've retracted my flag and upvoted. – Fabio says Reinstate Monica Mar 06 '19 at 09:18
  • @FabioTurati no worries. Thanks for the suggestion though. – RastaJedi Mar 07 '19 at 17:16

3 Answers

16

Pointers and arrays are different. Defining string constants as pointers or as arrays serves different purposes.

When you define a global string constant that is not subject to change, I would recommend you make it a const array:

const char product_name[] = "The program version 3";

Defining it as const char *product_name = "The program version 3"; actually defines 2 objects: the string constant itself, which will reside in a constant segment, and the pointer which can be changed to point to another string or set to NULL.

Conversely, defining a string constant as a local variable would be better done as a local pointer variable of type const char *, initialized with the address of a string constant:

#include <stdio.h>

int main(void) {
    /* s1 only holds the address of the string constant; nothing is copied */
    const char *s1 = "world";
    printf("Hello %s\n", s1);
    return 0;
}

If you define this one as an array, depending on the compiler and usage inside the function, the code will make space for the array on the stack and initialize it by copying the string constant into it, a more costly operation for long strings.

Note also that const char const *s3 = "baz"; is a redundant form of const char *s3 = "baz";. It is different from const char * const s3 = "baz"; which defines a constant pointer to a constant array of characters.

Finally, string constants are immutable and as such should have type const char []. The C Standard purposely allows programmers to store their addresses in non-const pointers, as in char *s2 = "hello";, to avoid producing warnings for legacy code. In new code, it is highly advisable to always use const char * pointers to manipulate string constants. This may force you to declare function arguments as const char * when the function does not change the string contents. This process is known as constification, and it avoids subtle bugs.

Note that some functions violate this const propagation: strchr() does not modify the string received, declared as const char *, but returns a char *. It is therefore possible to store a pointer to a string constant into a plain char * pointer this way:

char *p = strchr("Hello World\n", 'H');

This problem is solved in C++ via overloading. C programmers must deal with this as a shortcoming. An even more annoying situation is that of strtol() where the address of a char * is passed and a cast is required to preserve proper constness.

chqrlie
  • Sorry, `const char *const s3` is what I meant to type. Fixed it now :). So for global, using the array version won't result in a copy? – RastaJedi Mar 04 '19 at 02:32
  • To the extent that you would get a "redundant" pointer variable in the `s1` case, aren't you prone to a "redundant" _array copy_ in the `product_name` case (for them's the semantics) which is far worse? I've never heard this advice before tbh. – Lightness Races in Orbit Mar 04 '19 at 02:35
  • @LightnessRacesinOrbit: in the case of a global variable, there is no copy at runtime but indeed the string constant may be duplicated in the binary file and in memory: `const char *s1 = "toto", *s2 = "toto";` may initialize both `s1` and `s2` to the same value whereas `const char s1[] = "toto", s2[] = "toto";` defines 2 separate objects each with the same duplicated contents. – chqrlie Mar 04 '19 at 02:38
  • Right, and doesn't that suggest that `const char*` is a much superior choice? – Lightness Races in Orbit Mar 04 '19 at 02:39
  • @LightnessRacesinOrbit: not necessarily: `const char *s2 = "asdfghjk";` uses more memory: space for the string constant in the text segment and space for the pointer in the data segment, possibly with a relocation at load time and a very small code overhead at runtime as demonstrated in the article in reference. – chqrlie Mar 04 '19 at 02:41
  • And you just said that doing the same with an array _duplicates the array_, which is a lot more than a pointer. Any sensible optimising compiler should get rid of this entire problem anyway, no? – Lightness Races in Orbit Mar 04 '19 at 02:46
  • @LightnessRacesinOrbit: as explained in the answer, the effects are different depending on the scope of the definition. Arrays defined at global scope do not incur the runtime duplication overhead. Identical array contents may require extra memory, but no extra CPU time. – chqrlie Mar 04 '19 at 02:51
  • If the global version doesn't result in a copy, this answer makes sense to me. I suppose it would just come down to how many strings you have, if some of them can share space (pointer version), and if there's not that many, save some space by not needing space for the pointer. I think I shall use your suggestions, @chqrlie, and I thank both of you for your input :). – RastaJedi Mar 04 '19 at 02:53
  • 1
    I've marked this as the accepted answer because this answers my 'how should I declare' question more directly, although the information in @EricPostpischil's answer directly comments on the information provided in my source link. Unfortunately I cannot accept two answers, but I urge all of you who are reading this answer to also view his answer. Thank you all! – RastaJedi Mar 04 '19 at 03:02
8

The linked article explores a small artificial situation, and the difference demonstrated vanishes if you insert const after * in const char *ptr = "Lorum ipsum"; (tested in Apple LLVM 10.0.0 with clang-1000.11.45.5).

The fact the compiler had to load ptr arose entirely from the fact it could be changed in some other module not visible to the compiler. Making the pointer const eliminates that, and the compiler can prepare the address of the string directly, without loading the pointer.

If you are going to declare a pointer to a string and never change the pointer, then declare it as static const char * const ptr = "string";, and the compiler can happily provide the address of the string whenever the value of ptr is used. It does not need to actually load the contents of ptr from memory, since it can never change and will be known to point to wherever the compiler chooses to store the string. This is then the same as static const char array[] = "string";—whenever the address of the array is needed, the compiler can provide it from its knowledge of where it chose to store the array.

Furthermore, with the static specifier, ptr cannot be known outside the translation unit (the file being compiled), so the compiler can remove it during optimization (as long as you have not taken its address, perhaps when passing it to another routine outside the translation unit). The result should be no differences between the pointer method and the array method.

Rule of thumb: Tell the compiler as much as you know about stuff: If it will never change, mark it const. If it is local to the current module, mark it static. The more information the compiler has, the more it can optimize.

Eric Postpischil
  • `static` would only help for global scope or inside of `main()`, but not any other function scope? Or should I still use `static` in `main()`? If, e.g. I was using a string constant for a filename, and I was to call my own function to open up that file, it shouldn't matter if it's static even if that function was in another file (unless I was passing the address of it)... or am I mixing this up. I've never really used `static` much, to be honest. – RastaJedi Mar 04 '19 at 02:25
  • @RastaJedi: At file scope (outside of any function), `static` essentially just changes whether the thing may be known outside the current translation unit. Inside a block (any set of statements in braces in a function, `main` or other), `static` both says “this name is not known outside this block” and “this thing sticks around for all of program execution, not just block execution.” With a `const` object, though, the two lifetimes (static and automatic) often optimize to the same thing. (Taking addresses of things can change that, for semantic reasons, but simple uses are unaffected.) – Eric Postpischil Mar 04 '19 at 02:34
  • So if I have a `static` string constant, and pass it to a routine in another TLU, this would still work, but the possibility of optimization goes away, is what you are getting at? Within a function, try to mark it `static` if I don't need to change it anywhere else or pass it to any other routine, for a possibility of optimization? What about the cases of non-constant values? Just use `static` if I need its value to stick around when it goes out of scope or to limit to that TLU within file scope? I'm assuming there's no possibility of it being optimized in this way if it is modifiable. – RastaJedi Mar 04 '19 at 02:46
  • Given `static const char * const ptr = "string";` at function level, passing `ptr` to a routine outside the current translation unit should have the same effect as passing `"string"` or `array`, where `array` is `const char array[] = "string";`. In all those cases, the compiler only needs to pass the address of the actual string, which itself may not be changed. So it should produce the same code for all of them. If you passed `&ptr` outside the module, then the compiler has to create an actual `ptr` so it can take its address. But that is a different use case. – Eric Postpischil Mar 04 '19 at 02:52
  • Outside a function, mark anything `static` if it does not need to be known outside the current translation unit. Inside a function (or block), identifiers are already not known outside the function (or block), as if they were `static`. There, using `static` actually adds a property that they persist for program execution instead of function/block execution, so do not use `static` inside a function/block unless you need that property. – Eric Postpischil Mar 04 '19 at 02:53
  • In the case of a string that is modifiable, the original article does not apply. It discussed only string literals. You will need to put it into an array so that it can be modified. – Eric Postpischil Mar 04 '19 at 02:55
  • Gotchya. I should have explained in my original question these string constants were specific to `main()`. I believe I've always marked my file scope variables with `static` unless needed to otherwise. It's nice to see this can incur some optimization when it is a constant. Thank you for your input! I wish I could mark two answers as accepted. – RastaJedi Mar 04 '19 at 02:59
  • 1
    But gcc and icc don't give identical results for pointer vs array even when you make the pointer `* const`. But yes, overall this case is "pre-mature optimization". – Lundin Mar 04 '19 at 12:33
2

From the performance perspective, this is a fairly small optimization which makes sense for low-level code that needs to run with the lowest possible latency.

However, I would argue that const char s3[] = "bux"; is better from the semantic perspective, because the type of the right-hand side is closer to the type of the left-hand side. For that reason, I think it makes sense to declare string constants with the array syntax.

merlin2011
  • 1
    Aren't string literals in C not technically `const`-qualified type? Despite being immutable. Or you just meant the array aspect of it perhaps. Also doesn't the array version have to copy an entire string? – RastaJedi Mar 04 '19 at 02:09
  • You are correct. However, it's UB to modify the elements of a string literal, and I was referring to the array part. :) – merlin2011 Mar 04 '19 at 02:17