5

I found code in libcurl that looks like:

const char *
curl_easy_strerror(CURLcode error)
{
  switch(error) {
  case CURLE_OK:
    return "No error";

  case CURLE_UNSUPPORTED_PROTOCOL:
    return "Unsupported protocol";
  /* ..... */
  }
}

As far as I know, if you return a pointer, you need to make sure that the memory it points to will not be changed or released. Why does this libcurl code work?

NewBee
  • 2
    If you want to return a pointer, you need to make sure it will point to something valid after you return it. `malloc` is one way to do that, but hardly the only way. – Raymond Chen May 17 '19 at 03:26
  • 1
    Pointer to a static, global or as in this case a literal constant are also valid. – Clifford May 17 '19 at 05:36
  • The answer you selected is incorrect. @P_J_ provided you a correct answer in comments. – alinsoar May 17 '19 at 07:51
  • Thanks all, this question is a duplicate, and the answer is almost clear after so many useful discussions. String literals are stored in a special "read-only & static" zone most of the time; of course, this depends on the platform. The static part is certain (the read-only part is not). – NewBee May 17 '19 at 12:53

2 Answers

10

Those string literals are placed in a static read-only section of the executable at compile time. They are separate from the heap or the stack. The function is simply returning a pointer that points to those strings.

Reference

The implementation of this is platform and compiler specific, but the C11 standard has a few relevant requirements on this in section 6.4.5.

In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence.

So we know it must be stored in a static location at compile time.

If the program attempts to modify such an array, the behavior is undefined.

This tells us the data must be read-only.

Edit

Some people are complaining that this is incorrect, citing specific platforms or architectures. As noted above, this is platform and compiler specific.

Some platforms may not support read-only data, but the compiler will almost certainly try to prevent you from modifying it. Since the behavior is undefined, the intent is that you never do this, so for all intents and purposes the data is read-only.

In the context of the question, this answer is correct.

cj.vaughter
  • Thanks, it seems C is just like Java: string literals are special. – NewBee May 17 '19 at 03:37
  • Yeah most languages seem to have at least a few special behaviors for strings. – cj.vaughter May 17 '19 at 03:52
  • 1
    This is a very bad, imprecise answer that just repeats the myths about string literals. String literals do not have to be placed in RO memory and they do not have to be valid outside the scope they are used in. Any Harvard-architecture computer will copy them to data memory first. A popular example is the AVR family of microcontrollers, used for example in Arduino boards. – 0___________ May 17 '19 at 07:08
  • 1
    The last sentence is also wrong. It only says that we cannot modify it: if we do modify it, the program's behaviour is undefined. Nothing more. Your rather naive interpretation of the standard document is wrong. – 0___________ May 17 '19 at 07:12
  • your answer does not say explicitly that the ISO requires the string to be kept in a static zone of memory. – alinsoar May 17 '19 at 07:37
  • 2
    This answer is correct where it matters the most: static storage duration is guaranteed and user must treat literals as read-only. All section stuff and stack/heap stuff is implementation defined and not that relevant to answer actually. I would maybe use little different wording, but overall the answer is fine. – user694733 May 17 '19 at 10:48
  • @P__J__: The statement “they [string literals] do not have to be valid outside the scope they are used” is false. Per C 2018 6.4.5 6, the character sequence of a string literal “is then used to initialize an array of static storage duration”. It is this array that the string literal in code represents and whose address is returned by the function in the question. As a static array, it is valid for the duration of program execution. – Eric Postpischil May 17 '19 at 12:06
  • @EricPostpischil Is it the hidden address-of operation in the return that requires the string to be kept in a static zone, or some configuration parameter of the linker/compiler, etc.? As I said in a comment, strings like char*a="string" may not be captured by the `strings` program on Linux. – alinsoar May 17 '19 at 12:08
  • @alinsoar: It is just the nature of string literals. A string literal in source code is, in effect, a static array of `char` (for plain character string literals; `wchar_t`, `char16_t`, or `char32_t` for wide string literals). – Eric Postpischil May 17 '19 at 12:11
  • @cj.vaughter your last statement is false, there are operating systems where the memory segments cannot be read-only as the system does not have such feature. – alinsoar May 17 '19 at 12:11
  • As I said in my answer, this is platform and compiler specific. Given the context of the question, it seemed appropriate to assume a Princeton architecture, and a Linux based operating system. – cj.vaughter May 17 '19 at 12:51
4

According to the C standard (6.4.5 String literals, paragraph 6), string literals have static storage duration:

a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration […]

This means that their memory, wherever it may physically be, is guaranteed to outlive the function return, and pointers to this memory remain valid.

Therefore, you’re returning a pointer to a memory location that’s guaranteed to be valid, and that contains the value given by the string literal.

Konrad Rudolph
  • the problem here is that we are talking in this thread that one cannot find an explicit statement that asserts what you say. Can you paste the explicit statement from ISO9899 that states what you say ? – alinsoar May 17 '19 at 10:03
  • @alinsoar Added the quote although I believe that it adds zero value to this answer: Finding this quote based on the reference I gave is trivial. Furthermore, the highest-voted answer *already* contained that exact quote (although I only saw that now, and that answer also contains mistakes). – Konrad Rudolph May 17 '19 at 10:19
  • I saw that statement in ISO9899, I am not sure that `used to initialize an array of static storage duration` is correctly interpreted as "to keep the string in static zone". – alinsoar May 17 '19 at 11:03
  • I remember seeing code like ``char*a="abc"`` in which ``a`` was initialized on the stack directly, using `mov` instructions whose operands contained the constants for the given ASCII characters, not by copying from a static zone. – alinsoar May 17 '19 at 11:06
  • 1
    @alinsoar What other way is there to interpret it? It literally says it there. If you saw a compiler doing something different then the compiler was operating under the [as-if rule](https://en.cppreference.com/w/cpp/language/as_if). But the stack allocation only happens if the compiler can prove that no reference to the memory exists after the function exits. – Konrad Rudolph May 17 '19 at 11:10
  • I suspect the address operator that is here hiddenly used forces it to enter the static zone. Not sure, however, but it is sure that not all the time the ",,,," are captured by programs like `strings` in linux, so they are not kept in all cases in the static zone. – alinsoar May 17 '19 at 11:28
  • 3
    @alinsoar: Examining assembly code only shows you how a C implementation implemented the code. Due to the C standard’s “as if” rule, compilers are allowed to generate any code they want that has the same results. If that program did not rely on the string existing after a function returned, then the compiler can generate code that has the string only on the stack or as immediates in instructions, because the result **in that program** is the same. So the compiler optimized the code for that program. That does not change the semantics specified by the C standard. – Eric Postpischil May 17 '19 at 12:14