
So I want to know exactly how many ways there are to declare a string. I know similar questions have been asked several times, but I think my focus is different. As a beginner in C, I want to know which method of declaration is correct and preferable, so that I can stick to it.

We all know we can declare a string in the following two ways:

char str[] = "blablabla";
char *str = "blablabla";

After reading some Q&A on Stack Overflow, I was told that a string literal is placed in the read-only part of memory. So in order to create a modifiable string, you need to create a character array of length (string length + 1) on the stack and copy each character into the array. This is what the 1st line of code is doing.

However, the 2nd line of code creates a character pointer and assigns it the address of the 1st character, which is located in the read-only part of memory. So this kind of declaration does not involve copying character by character.

Please let me know if I am wrong.
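The two declarations above can be told apart with a quick sketch. The helper names array_size, pointer_size and modify_copy are hypothetical, chosen only for illustration. sizeof reports the whole array for the first form but only the pointer for the second, and the array is a writable copy of the literal. The pointer version is written as const char *, the form discussed further down.

```c
#include <stddef.h>

/* Hypothetical helper: sizeof an array declared from a literal. */
size_t array_size(void)
{
    char str[] = "blablabla";   /* array of 10 chars: 9 letters + '\0' */
    return sizeof str;          /* size of the whole array */
}

/* Hypothetical helper: sizeof a pointer to the same literal. */
size_t pointer_size(void)
{
    const char *str = "blablabla"; /* holds the address of the literal's first char */
    return sizeof str;             /* size of the pointer, not of the text */
}

/* Hypothetical helper: the array is a private copy, so writing to it is fine. */
char modify_copy(void)
{
    char str[] = "blablabla";
    str[1] = 'q';               /* modifies the local copy, not the literal */
    return str[1];
}
```

Called from main, array_size() gives 10, pointer_size() gives sizeof(char *) (commonly 8 on 64-bit platforms, but that is platform dependent), and modify_copy() gives 'q'.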

So far it seems quite understandable, but what really confuses me is the modifiers. For instance, some people suggest

const char *str = "blablabla";

instead of

char *str = "blablabla";

because if we do something like

*(str + 1) = 'q';

it will cause undefined behavior, which is horrible.

But some go even further and suggest something like

static const char *str = "blablabla";

and say this will place the string into static memory, which never gets modified, to avoid the undefined behavior.

So which is actually the "right" way to declare a string?

Besides, I am also interested in knowing about scope when declaring a string.

For example,

(You can ignore the examples, both of them are buggy as pointed out by the others)

#include <stdio.h>

char **strPtr();

int main()
{
  printf("%s", *strPtr());
}

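char **strPtr()
{
  char str[] = "blablabla";   /* local array: destroyed when the function returns */
  char *charPtr = str;        /* local pointer to that array */
  char **strPtr = &charPtr;
  /* Returns the address of the local variable charPtr: undefined behavior */
  return strPtr;
}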

will print some garbage value.

But

#include <stdio.h>

char **strPtr();

int main()
{
  printf("%s", *strPtr());
}

char **strPtr()
{
  char *str = "blablabla";

  /*As pointed out by others, I am returning the address of a local variable*/
  /*This causes undefined behavior*/
  char **strPtr = &str;
  return strPtr;
}

will work perfectly fine. (No, it doesn't; it is undefined behavior.)

I think I should leave it as another question. This question is getting too long.

Alex Vong
  • As for scoping, strings don't have scope by themselves; variables containing strings or pointing to strings do have scope, and their scope is the same as for all other variables, which means that if you return a pointer to a local variable you have undefined behavior. So none of your scoping examples actually "works"; even the second version, which seems to work, is still undefined. – Some programmer dude Jan 08 '15 at 14:19
  • @JoachimPileborg If `s1` is a local variable then `char s1[] = "foo";` will involve copying behind the scenes, and have more overhead than `const char *s2 = "foo";`. – interjay Jan 08 '15 at 14:20
  • Declaring a string variable as an array and initializing it ***may*** involve copying. For global variables the compiler will handle it all at compilation time. For local variables the compiler ***may*** copy, it ***might*** also optimize it and handle it some other way (similar to how it handles global string arrays). However, any overhead is likely small unless you have very large strings, and call the function containing the string ***very*** often (like hundreds of times per second). – Some programmer dude Jan 08 '15 at 14:22
  • Your last example isn't "perfectly fine", because the intermediate pointer `charPtr` whose address you return will go out of scope before you use it. Returning a pointer to a string literal without the double indirection will work fine, though. – M Oehm Jan 08 '15 at 14:25
  • Both code snippets invoke UB. The functions are returning pointers to local variables. – haccks Jan 08 '15 at 14:29
  • @Gopi: I am not sure that `char *str = "blablabla";` is equivalent to `static char *str = "blablabla";`. If it is not, then the second snippet is also wrong. – haccks Jan 08 '15 at 14:35
  • Why does returning a pointer to a local variable cause undefined behavior? Isn't "blablabla" unchanged throughout the execution of the program? – Alex Vong Jan 08 '15 at 14:40
  • I know what is wrong in my 2nd example; I will edit it. – Alex Vong Jan 08 '15 at 14:42
  • You haven't really fixed the example, you have only obscured the error. The _data_, i.e. the 10-byte chunk that holds the null-terminated `char` array `"blablabla"`, resides in static memory; it is always legal to access it. The local pointer `str` that holds the address of this chunk will no longer be valid after `strPtr` has finished executing. – M Oehm Jan 08 '15 at 14:51
  • Do you mean that the char pointer str is local, so I cannot dereference it after the function ends? – Alex Vong Jan 08 '15 at 15:00
  • Do you know how function calls and returns affect the stack? – Pieter Witvoet Jan 08 '15 at 15:05
  • I know a little: calling a function creates a stack frame and returning wipes it out. I think I get it. – Alex Vong Jan 08 '15 at 15:10
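Following M Oehm's suggestion, a minimal sketch of returning a string correctly might look like this (the function names literal_ptr and static_array_ptr are hypothetical). The first returns the literal's address directly, with no double indirection; that pointer stays valid because string literals have static storage duration. The second returns a static array, which is a writable copy that also outlives the call.

```c
/* Returns the address of the literal itself. The literal exists for the
   whole run of the program, so the pointer remains valid after return. */
const char *literal_ptr(void)
{
    return "blablabla";
}

/* Returns a static array: a writable copy that also survives the call.
   Every caller gets the same buffer. */
char *static_array_ptr(void)
{
    static char str[] = "blablabla";
    return str;
}
```

Because the static array is shared, a change made through one returned pointer is visible through every other pointer returned by static_array_ptr().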


A lot of your confusion comes from a common misunderstanding about C and strings, one which is explicitly stated in the title of your question. The C language does not have a native string type, so in fact there are exactly zero ways to declare a string in C.

Spend some time reading "Does C Have a String Type?", which does a good job of explaining that.

This is evident from the fact that you can't (sensibly) do the following:

char *a, *b;
// code to point a and b at some "strings"
if (a == b)
{
   // do something if strings compare equal
}

The if statement will compare the values of the pointers, not the contents of the memory they address. So, if a and b pointed to two different areas of memory, each containing identical data, the comparison a == b would fail. The only time the comparison would evaluate as "true" (i.e. something other than zero), would be if a and b held the same address (i.e. pointed to the same location in memory).
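To compare contents rather than addresses, C code uses strcmp() from <string.h>, which returns 0 when the two NUL-terminated sequences hold the same characters. A minimal sketch; the arrays first and second and the helper same_text are hypothetical names:

```c
#include <string.h>

/* Two distinct arrays that happen to hold identical characters. */
char first[] = "abc";
char second[] = "abc";

/* Nonzero when the two NUL-terminated strings contain the same characters,
   regardless of where in memory they are stored. */
int same_text(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Here &first[0] == &second[0] is false, because they are two different arrays, while same_text(first, second) is true.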

What C has is a convention, and some syntactic sugar to make life easier.

The convention is that a "string" is represented as a sequence of char terminated with the value zero (usually referred to as NUL and represented by the special character escape sequence '\0'). This convention comes from the API of the original standard library (back in the 1970s), which provides a set of string primitives such as strcpy(). These primitives were so fundamental to doing anything truly useful in the language that life was made easier for programmers by adding syntactic sugar (this was all before the language escaped from the lab).
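The convention is easy to see in code: a function working on a "string" simply walks the chars until it meets the terminating zero. Below is a hypothetical re-implementation of strlen(), named my_strlen so it does not clash with the standard function; it sketches the idea and is not the library's actual implementation.

```c
#include <stddef.h>

/* Counts the chars before the first '\0'. The caller must guarantee that
   a terminator exists, exactly as with the standard strlen(). */
size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')   /* stop at the terminating NUL */
        n++;
    return n;
}
```

Anything after the first NUL is invisible to such functions: my_strlen("hi\0there") is 2.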

The syntactic sugar is the existence of "string literals": no more, and no less. In source code, any sequence of characters enclosed in double quotes is interpreted as a string literal, and the compiler produces a copy (typically in "read-only" memory) of the characters plus a terminating NUL byte to conform to the convention. Modern compilers often detect duplicated literals and produce only a single copy, but the standard leaves it unspecified whether identical literals share storage. Thus this:

assert("abc" == "abc");

may or may not raise an assertion, which reinforces the statement that C does not have a native string type. (For that matter, neither does C++: it has a std::string class!)
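One way to see the terminating NUL that the compiler appends to every literal is to apply sizeof to a literal directly: a string literal is an array, so sizeof counts the hidden terminator too. A minimal sketch with hypothetical helper names:

```c
#include <stddef.h>

/* The literal "abc" is an array of 4 chars: 'a', 'b', 'c', '\0'. */
size_t literal_size(void)
{
    return sizeof "abc";
}

/* Index 3 is the terminator added by the compiler. */
char literal_last(void)
{
    return "abc"[3];
}
```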

With that out of the way, how do you use string literals to initialize a variable?

The first (and most common) form you will see is this

char *p = "ABC";

Here, the compiler sets aside 4 bytes of memory (sizeof(char) is 1 by definition) in a "read only" section of the program and initializes it with [0x41, 0x42, 0x43, 0x00], assuming an ASCII character set. It then initializes p with the address of that array. Note that in C the type of the literal is char[4], an array of plain (non-const) char that decays to a pointer to its first element; nevertheless, attempting to modify it is undefined behavior. (In C++ its type is const char[4].) That is why you would normally be advised to write this as:

const char *p = "ABC";

That is a "pointer to constant char": the compiler will reject any attempt to modify the characters through p, which matches the fact that the literal itself may live in read-only memory.
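Because p only grants read access, code that needs to change the text should copy it into writable storage first. A minimal sketch using strcpy() from <string.h>; writable_copy is a hypothetical helper name, and the caller is assumed to supply a buffer of at least strlen(src) + 1 bytes:

```c
#include <string.h>

/* Copies the characters of src, including the trailing '\0', into buf
   and returns buf. Assumes buf holds at least strlen(src) + 1 bytes. */
char *writable_copy(char *buf, const char *src)
{
    return strcpy(buf, src);   /* strcpy returns its first argument */
}
```

With const char *p = "ABC"; and char buf[4];, writing writable_copy(buf, p)[0] = 'X'; is fine, whereas p[0] = 'X'; is rejected at compile time.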

The next two forms use string literals to initialize arrays

char p1[] = "ABC";
char p2[3] = "ABC";

Note that there is a critical difference between the two: the first line creates a 4-byte array; the second creates a 3-byte array.
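The size difference can be checked directly with sizeof. The sketch below also includes a hypothetical helper, has_terminator, which uses memchr() from <string.h> to look for a '\0' without reading past the array's bounds; it shows that p2 holds no terminator at all. The arrays are placed at file scope only to keep the example short.

```c
#include <string.h>

char p1[] = "ABC";   /* size deduced from the initializer: 4, including '\0' */
char p2[3] = "ABC";  /* size fixed at 3: legal in C, but no room for '\0' */

/* Nonzero if a '\0' occurs within the first size bytes of arr.
   memchr() never examines more than size bytes. */
int has_terminator(const char *arr, size_t size)
{
    return memchr(arr, '\0', size) != NULL;
}
```

has_terminator(p1, sizeof p1) is true, while has_terminator(p2, sizeof p2) is false, so p2 must not be passed to functions that expect a NUL-terminated string.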

In the first case, the initializer supplies the 4 bytes [0x41, 0x42, 0x43, 0x00]. Note that the trailing NUL is added to form a "C string". For a local variable, the compiler typically keeps those bytes in a "read only" array, reserves four bytes on the stack, and inserts code to copy the read-only array into that space at run time (an optimizer may handle this differently). For a variable at file scope, or one declared static, the initialized bytes are simply placed in writable static memory when the program is built, so no run-time copy is needed. Either way, you are now free to modify elements of p1 at will.

In the second case, the array is sized at 3 bytes, so only [0x41, 0x42, 0x43] fits. Note that there is no trailing NUL: C allows this when the array has exactly enough room for the visible characters, although C++ rejects it as an error. Storage and initialization work as for p1 (stack space plus a run-time copy for a local variable, static memory for file scope), and you are again free to modify elements of p2 at will.

The difference in sizes of the two arrays p1 and p2 is critical. The following code (if you ran it) would demonstrate it.

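#include <stdio.h>

int main(void)
{
    char p1[] = "ABC";   // 4 bytes: 'A', 'B', 'C', '\0'
    char p2[3] = "ABC";  // 3 bytes: 'A', 'B', 'C', no terminator
    printf ("p1 = %s\n", p1); // Will print out "p1 = ABC"
    printf ("p2 = %s\n", p2); // Undefined behavior: may print "p2 = ABC" plus garbage such as "!@#$%^&*", or crash
    return 0;
}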

The output of the second printf is unpredictable: printf keeps reading past the end of p2 until it happens to find a zero byte, which is undefined behavior and could result in your code crashing. It often seems to work simply because so much of RAM is filled with zeroes that printf eventually finds a terminating NUL.
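The simplest fix is to let the compiler size the array (char p2[] = "ABC";) or to leave room for the NUL (char p2[4] = "ABC";). If you must work with an unterminated array, one option is the printf precision field: "%.*s" with an explicit length makes the printf family read at most that many chars, and the standard does not require a terminator in that case. A minimal sketch using snprintf() so the result can be inspected; format_bounded is a hypothetical helper name:

```c
#include <stdio.h>

/* Writes at most len chars from arr (which need not be NUL-terminated)
   into out, terminating out if outsz > 0. Returns the number of chars
   that would have been written, as snprintf() does. */
int format_bounded(char *out, size_t outsz, const char *arr, size_t len)
{
    return snprintf(out, outsz, "%.*s", (int)len, arr);
}
```

The same idea applies directly to the example above: printf("p2 = %.*s\n", (int)sizeof p2, p2); prints "p2 = ABC" without reading past p2.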

Hope this helps.

kdopen
  • Thanks for your explanation. There is one thing I don't understand. What does `assert("abc" == "abc");` do? `assert` checks the expression inside at runtime, but what does the expression `"abc" == "abc"` do? Is it checking whether the two "abc" are stored in the same block of memory by comparing their addresses? – Alex Vong Jan 10 '15 at 09:40
  • Precisely, it is checking if both "abc" strings have the same address and asserts if they don't. Depending on your compiler and optimization settings, it may - or may not - assert. – kdopen Jan 11 '15 at 15:32