
I'm learning C and today I got stuck on "strings" in C. Basically I understand that there is no such thing as a string type in C. In C, strings are arrays of characters terminated with \0. So far so good.

char *name = "David";
char name[] = "David";
char name[5] = "David";

This is where the confusion starts. Three different ways to declare "strings". Can you provide simple examples showing which one to use in which situation? I've read a lot of tutorials on the web but still can't get the idea.

I read the "How to declare strings in C" question on Stack Overflow but still can't see the difference.

Community
moemoe

3 Answers

3
  • First one, char *name = "David";, makes name point to a string literal, which usually resides in a read-only section of memory. You must not modify it; attempting to do so is undefined behavior. Better to write

    const char *name = "David";

  • Second one, char name[] = "David";, is an array of 6 chars including '\0', initialized with a copy of the literal. It can be modified.

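  • char name[5] = "David"; is valid C, but the array has no room for the terminating '\0' ("David" needs 6 chars including it), so name does not hold a string. Using it as one (e.g. passing it to printf with %s or to strlen) invokes undefined behavior. You need an array of 6 chars to store the string.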

    char name[6] = "David";
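
To make the bullets above concrete, here is a small sketch, not a definitive reference. The function names (implicit_array_size, lowercase_first, literal_length) are made up for illustration; only the declarations inside them come from the question.

```c
#include <stddef.h>
#include <string.h>

/* Size of an array whose length the compiler derives from the initializer. */
size_t implicit_array_size(void)
{
    char name[] = "David";       /* 5 letters + terminating '\0' */
    return sizeof name;
}

/* Modifies a local array (a copy of the literal) and copies the result
 * into out, which must have room for at least 6 bytes. */
void lowercase_first(char *out)
{
    char name[] = "David";
    name[0] = 'd';               /* fine: name is an ordinary array */
    strcpy(out, name);
}

/* Reads a literal through a pointer to const; only reading is allowed. */
size_t literal_length(void)
{
    const char *name = "David";
    return strlen(name);         /* counts the letters, not the '\0' */
}
```

The array version owns its 6 bytes and may change them; the pointer version only refers to the literal, and the const lets the compiler reject accidental writes.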

Further reading: C-FAQ 6. Arrays and Pointers.

Roman Nikitchenko
haccks
  • The second one is not an array of 5 `char`s. –  Dec 15 '13 at 19:36
  • @haccks what do you mean by "invoke undefined behavior" for the third example? – moemoe Dec 15 '13 at 19:40
  • @moemoe he means [this](http://blog.regehr.org/archives/213). –  Dec 15 '13 at 19:41
  • If I understand your answer, the first example is useful in situations where the string will not change, for example declaring the hostname for a database: const char *hostname = "localhost". Am I right? – moemoe Dec 15 '13 at 19:43
  • @moemoe Yes, that seems reasonable as a first approximation. –  Dec 15 '13 at 19:44
  • @moemoe; "invoke undefined behavior" means that in such a situation, for example when you access some unknown memory location, the program may behave erroneously and you may get any result, expected or unexpected. – haccks Dec 15 '13 at 19:54
  • But AFAICT that doesn't happen here - unless the 3rd `name` is actually used as a string. – glglgl Dec 15 '13 at 20:00
  • @glglgl; Yes. But one has to beware of that. – haccks Dec 15 '13 at 20:02
0

This link provides a pretty good explanation.

char[] refers to an array, char* refers to a pointer, and they are not the same thing.

char a[] = "hello"; // array
char *p = "world"; // pointer

According to the standard, Annex J.2/1, it is undefined behavior when:

—The program attempts to modify a string literal (6.4.5).

6.4.5/5 says:

In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals.

Therefore you actually need an array of six elements to account for the NUL character.
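
The byte count can be checked directly. This is only a sketch; the helper names (literal_size, literal_strlen, last_byte_is_zero) are hypothetical and not part of the standard or the question.

```c
#include <stddef.h>
#include <string.h>

/* Bytes occupied by the literal, including the appended zero byte. */
size_t literal_size(void)
{
    return sizeof "David";
}

/* Number of characters before the zero byte. */
size_t literal_strlen(void)
{
    return strlen("David");
}

/* Returns 1 if the last byte of a six-element array initialized from
 * "David" is the zero byte that was appended to the literal. */
int last_byte_is_zero(void)
{
    char name[6] = "David";
    return name[5] == '\0';
}
```

sizeof counts the terminator while strlen does not, which is where the "length + 1" rule for the array size comes from.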

0

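In the first example, you declare a pointer to a string literal:

// A variable pointer to a string literal (an array of 6 bytes that must not be modified).
char *pName = "David";

Because the type is char * rather than const char *, the compiler accepts the following lines, but each of them attempts to modify the literal, which is undefined behavior (on many systems the program crashes because the literal lives in read-only memory):

pName[0] = 'c';           // BUG: undefined behavior
*pName = 'c';             // BUG: undefined behavior
*(pName+0) = 'c';         // BUG: undefined behavior
strcpy(pName, "Eric");    // BUG: undefined behavior

Writing a longer string is wrong for a second reason:

// BUG: besides modifying a literal, this writes 2 bytes past the end of the 6-byte array.
strcpy(pName, "Fredrik");

The pointer itself can be altered at run time to point to another string literal, e.g.

pName = "Charlie Chaplin";

The new target is again a literal (16 bytes this time) and still must not be modified:

pName[0] = 'c';           // BUG: undefined behavior
*pName = 'c';             // BUG: undefined behavior
*(pName+0) = 'c';         // BUG: undefined behavior
// "Fredrik" would fit into the 16 bytes, but the target is still a literal:
strcpy(pName, "Fredrik"); // BUG: undefined behavior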

As stated by others, you would normally use const char * in the pointer cases, which is also the preferred way to refer to a string. The reason is that the compiler will then help you avoid the most common (and hard-to-find) memory-trashing bugs:

// A variable pointer to a constant string (i.e. an array of 6 constant bytes). 
const char *pName = "David"; 
// The pointer can be altered at run time to point to another string, e.g.
pName = "Charlie";
// But, the compiler will warn you if you try to change the string
// using any of the normal ways:
pName[0] = 'c';       // BUG
*pName = 'c';         // BUG
*(pName+0) = 'c';     // BUG
strcpy(pName, "Eric");// BUG

The other way, using an array, gives less flexibility:

char aName[] = "David"; // aName is now an array in RAM.
// You can still modify the array using the normal ways:
aName[0] = 'd';
*aName = 'd';
*(aName+0) = 'd';
strcpy(aName, "Eric"); // OK

// But not change to a larger, or a different buffer
aName = "Charlie"; // BUG: This is not possible.
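
Since the array cannot grow, a copy into it has to respect its capacity. One possible sketch uses snprintf, which never writes more than the given size. The helper set_name is a hypothetical name introduced here, not something from the answer.

```c
#include <stddef.h>
#include <stdio.h>

/* Copies src into dst, which has room for dst_size bytes, truncating
 * if necessary. Returns 1 if src fit completely, 0 if it was cut. */
int set_name(char *dst, size_t dst_size, const char *src)
{
    /* snprintf returns the length src would need, excluding the '\0'. */
    int needed = snprintf(dst, dst_size, "%s", src);
    return needed >= 0 && (size_t)needed < dst_size;
}
```

With char aName[] = "David";, the call set_name(aName, sizeof aName, "Eric") succeeds, while set_name(aName, sizeof aName, "Fredrik") reports truncation instead of writing past the end of the array.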

Similarly, a constant array helps you even more:

const char aName[] = "David"; // aName is now a constant array.
// The compiler will prevent modification of it:
aName[0] = 'd';       // BUG
*aName = 'd';         // BUG
*(aName+0) = 'd';     // BUG
strcpy(aName, "Eric");// BUG 
// And you cannot of course change it this way either:
aName = "Charlie"; // BUG: This is not possible.

The major difference between the pointer and the array declaration is the result of sizeof: sizeof(pName) is the size of a pointer, typically 4 or 8 bytes, while sizeof(aName) is the size of the array, i.e. the length of the string + 1. This matters most if the variable is declared inside a function, especially if the string is long: the array occupies more of the precious stack, so the array declaration is normally avoided there. It also matters when passing the variable to macros that use sizeof; such macros must be supplied with the intended type.

It also matters if you want to, e.g., swap the strings. Strings declared as pointers are straightforward to swap and require the CPU to access fewer bytes, since only the pointers themselves (typically 4 or 8 bytes each) are moved around:
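
A quick sketch of the sizeof difference; the wrapper functions pointer_size and array_size are hypothetical names used only to make the values easy to inspect.

```c
#include <stddef.h>

/* sizeof applied to a pointer yields the size of the pointer itself,
 * no matter how long the string it points to is. */
size_t pointer_size(void)
{
    const char *pName = "David";
    return sizeof pName;
}

/* sizeof applied to an array yields the size of the whole array,
 * here the 5 letters plus the terminating '\0'. */
size_t array_size(void)
{
    char aName[] = "David";
    return sizeof aName;
}
```

Note that an array passed to a function arrives there as a pointer, so sizeof only reports the array size in the scope where the array itself is declared.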

const char *pCharlie = "Charlie";
const char *pDavid = "David";
const char *pTmp;

pTmp = pCharlie;
pCharlie = pDavid;
pDavid = pTmp;

pCharlie is now "David", and pDavid is now "Charlie".

Using arrays, you must provide temporary storage large enough for the largest string and use strcpy(), which takes more CPU time because it copies the strings byte by byte.
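
For comparison, an array-based swap might look like the sketch below. NAME_CAPACITY and swap_names are assumptions made for this example; both arrays must really have NAME_CAPACITY bytes and hold terminated strings.

```c
#include <string.h>

#define NAME_CAPACITY 16   /* hypothetical common capacity of both arrays */

/* Swaps the contents of two string buffers of NAME_CAPACITY bytes each.
 * Every byte of both strings is copied, unlike the pointer swap. */
void swap_names(char a[NAME_CAPACITY], char b[NAME_CAPACITY])
{
    char tmp[NAME_CAPACITY];   /* temporary storage for one string */
    strcpy(tmp, a);
    strcpy(a, b);
    strcpy(b, tmp);
}
```

Three strcpy calls and a stack buffer replace the three pointer assignments, which is the extra cost the paragraph above refers to.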

The last method is rarely used, since the compiler automatically calculates that "David" needs 6 bytes. No need to tell it what is obvious.

char aName[6] = "David";

But, it is sometimes used in cases where the array MUST be a fixed length, independent of its contents, e.g. in binary protocols or files. In that case, it can be of benefit to manually add the limit, in order to get help from the compiler, should anyone by accident add or remove a character from the string in the future.
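
As a rough illustration of the fixed-length case, here is a sketch of a record with a 6-byte tag field. The names TAG_SIZE, struct record_header and init_header are invented for this example, and _Static_assert requires C11 or later.

```c
#include <string.h>

#define TAG_SIZE 6   /* hypothetical field width fixed by the file format */

struct record_header {
    char tag[TAG_SIZE];   /* always 6 bytes, whatever the contents */
};

/* Fails to compile if someone changes the literal's length later. */
_Static_assert(sizeof "David" == TAG_SIZE,
               "tag literal must fill the field exactly");

/* Fills the fixed-size tag field of a header. */
void init_header(struct record_header *h)
{
    char tag[TAG_SIZE] = "David";   /* explicit size documents the limit */
    memcpy(h->tag, tag, TAG_SIZE);
}
```

The explicit size together with the compile-time check turns an accidental edit of the string into a build error rather than a corrupted record.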