19

How should I understand char * ch="123"?

'1' is a char, so I can use:

char x = '1';
char *pt = &x;

But how do I understand char *pt="123"? Why can a char * point to a string?

Is pt's value the address of the first character of "123"? If so, how do I get the length of the string pointed to by pt?

Prashant Kumar
jiafu
  • 2
    Yes, this is illogical and horrible. Not your question - the throwback C-style strings with null-terminators. +1 for identifying the illogic of a pointer to a char actually pointing at more than one char. – Martin James Nov 27 '13 at 09:48
  • 1
    @MartinJames: this is not C-style string specific, any pointer has this feature... – Karoly Horvath Nov 27 '13 at 11:14
  • @KarolyHorvath Not every pointer has the feature that it can address characters in a string literal. There are several aspects to this question (at least if you want to answer it comprehensively), and some of it is just about pointers, but some is about C's weird treatment of "strings" – jalf Nov 27 '13 at 13:30
  • 1
    @KarolyHorvath: Only pointers to variable size data structures have this issue. Even then, that's only if you consider "this issue" being unable to know the size of a data structure from merely it's type and a pointer. You could, for example, define a "string" as the length of the string and then that many bytes of character data, which would sidestep all the awful issues that come from null termination. – Phoshi Nov 27 '13 at 13:49

6 Answers

34

That is actually a really good question, and it is the consequence of several oddities in the C language:

1: A pointer to a char (char*) can of course also point to a specific char in an array of chars. That is what pointer arithmetic relies on:

// create an array of three chars
char arr[3] = { 'a', 'b', 'c' };
// point to the first char in the array
char* ptr = &arr[0];
// point to the third char in the array
ptr = &arr[2];

2: A string literal ("foo") is actually not a string as such, but simply an array of chars, followed by a null byte. (So "foo" is actually equivalent to the array {'f', 'o', 'o', '\0'})

3: In C, arrays "decay" into pointers to the first element. (This is why many people incorrectly say that "there is no difference between arrays and pointers in C".) That is, when you try to assign an array to a pointer object, it sets the pointer to point to the first element of the array. So given the array arr declared above, you can do char* ptr = arr, and it means the same as char* ptr = &arr[0].

4: In every other case, syntax like this would make the pointer point to an rvalue (loosely speaking, a temporary object, which you can't take the address of), which is generally illegal. (You can't do int* ptr = &42). But when you define a string literal (such as "foo"), it does not create an rvalue. Instead, it creates the char array with static storage. You're creating a static object, which is created when the program is loaded, and of course a pointer can safely point to that.

5: String literals are actually required to be marked as const (because they are static and read-only), but because early versions of C did not have the const keyword, you are allowed to omit the const specifier (at least prior to C++11), to avoid breaking old code (but you still have to treat the variable as read-only).
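Point 3 can be checked directly. The following is a minimal sketch; the function names are made up for illustration. It compares the pointer obtained by decay with an explicit &arr[0], and uses sizeof to show that the array itself is still a distinct type.

```cpp
#include <cassert>
#include <cstddef>

// Returns true when an array used as a pointer refers to its first element.
bool decays_to_first_element()
{
    char arr[3] = { 'a', 'b', 'c' };
    char* from_decay = arr;        // array-to-pointer conversion ("decay")
    char* explicit_ptr = &arr[0];  // address of the first element, spelled out
    assert(from_decay[0] == 'a');
    // Pointer arithmetic then reaches the other elements.
    return from_decay == explicit_ptr && *(from_decay + 2) == 'c';
}

// sizeof applied to the array sees all three elements, not a pointer.
std::size_t array_size_in_bytes()
{
    char arr[3] = { 'a', 'b', 'c' };
    return sizeof(arr);
}
```

Here decays_to_first_element() returns true and array_size_in_bytes() returns 3, regardless of how large a char* is on the platform.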

So char* ch = "123" really means:

  1. write the char array {'1', '2', '3', '\0'} into the static section of the executable (so that when the program is loaded into memory, this variable is created in a read-only section of memory)
  2. when this line of code is executed, create a pointer which points to the first element of this array

As a bonus fun fact, this differs from char ch[] = "123";, which instead means

  1. write the char array {'1', '2', '3', '\0'} into the static section of the executable (so that when the program is loaded into memory, this variable is created in a read-only section of memory)
  2. when this line of code is executed, create an array on the stack which contains a copy of this statically allocated array.
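The difference between the two forms can be sketched as follows. This is an illustrative example, not part of the original answer, and the names literal, modify_array_copy and literal_is_unchanged are hypothetical. The pointer form only stores an address, while the array form owns a writable copy of the four bytes.

```cpp
#include <cassert>
#include <cstring>

// Pointer form: only the address of the static, read-only array is stored.
const char* literal = "123";

// Array form: the four bytes of the literal are copied into a local array,
// which may be modified freely.
char modify_array_copy()
{
    char copy[] = "123";
    static_assert(sizeof(copy) == 4, "array size includes the terminator");
    copy[0] = 'X';  // fine: this writes to the copy, not to the literal
    assert(std::strcmp(copy, "X23") == 0);
    return copy[0];
}

// The literal itself is untouched by changes to the copy.
bool literal_is_unchanged()
{
    return std::strcmp(literal, "123") == 0;
}
```

Writing through literal instead (after casting away const) would be undefined behavior, which is why the pointer is declared const char* here.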
jalf
  • why int *pt={1,2,3} can't work but int pt[]={1,2,3} can work? can you help this? – jiafu Nov 27 '13 at 10:20
  • 3
    @jiafu `{1,2,3}` can be used to initialize an array, but it *is* not an array. So in the second case, you use it to create an array named `pt`, containing those three numbers. But in the first case, you try to create a pointer to something that is not an object. That is where string literals are treated specially (see point #4 above) because it actually *creates* the array, which the syntax `{1,2,3}` does not do – jalf Nov 27 '13 at 10:23
  • Since C++11, you are no longer allowed to omit the `const`. – Mike Seymour Nov 27 '13 at 12:19
  • Awesome answer: but maybe mention or stress that a `const` will make the code much much better? – Yakk - Adam Nevraumont Nov 27 '13 at 12:20
  • @MikeSeymour good point, I added a note mentioning this. Thanks. – jalf Nov 27 '13 at 13:28
  • @Yakk I feel like my answer is already fairly cluttered, so I'd rather not add more information unless it relates directly to the question. (Note that as Mike Seymour mentions above, in C++11 you are *required* to use `const` anyway, so it no longer matters that it subjectively improves the code. You just have to do it. And the question isn't about best practices, but simply what the statement *means*. (But you are of course right, you *should* use `const`) – jalf Nov 27 '13 at 13:29
  • This is a great answer. I've been programming in C for a while, and feel that I learned something. –  Dec 10 '13 at 04:27
7

char* ptr = "123"; is compatible and almost equivalent to char ptr[] = { '1', '2', '3', '\0' }; (see http://ideone.com/rFOk3R).

In C a pointer can point to one value or to an array of contiguous values. C++ inherited this. So a string is just an array of characters (char) terminated by a '\0', and a pointer to char can point to an array of char.

The length is given by the number of characters between the beginning and the terminating '\0'. Example of a C strlen that gives you the length of the string:

size_t strlen(const char * str)
{
    const char *s;
    for (s = str; *s; ++s) {}
    return(s - str);
}

And yes, it fails horribly if there is no '\0' at the end.

Johan
  • 1
    `char* ptr = "123";` is NOT `char* ptr = { '1', '2', '3', '\0' };` – BЈовић Nov 27 '13 at 10:02
  • 9
    I think you meant to say: `it fails horribly if there is no '\0' at the end.5]È╦³û§E;‼∟█«2 ÜP¯@x6²↕I2÷×}I▄P` – Michael Madsen Nov 27 '13 at 10:06
  • if so, char *pt1="123", char pt2[]="123", is pt2 is same with pt1? – jiafu Nov 27 '13 at 10:12
  • @BЈовић Yep it is `char ptr[] = { '1', '2', '3', '\0' };`, which is compatible to char. I'm going to clarify. – Johan Nov 27 '13 at 10:14
  • thanks,what's more, why int *pt={1,2,3} can't work but int pt[]={1,2,3} can work? can you help this? – jiafu Nov 27 '13 at 10:22
  • 1
    `char* ptr` and `char ptr[]` are not equivalent. In both cases, you can modify the values pointed by ptr. Guess what may happen if you modify the pointer to non-const char pointing to const char array :) Something like this : http://ideone.com/i37oMP – BЈовић Nov 27 '13 at 10:23
  • @BЈовић Hence the "almost" ;) I used `const char*` in my example. But I think the compiler with warning disable do not even complain if you do not put the `const`... And in my memories, there was an option in gcc to set the string literal in a modifiable zone, hence allowing the modification without a crash. – Johan Nov 27 '13 at 10:29
5

A string literal is an array of N const char, where N is the length of the literal including the implicit NUL terminator. It has static storage duration, and where it is stored is implementation-defined. From here on, it's the same as with a normal array: it decays to a pointer to its first character, which is a const char*. What you have there is not legal C++ (not since the C++11 standard); it should be const char* ch = "123";.

You can get the length of a literal with sizeof operator. Once it decays to a pointer, though, you need to iterate through it and find the terminator (that's what strlen function does).
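As a small sketch of that difference (the function names are hypothetical): sizeof applied to the literal counts the terminator, while strlen applied to the decayed pointer counts only the characters before it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// sizeof sees the literal as an array of 4 const char: '1', '2', '3', '\0'.
std::size_t literal_size()
{
    return sizeof("123");
}

// After decay only a pointer is left, so the terminator has to be searched for.
std::size_t length_via_pointer()
{
    const char* ch = "123";
    assert(ch[3] == '\0');   // the implicit terminator is really there
    return std::strlen(ch);  // counts the characters before the terminator
}
```

literal_size() returns 4 and length_via_pointer() returns 3. Note that sizeof(ch) would give the size of the pointer itself, which says nothing about the string.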

So, with a const char* ch; you get a pointer to a constant character type that can point to a single character, or to the start of an array of characters (or anywhere between the start and the end). The array can be dynamically, automatically or statically allocated and can be mutable or not.

In something like char ch[] = "text"; you have an array of characters. This is syntactic sugar for a normal array initializer (as in char ch[] = {'t','e','x','t','\0'}; but note that the literal will still be loaded at the start of the program). What happens here is:

  • an array with automatic storage duration is allocated
  • its size is deduced from the size of the literal by the compiler
  • the contents of the literal are copied to the array

As a result, you have a region of storage that you can use at will (unlike literals, which must not be written into).

jrok
3

A pointer to an array?

A pointer points to only one memory address. The phrase that a pointer points to an array is only used in a loose sense---a pointer cannot really store multiple addresses at the same time.

In your example, char *ch="123", the pointer ch is really pointing to the first byte only. You can write code like the following, and it will make perfect sense:

char *ch = new char [1024];
sprintf (ch, "Hello");    
delete [] ch;

char x = '1';
ch = &x;

Please note the use of the pointer ch to point to both the memory allocated by new char [1024] line as well as the address of the variable x, while still being the same pointer type.

C-style strings are null terminated

Strings in C are null terminated, i.e., a special '\0' is added to the end of the string and assumed to be there by all char *-based functions (such as strlen and printf). This way, you can determine the length of the string by starting at the first byte and continuing until you find the byte containing 0x00.

A verbose sample implementation of an strlen-style function would be

int my_strlen (const char *startAddress)
{
  int count = 0;
  const char *ptr = startAddress;
  while (*ptr != 0)
  {
     ++count;
     ++ptr;
  }

  return count;
}
Jaywalker
  • if so, char *pt1="123", char pt2[]="123", is pt2 same with pt1? – jiafu Nov 27 '13 at 10:13
  • **Almost** same, but not exactly the same. If you replace `char *ch` with `char ch[]` in my first example, the later assignment `ch = &x` will fail. Also, see http://stackoverflow.com/questions/1335786/c-differences-between-char-pointer-and-array – Jaywalker Nov 27 '13 at 10:47
3

There are no strings in C, only pointers to characters. pt does indeed not point to a string, but to a single character (the '1'). However, functions that take a char* argument assume that the bytes following that address belong to the same string, up to and including a byte set to 0 that marks where they should stop.

In your example, if you passed pt = &x to a function that expects a "null-terminated string" (that is, one that keeps processing data until it encounters a byte with the value 0), it would read past x into memory you do not own, which is undefined behavior and often ends in a segmentation fault. The reason is that x = '1' gives x the ASCII value of the character 1 and nothing more, whereas char* pt = "123" gives pt the address of the 1 and also places in that memory the bytes holding the ASCII values of 1, 2 and 3, followed by a byte with the value 0 (zero).

So the memory (on an 8-bit machine) may look like this:

Address = Content (0x31 is the Ascii code for the character 1 (one))

0xa0 = 0x31
0xa1 = 0x32
0xa2 = 0x33
0xa3 = 0x00

Suppose that on the same machine you write char* otherString = malloc(4), and that malloc returns 0xb0, which is now the value of otherString. If we wanted to copy the string that pt points to (pt has the value 0xa0) into otherString, the strcpy call would look like this:

strcpy( otherString, pt );

The same as

strcpy( 0xb0, 0xa0 );

strcpy would then copy the byte at address 0xa0 to 0xb0, advance its source pointer to 0xa1 and check whether the byte there is zero. Since it is not, it would advance its destination pointer and copy the byte at 0xa1 to 0xb1, and so on, until its source pointer reaches 0xa3. There it detects that the end of the "string" has been reached, copies the terminating zero and returns.

This is of course not exactly how it works, and it could be implemented in many different ways.

Here is one http://fossies.org/dox/glibc-2.18/strcpy_8c_source.html
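The byte-by-byte walk described above can also be sketched in a few lines. This is a simplified illustration under the name naive_strcpy (a hypothetical helper), not the glibc implementation linked above, and copy_matches exists only to exercise it.

```cpp
#include <cassert>
#include <cstring>

// Copies bytes from src to dest up to and including the terminating '\0'.
// The caller must make sure dest has room for all of them.
char* naive_strcpy(char* dest, const char* src)
{
    char* out = dest;
    // Copy one byte, then stop once the byte just copied was the terminator.
    while ((*out = *src) != '\0')
    {
        ++out;
        ++src;
    }
    return dest;
}

// Copies "123" into a 4-byte buffer and checks the result.
bool copy_matches()
{
    const char* pt = "123";
    char otherString[4];  // three characters plus the terminator
    naive_strcpy(otherString, pt);
    assert(otherString[3] == '\0');
    return std::strcmp(otherString, "123") == 0;
}
```

The loop has no length limit of its own, so like the real strcpy it relies entirely on the source being null terminated.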

Yakk - Adam Nevraumont
DusteD
2
char* pt = "123"; does two things:

  1. creates the string literal "123" in read-only memory (ROM; typically a read-only section of the executable such as .rodata or .text)
  2. creates a char* which is assigned the address where the string begins.

Because of this, operations like pt[1] = '2'; are not allowed (the behavior is undefined), as you would be attempting to write to read-only memory.

But you can assign the pointer to some other memory location without any problems.
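For instance, in this hedged sketch (first_char_after_reassign is a made-up name), the pointer starts at the literal and is then redirected to a writable array:

```cpp
#include <cassert>

// Points pt at a literal first, then at a writable buffer.
char first_char_after_reassign()
{
    const char* pt = "123";  // points at the read-only literal
    assert(pt[1] == '2');    // reading through the pointer is fine

    char buffer[] = "abc";   // writable array holding a copy of "abc"
    pt = buffer;             // the pointer itself can be redirected
    buffer[0] = 'z';         // writing to the array is allowed
    return pt[0];            // pt now sees the modified buffer
}
```

The function returns 'z'. Declaring pt as const char* lets the compiler reject accidental writes such as pt[1] = '2'.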

Pandrei
  • if so, char *pt1="123", char pt2[]="123", is pt2 is same with pt1? – jiafu Nov 27 '13 at 10:14
  • no it isn't char p2[] ="123"; has a different effect. The string literal "123" is still created in ROM but it is also copied to the stack and p2 point at the stack copy. So p2[0] = '1' is legal. – Pandrei Nov 27 '13 at 10:22
  • thanks,what's more, why int *pt={1,2,3} can't work but int pt[]={1,2,3} can work? can you help this? – jiafu Nov 27 '13 at 10:24
  • for *pt={1,2,3},*pt points to a location in ROM while for pt[]={1,2,3}, pt points to a location on the stack (where you can write) – Pandrei Nov 27 '13 at 10:31