I'm struggling to understand why this line of code makes sense:

unsigned char *thing = (unsigned char *)"YELLOW SUBMARINE";

An unsigned char holds one byte of data (i.e. 'Y') so an unsigned char* should be a pointer to that single byte of data.

If I try to put more than one character inside, the compiler should, in my thinking, get angry. Yet it doesn't mind that I'm putting 16 bytes here and telling the compiler I'm only pointing to a single unsigned char. Could someone please explain?

My thinking was that the compiler would allocate memory for one unsigned char, then write the whole string into it, overwriting adjacent memory and causing havoc. Yet I'm able to correctly dereference this pointer later on and retrieve the whole string, so there doesn't seem to be a problem.

Context: I'm trying to convert a string to something of this form (unsigned char *).

Thanks in advance!

James R
  • 4
    Pointers and character string literals [are fully explained in every good C++ book](https://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list), with much more detail than can be given in a brief answer on stackoverflow.com. – Sam Varshavchik Jun 02 '19 at 15:47
  • 2
    "My thinking was that the compiler would allocate memory for one unsigned char" - your thinking is completely wrong. What does your C++ textbook have to say on the subject? –  Jun 02 '19 at 15:49
  • 2
    Don't be afraid. The compiler is very able to count the number of characters in `"YELLOW SUBMARINE"` plus the 0 terminator and allocate a sufficient number of bytes in the generated binary to store that. Afterwards, it can assign/init the `unsigned char *thing` pointer with the address of that allocation. ;-) Btw. there are 17 bytes to allocate. – Scheff's Cat Jun 02 '19 at 15:50
  • 1
    Pointers don't allocate any memory by themselves. They just point to already-allocated memory. – eesiraed Jun 02 '19 at 15:54
  • It would be better to make `thing` a `const unsigned char*`. Changing the contents at that address is something which might not be forgiven. – Scheff's Cat Jun 02 '19 at 15:55
  • Write down the address of the first house on your home street. Does the presence of the other houses in that street cause any problems with what you wrote? – molbdnilo Jun 02 '19 at 16:03
  • No, but in this case we are trying to store the address of a whole street in a pointer designed to store the address of a single house - hence the question – James R Jun 02 '19 at 16:14
  • It's called a C string and you can read about it here: https://www.tutorialspoint.com/cprogramming/c_strings.htm – Thomas Sablik Jun 02 '19 at 17:03

2 Answers

1

First of all, if the type of a pointer in C++ is unsigned char * that basically only tells the compiler the following things:

  • When dereferencing the pointer, the compiler should just read one byte from memory, and we will treat that byte as a number from 0 to 255. (On unusual systems where a byte has more than 8 bits, i.e. CHAR_BIT > 8, the upper bound is larger.)
  • When doing pointer addition and subtraction, the size of the elements pointed to by the pointer is 1 (sizeof(unsigned char) is 1 by definition on every system).

The type of a pointer also has some subtle effects if you look up the concept of strict aliasing and memory alignment, but I do not want to go into detail on that.

Next, the type of a pointer does not tell the compiler whether the pointer has a valid value that can be dereferenced. The pointer might be uninitialized. It might point to memory that was freed earlier. It might be NULL. It might point one past the last element of an array (something explicitly allowed by the standard).

The type of a pointer does not tell the compiler the whole layout of the memory. You can define a string and then define pointers that point to the beginning, middle, end, or one past the end, and all of these are valid pointers with the same type. The type system for the basic pointers in C++ is simply not complicated enough to encode that sort of information, because that is not how the language was designed.

const char * p = "James";  // valid pointer to the beginning of a string
const char * p1 = p + 1;   // pointer to 'a'
const char * p5 = p + 5;   // pointer to the null terminator (0) after 's'
const char * p6 = p + 6;   // one past the end of the array: valid to form, but must not be dereferenced

Finally, the code you presented has a cast in it, which allows you to do pretty much any conversion you want without much checking by the compiler. So you should not be surprised when the compiler allows you to cast one thing to another.

David Grayson
0

My thinking was that the compiler would allocate memory for one...

If I try to put more than one character inside..

Actually, you don't put anything inside. This line does not create a new string that holds "YELLOW SUBMARINE". You are just pointing at the beginning of a string that is stored in static memory.

gimme_danger
  • "YELLOW SUBMARINE" is in static memory, not on the stack. Casting it to a non-const char * is allowed, but modifying it through that pointer is UB. Nothing is copied. – doug Jun 02 '19 at 16:02
  • My fault, thanks for correction – gimme_danger Jun 02 '19 at 16:09
  • Hi - thanks for answering and not joining the orgy of patronising comments on the question. Yes I understand that a pointer stores the address of an object and not its value - but in order for "YELLOW SUBMARINE" to have an address, the string must first be stored somewhere in memory? So why is the pointer a pointer to an `unsigned char` rather than to a `string` or `unsigned char thing[17]`, for example? – James R Jun 02 '19 at 16:09
  • Sorry for the late reply, somehow I did not receive a notification about your comment. First of all, we should clarify again which memory is used in this case. I will provide only a sketch. 1. “YELLOW SUBMARINE” is a string literal. The standard says that string literals have static storage duration. 2. You perform an unsafe C-style cast on this literal. The cast itself is accepted, but writing through the resulting pointer is undefined behavior, because string literals are not allowed to be modified in any way. 3. Anyway, after the cast your pointer `thing` points to a specific location in memory where an unsigned char is considered to be located. – gimme_danger Jun 04 '19 at 11:10