So the way I understand things at present is that an array of char
is just X amount of 8 bit values stored next to each other in memory...
Almost correct, but a char is not guaranteed to be 8 bits (an octet) in C and C++; the standards only require it to be at least 8 bits wide. Remember that C and C++ can target almost any processor and ISA in existence, including rare and exotic machines with their own peculiarities. I recommend reading this Q&A: Will a `char` always-always-always have 8 bits?
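As a minimal sketch of how to check this on a given toolchain: the standard macro CHAR_BIT from <climits> reports how many bits a char has on the target being compiled for. The helper names bitsPerChar and bitsInCharArray are made up for this illustration.

```cpp
#include <climits>  // CHAR_BIT
#include <cstddef>  // std::size_t

// Number of bits in one char on the target this is compiled for.
// The language only guarantees that this is at least 8.
constexpr int bitsPerChar() { return CHAR_BIT; }

// Total number of bits occupied by an array of `count` chars.
// (Hypothetical helper, named for this example only.)
constexpr std::size_t bitsInCharArray(std::size_t count) {
    return count * CHAR_BIT;
}

// Fails to compile on a (non-conforming) target with narrower chars.
static_assert(CHAR_BIT >= 8, "C and C++ require at least 8 bits per char");
```

On mainstream desktop and server platforms CHAR_BIT is 8, so 8 chars cover 64 bits, but code that depends on that is better off asserting it than assuming it.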
...ending with a 00.
This is an assumption that's not entirely correct, sorry.
While a "string" must have a terminator (as per the C language specification), an array of characters does not necessarily have a null terminator (the '\0' char at the end). A char array that is initialized from a string literal will have a null terminator appended, but you can still construct a char array without one.
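To illustrate the difference, here is a small sketch with two arrays holding the same letters, one initialized from a string literal and one initialized element by element. The names withTerminator, withoutTerminator and hasTerminator are invented for this example.

```cpp
#include <cassert>
#include <cstddef>  // std::size_t
#include <cstring>  // std::memchr

// Initialized from a string literal: the compiler appends '\0',
// so this array has 4 elements.
const char withTerminator[] = "abc";

// Initialized element by element: exactly 3 elements, no terminator.
// Passing this to strlen or printf("%s") would read past the end.
const char withoutTerminator[] = {'a', 'b', 'c'};

// Returns true if one of the first `size` chars of `data` is '\0'.
// (Hypothetical helper, named for this example only.)
bool hasTerminator(const char* data, std::size_t size) {
    assert(data != nullptr);
    return std::memchr(data, '\0', size) != nullptr;
}
```

Here sizeof(withTerminator) is 4 and sizeof(withoutTerminator) is 3; only the first one is a valid C string.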
So what I would like to do is to iterate over this collection of bits in memory and combine them into smaller or larger segments. An example would be if I had 8 chars and I wanted to turn that string of bits into two 32 bit integers or one 64 bit integer.
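If you want to force C++ to interpret a range of memory (that is 8 octet-bytes, or 8 chars, long) as an integer, then use reinterpret_cast to treat the string's pointer as a pointer to a 64-bit integer and read the value it points to. Be aware that reading a char array through a uint64_t pointer is formally undefined behaviour under the strict-aliasing rules, and it can fault on targets that require aligned access, even though it appears to work with many common compilers: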
#include <cstdint>

const char* stringFromLiteral = "abcdefgh";
// The cast must keep the const qualifier: reinterpret_cast cannot cast away const.
const uint64_t* pointerToStringLiteralPretendingToBePointerToUInt64 = reinterpret_cast<const uint64_t*>( stringFromLiteral );
uint64_t asUnsigned64bitInteger = *pointerToStringLiteralPretendingToBePointerToUInt64;
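A hedged alternative to the pointer cast, swapped in here rather than taken from the snippet above, is to copy the bytes with std::memcpy, which avoids the aliasing and alignment concerns of dereferencing a cast pointer. The sketch below assumes 8-bit chars and that the caller supplies at least 8 readable chars; loadUInt64 and loadTwoUInt32 are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>  // std::uint32_t, std::uint64_t
#include <cstring>  // std::memcpy

// Copies the first 8 chars at `bytes` into one 64-bit integer.
// The resulting numeric value depends on the host's byte order.
std::uint64_t loadUInt64(const char* bytes) {
    static_assert(sizeof(std::uint64_t) == 8, "this sketch assumes 8-bit chars");
    assert(bytes != nullptr);
    std::uint64_t value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}

// Copies the same 8 chars into two 32-bit integers instead:
// chars 0..3 go into `first`, chars 4..7 into `second`.
void loadTwoUInt32(const char* bytes, std::uint32_t& first, std::uint32_t& second) {
    assert(bytes != nullptr);
    std::memcpy(&first, bytes, sizeof first);
    std::memcpy(&second, bytes + sizeof first, sizeof second);
}
```

Calling loadUInt64("abcdefgh") yields the same bit pattern as the bytes in the literal. Compilers typically turn a fixed-size memcpy like this into a single load, so it is usually no slower than the cast.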
In this case, here's what the process's read-only memory and stack (probably) look like, assuming that read-only memory is at 0x0800 and the current function's stack frame starts at 0x1000, and it's a 32-bit big-endian word machine (so sizeof(char*) == 4) and all values are aligned to 16-bit boundaries:
(Each line is a span of 8 bytes of memory, prefixed with the address of the line's first byte. Each hexadecimal number after the line's address represents a single char (octet-byte) value. Each .... represents an octet with an undefined value; in reality, its value would be whatever was left behind by the last user, 0x00 (for pre-zeroed memory), or some debugger-generated overflow-detection test pattern.)
0x0800 0x61 0x62 0x63 0x64 0x65 0x66 0x67 0x68 # The "abcdefgh" string literal is in read-only memory at 0x0800 through 0x0808, including the 0x00 terminator byte.
0x0808 0x00 .... .... .... .... .... .... ....
0x0810 .... .... .... .... .... .... .... ....
[ Jump forward about 0x7E8 bytes ]
0x1000 0x00 0x00 0x08 0x00 .... .... .... .... # The `stringFromLiteral` variable is a 4-byte pointer holding the string's address, 0x0800.
0x1008 .... .... .... .... .... .... .... ....
0x1010 0x61 0x62 0x63 0x64 0x65 0x66 0x67 0x68 # The `asUnsigned64bitInteger` value is a 64-bit value that is the same as 8 bytes copied from 0x0800, but without the terminator
0x1018 .... .... .... .... .... .... .... ....
0x1020 .... .... .... .... .... .... .... ....
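The layout above holds for a big-endian machine; on a little-endian host the same 8 bytes, read as one integer, produce a different numeric value. As a sketch under the assumptions of 8-bit chars and an ASCII-compatible character set, the bytes can instead be combined with shifts, which gives the big-endian interpretation on any host. The name combineBigEndian is made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>  // std::size_t
#include <cstdint>  // std::uint64_t

// Combines 8 chars into a 64-bit integer, treating the first char as the
// most significant byte (the big-endian order used in the diagram).
// The caller must supply at least 8 readable chars.
std::uint64_t combineBigEndian(const char* bytes) {
    assert(bytes != nullptr);
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        // Go through unsigned char so a negative char is not sign-extended.
        value = (value << 8) | static_cast<unsigned char>(bytes[i]);
    }
    return value;
}
```

With ASCII, combineBigEndian("abcdefgh") returns 0x6162636465666768, matching the bytes shown at 0x1010 in the diagram regardless of the byte order of the machine running the code.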