I have a problem I can't seem to figure out, and I hope someone can thoroughly explain it to me. I get that it's very elementary.

The problem is the following:

How are the following variables laid out in consecutive memory addresses, and what values are stored there?

int8_t a = 0x65;
char b = 'k';
uint16_t c = 22222;

for example,

int8_t esim = 9;

would be stored as

0000:00001001

Anyone?

teepa

2 Answers

I have a problem I can't seem to figure out, and I hope someone can thoroughly explain it to me. I get that it's very elementary.

And this is where you go wrong. It is elementary, but not quite in the way you think:

  • Distinct variables need not be stored in consecutive locations in memory. (Or as the last bullet says, not be stored in memory at all)

  • The storage of individual bytes within a multibyte value is implementation-defined, so check your compiler manuals. Most personal computing processors nowadays use little-endian two's complement for integers, however.

  • Even if the compiler placed them in memory in exactly the same order as they are declared, each data type can require an implementation-specific alignment and can therefore start only at an address that is a multiple of this alignment.

  • And finally, there need not be any variables or memory allocations whatsoever; the compiler just needs to generate a program that behaves as if there were such variables.


We can certainly say something about your program however. If the excerpt

#include <stdint.h>

int8_t a = 0x65;
char b = 'k';
uint16_t c = 22222;

compiles and the variables are placed in memory, then

  • a will be 8 bits with value 0b01100101
  • c will be 16 bits and stored in memory as 2 bytes - 0b11001110 and 0b01010110 at increasing memory addresses (little-endian, usual), or the same 2 bytes reversed: 0b01010110 and 0b11001110 (big-endian).
  • As for b: because int8_t exists, char must also be 8 bits wide. If the execution character set is ASCII-compatible, its value will be stored as 0b01101011 (i.e. 107, the ASCII code of k).

Additionally, most often the alignment requirement of a uint16_t object is 2; if that is the case, it must start at an even address.

This deduction is only possible because the int8_t and uint16_t must not have padding bits, hence from having them we can deduce that the width of the smallest addressable unit (char) must be 8 bits too. And uint16_t has only 2 bytes, hence it can only have two choices for endianness.


It is easy to test how GCC organizes global variables. Consider the module having the source code

#include <stdint.h>
int8_t a = 0x65;
char b = 'k';
uint16_t c = 22222;

we can compile it to an object file:

% gcc -c layouttest.c -o layouttest.o

and then use nm to list the symbols and their addresses:

% nm layouttest.o            
0000000000000000 D a
0000000000000001 D b
0000000000000002 D c

It seems to be as Jabberwocky's answer expects. If we now compile with -O3, the results can be different:

% gcc -c layouttest.c -o layouttest.o -O3; nm layouttest.o
0000000000000003 D a
0000000000000002 D b
0000000000000000 D c

That is, the variables were reordered, with c at the lowest address.

Complement to Antti Haapala's answer:

While it is totally up to the compiler how and where to store variables, variables declared consecutively are often laid out in memory in their order of declaration, especially in non-optimized code.

So the variables declared like this:

int8_t a = 0x65;
char b = 'k';
uint16_t c = 22222;

could be stored like this:

Address  Content in binary
--------------------------
0000:    01100101  (0x65)
0001:    01101011  ('k' = 107, ASCII code of k)
0002:    11001110  (low byte of 22222, little-endian)
0003:    01010110  (high byte of 22222)

where Address is the relative address with respect to the memory address of variable a.

Once again: don't assume that this is necessarily the case on your platform.

Jabberwocky