So I was doing an exercise to see if I was using memset correctly.

Here's the original code I wrote, which was supposed to memset some addresses to the value 50:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(){
    int *block1 = malloc(2048);
    memset(block1, 50, 10);
    // int count = 0;
    for (int *iter = block1; (uint8_t *) iter < (uint8_t *)block1 + 10; iter = (int *) ((uint8_t *)iter + 1) ){
        printf("%p : %d\n", iter, *iter);
    }
    return 0;
}

I expected every address in memory to store the value 50. HOWEVER my output was:

(Address : Value)

0x14e008800 : 842150450
0x14e008801 : 842150450
0x14e008802 : 842150450
0x14e008803 : 842150450
0x14e008804 : 842150450
0x14e008805 : 842150450
0x14e008806 : 842150450
0x14e008807 : 3289650
0x14e008808 : 12850
0x14e008809 : 50

I was stuck on the problem for a while and tried a bunch of things until I randomly decided that maybe my pointer is the problem. I then tried a uint8_t pointer.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(){
    uint8_t *block1 = malloc(2048);
    memset(block1, 50, 10);
    for (uint8_t  *iter = block1; iter < block1 + 10; iter++ ){
        printf("%p : %d\n", iter, *iter);
    }
    return 0;
}

All I did was change the type of the block1 variable and my iter variable to be uint8_t pointers instead of int pointers and I got the correct result!

0x13d808800 : 50
0x13d808801 : 50
0x13d808802 : 50
0x13d808803 : 50
0x13d808804 : 50
0x13d808805 : 50
0x13d808806 : 50
0x13d808807 : 50
0x13d808808 : 50
0x13d808809 : 50

My question is then, why did that make such a difference?

Vlad from Moscow
BigBear
  • `memset` sets bytes. If you take that value 842150450 and convert it to hex (base 16), you'll find that it's `0x32323232`, where `0x32` is the hexadecimal value of 50. On your machine, `int` is 4 bytes, and `memset` set all four bytes of the `int` (so to speak) to 50. (Actually, `memset` didn't know they were bytes of an `int`. `memset` thought they were just a block of memory, as your variable name `block1` accurately suggests. Later, when you use a pointer to step through the block, you're *interpreting* it as either 8-bit bytes or 32-bit ints, depending on the type of pointer you use.) – Steve Summit Sep 26 '22 at 20:31
  • iter is an int variable. You cast it to a byte size when you compare in the for loop and when you increment in the for loop, but NOT when you print the value out. That %d means this is an int (four bytes on your system), so printf prints four contiguous bytes as an integer for you. – Jerry Jeremiah Sep 26 '22 at 20:33
  • @BigBear In this call printf("%p : %d\n", iter, *iter); you output integers that occupy 4 bytes because the pointer iter has the type int *. – Vlad from Moscow Sep 26 '22 at 20:34
  • BigBear, note that `iter = (int *) ((uint8_t *)iter + 1)` and `*iter` are _bad_ as `iter` is an `int *` and not specified to accept unaligned `int` addresses. – chux - Reinstate Monica Sep 26 '22 at 21:42

3 Answers


My question is then, why did that make such a difference?

Because the exact type of a pointer is hugely important. Pointers in C are not just memory addresses. Pointers are memory addresses, along with a notion of what type of data is expected to be found at that address.

If you write

uint8_t *p;
... p = somewhere ...
printf("%d\n", *p);

then in that last line, *p fetches one byte of memory pointed to by p.

But if you write

int *p;
... p = somewhere ...
printf("%d\n", *p);

where, yes, the only change is the type of the pointer, then in that exact same last line, *p now fetches four bytes of memory pointed to by p, interpreting them as a 32-bit int. (This assumes int on your machine is four bytes, which is pretty common these days.)

When you called

memset(block1, 50, 10);

you were asking for the first 10 bytes of block1 (out of the 2048 you allocated) to be set to 50, one byte at a time.

When you used an int pointer to step over that block of memory, fetching (as we said earlier) four bytes of memory at a time, you got 4-byte integers where each of the 4 bytes contained the value 50. So the value you got was

(((((50 << 8) | 50) << 8) | 50) << 8) | 50

which just happens to be exactly 842150450.

Or, looking at it another way, if you take that value 842150450 and convert it to hex (base 16), you'll find that it's 0x32323232, where 0x32 is the hexadecimal value of 50, again showing that we have four bytes each with the value 50.

That all makes sense so far, although you were skating on thin ice in your first program. You had int *iter, but then you said

for(iter = block1; (uint8_t *) iter < (uint8_t *)block1 + 10; iter = (int *) ((uint8_t *)iter + 1) )

In that cumbersome increment expression

iter = (int *) ((uint8_t *)iter + 1)

you have contrived to increment the address in iter by just one byte. Normally, we say

iter = iter + 1

or just

iter++

and this means to increment the address in iter by several bytes, so that it points at the next int in a conventional array of int.

Doing it the way you did had three implications:

  1. You were accessing a sort of sliding window of int-sized subblocks of block1. That is, you fetched an int made from bytes 1, 2, 3, and 4, then an int made from bytes 2, 3, 4, and 5, then an int made from bytes 3, 4, 5, and 6, etc. Since all the bytes had the same value, you always got the same value, but this is a strange and generally meaningless thing to do.
  2. Three out of four of the int values you fetched were unaligned. It looks like your processor let you get away with this, but some processors would have given you a Bus Error or some other kind of memory-access exception, because unaligned accesses aren't always allowed.
  3. You also violated the rule about strict aliasing.
Steve Summit
  • You LITERALLY answered my question in SPADES Steve! THANK YOU SO MUCH! – BigBear Sep 26 '22 at 21:00
  • Yeah I thought my increment expression was doing a lot... I just wanted to make sure that the pointer arithmetic was okay... I guess in trying hard to make it numerically correct I violated like a dozen rules lmao – BigBear Sep 26 '22 at 21:02
  • @BigBear Glad the answer helped. One more thing: There are exceptions, but these days, as a general rule, if you're using explicit pointer casts like that, you're doing something wrong. (If the same code, without the casts, gives you a warning, most of the time the warning indicates an actual problem, that you need to track down and fix. The cast may make the warning message go away, but the underlying problem probably remains.) – Steve Summit Sep 26 '22 at 21:05

The function memset sets each byte of the supplied memory to the specified value.

So in this call

memset(block1, 50, 10);

10 bytes of the memory addressed by the pointer block1 were set to the value 50.

But when you dereference the pointer iter, which has the type int *, you output sizeof( int ) bytes at once, interpreted as a single int.

On the other hand, if you declare the pointer as having the type

uint8_t  *iter;

then you will output only one byte of memory.

Consider the following demonstration program.

#include <stdio.h>
#include <string.h>

int main( void ) 
{
    int x;
    memset( &x, 50, sizeof( x ) );

    printf( "x = %d\n", x );

    for ( const char *p = ( const char * )&x; p != ( const char * )&x + sizeof( x ); ++p )
    {
        printf( "%d", *p ); 
    }
    putchar( '\n' );
}

The program output is

x = 842150450
50505050

That is, each byte of the memory occupied by the integer variable x was set to 50.

If you output each byte separately, the program prints the value 50 for each one.

To make it even clearer, consider one more demonstration program.

#include <stdio.h>

int main( void ) 
{
    printf( "50 in hex is %#x\n", 50 );
    int x = 0x32323232;
    printf( "x = %d\n", x );
}

The program output is

50 in hex is 0x32
x = 842150450

That is, the value 50 in hexadecimal is equal to 0x32.

Thus this initialization

int x = 0x32323232;

yields the same result as the call of the function memset

memset( &x, 50, sizeof( x ) );

which you could equivalently rewrite as

memset( &x, 0x32, sizeof( x ) );
Vlad from Moscow

In the first case you are de-referencing the int* iter so it prints the (misaligned) int value at the address, not the byte value.

It is clear what is happening when you look at the value 842150450 in hexadecimal - 0x32323232 - that is, each byte of the integer is 0x32 (50 decimal). The bytes after the tenth byte are uninitialized (malloc does not clear the memory), but happen to be zero in this case, and the machine is little-endian, so it tails off with 0x323232, 0x3232, and finally 0x32.

Clearly the second case is the more "correct" solution, but you can fix the first case thus:

printf("%p : %d\n", 
       (void*)iter, 
       *(uint8_t*)iter);
Clifford