I ran into this question in an OS course. Here is the code, from the 6.828 (Operating System) online course. It is meant to let learners practice pointers in the C programming language.

#include <stdio.h>
#include <stdlib.h>

void
f(void)
{
    int a[4];
    int *b = malloc(16);
    int *c;
    int i;

    printf("1: a = %p, b = %p, c = %p\n", a, b, c);

    c = a;
    for (i = 0; i < 4; i++)
        a[i] = 100 + i;
    c[0] = 200;
    printf("2: a[0] = %d, a[1] = %d, a[2] = %d, a[3] = %d\n",
       a[0], a[1], a[2], a[3]);

    c[1] = 300;
    *(c + 2) = 301;
    3[c] = 302;
    printf("3: a[0] = %d, a[1] = %d, a[2] = %d, a[3] = %d\n",
       a[0], a[1], a[2], a[3]);

    c = c + 1;
    *c = 400;
    printf("4: a[0] = %d, a[1] = %d, a[2] = %d, a[3] = %d\n",
       a[0], a[1], a[2], a[3]);

    c = (int *) ((char *) c + 1);
    *c = 500;
    printf("5: a[0] = %d, a[1] = %d, a[2] = %d, a[3] = %d\n",
       a[0], a[1], a[2], a[3]);

    b = (int *) a + 1;
    c = (int *) ((char *) a + 1);
    printf("6: a = %p, b = %p, c = %p\n", a, b, c);
}

int
main(int ac, char **av)
{
    f();
    return 0;
}

I copied it to a file, compiled it with gcc, and got this output:

$ ./pointer 
1: a = 0x7ffd3cd02c90, b = 0x55b745ec72a0, c = 0x7ffd3cd03079
2: a[0] = 200, a[1] = 101, a[2] = 102, a[3] = 103
3: a[0] = 200, a[1] = 300, a[2] = 301, a[3] = 302
4: a[0] = 200, a[1] = 400, a[2] = 301, a[3] = 302
5: a[0] = 200, a[1] = 128144, a[2] = 256, a[3] = 302
6: a = 0x7ffd3cd02c90, b = 0x7ffd3cd02c94, c = 0x7ffd3cd02c91

I can easily understand outputs 1, 2, 3 and 4, but output 5 is hard for me to understand. Specifically, why are a[1] = 128144 and a[2] = 256?
It seems this output is the result of

c = (int *) ((char *) c + 1);
*c = 500;

I have trouble understanding what c = (int *) ((char *) c + 1) does. c is a pointer, declared as int *c. Before the 5th line of output, c points to the second element of array a, because of c = a and c = c + 1. So what do (char *) c, ((char *) c + 1), and finally (int *) ((char *) c + 1) mean?

zbo
  • MIT is *teaching* that garbage code? Because `c = (int *) ((char *) c + 1)` is [risking undefined behavior](http://port70.net/~nsz/c/c11/n1570.html#6.3.2.3p7) by just creating that pointer, and then `*c = 500;` [**IS** undefined behavior](https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule). I guess those that can't do, teach. – Andrew Henle Oct 19 '22 at 03:29
  • Actually `((char *) c + 1)` points to the second byte of `a[1]`. `*c = 500` overwrites all but the first byte of `a[1]`, plus the first byte of `a[2]`. On a little-endian architecture `a[1]` becomes `(400 & 0xFF) | (500 << 8)` = 128144, and `a[2]` becomes `301 & ~0xFF` = 256. But as said in the previous comment, it is UB. – dimich Oct 19 '22 at 03:48

3 Answers

Although this is undefined behavior per the standard, it has a clear meaning in "ancient C", and it clearly works that way on the machine/compiler you're working with.

First, it casts c to a (char *), which means that pointer arithmetic will work in units of sizeof(char) (i.e. one byte) instead of sizeof(int). Then it adds one byte. Then it converts the result back to (int *). The result is an int pointer that now refers to an address one byte higher than it used to. Since c was pointing at a[1] beforehand, afterwards *c = 500 will write to the last three bytes of a[1] and the first byte of a[2].
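
As a rough sketch of the same byte-level effect without dereferencing a misaligned pointer, the store can be replayed with memcpy. The helper names store_at_byte_offset and replay_step5 are made up for this illustration, and int32_t is used on the assumption that int is 4 bytes, as on the asker's machine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the bytes of `value` into the array starting
 * `byte_offset` bytes from its beginning. memcpy has no alignment
 * requirement, so no misaligned int pointer is ever dereferenced.
 * The caller must keep byte_offset + sizeof value inside the array. */
void store_at_byte_offset(int32_t *arr, size_t byte_offset, int32_t value)
{
    memcpy((unsigned char *)arr + byte_offset, &value, sizeof value);
}

/* Rebuild the array as it was just before output 5, then store 500
 * one byte past the start of a[1] (byte offset 4 + 1 = 5). */
void replay_step5(int32_t a[4])
{
    a[0] = 200;
    a[1] = 400;
    a[2] = 301;
    a[3] = 302;
    store_at_byte_offset(a, sizeof(int32_t) + 1, 500);
}
```

On a little-endian machine, calling replay_step5 on a four-element array should leave a[1] == 128144 and a[2] == 256, matching output 5, while a[0] and a[3] stay untouched.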

On many machines (but not x86) this is an outright illegal thing to do. An unaligned access like that would simply crash your program. The C standard goes further and says that that code is allowed to do anything: when the compiler sees it, it can generate code that crashes, does nothing, writes to a completely unrelated bit of memory, or causes a small gnome to pop out of the side of your monitor and hit you with a mallet. However, sometimes the easiest thing to do in the case of UB is also the straightforward obvious thing, and this is one of those cases.
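
To make "unaligned" concrete, here is a small sketch that checks whether an address is a multiple of the alignment of int. The helper name is_aligned_for_int is hypothetical, the code assumes a C11 compiler for _Alignof, and converting a pointer to uintptr_t is implementation-defined, though it behaves as expected on common platforms.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if the address in `p` is a multiple of
 * the alignment the implementation requires for int, 0 otherwise. */
int is_aligned_for_int(const void *p)
{
    return (uintptr_t)p % _Alignof(int) == 0;
}
```

On a typical platform where int has 4-byte alignment, is_aligned_for_int(&a[1]) is 1, while is_aligned_for_int((char *)&a[1] + 1) is 0. Note that the check only inspects the char * address and never converts it back to int *.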

Your course material is trying to show you something about how numbers are stored in memory, and how the same bytes can be interpreted in different ways depending on what you tell the CPU. You should take it in that spirit, and not as a guide to writing decent C.

hobbs
  • This was never defined in any version of C, ancient or otherwise. It only runs on OP's machine because a certain vendor of 16-bit microprocessors decided to allow unaligned access to preserve compatibility with 8-bit hardware back when. "Ancient C" ran on PDP-11 and it most certainly didn't work this way. – n. m. could be an AI Oct 19 '22 at 05:20
  • @n.1.8e9-where's-my-sharem. Many 16 bitters don't have alignment requirements, not just ancient ones. As for how the C99 effective type rules are supposed to make sense when you only access half an object, well that's another story. Although the pointer conversion rules in 6.3.2.3 do label all misaligned access as UB. – Lundin Oct 19 '22 at 06:34
  • @Lundin Ancient ones **do** have alignment requirements. That's why C has them. – n. m. could be an AI Oct 19 '22 at 07:21
  • @n.1.8e9-where's-my-sharem. My point is that it's an artificial requirement from the PC world that misaligned access = always UB. That complicates compiler and program design significantly and needlessly for CPUs that have no alignment requirements in hardware. – Lundin Oct 19 '22 at 08:08
  • @Lundin I don't think this is how the standard or compilers work or should work, but this is off topic. – n. m. could be an AI Oct 19 '22 at 08:48
  • @Lundin *My point is that it's an artificial requirement from the PC world that misaligned access = always UB.* [Not really](https://en.wikipedia.org/wiki/SPARC). The posted code would have generated `SIGBUS` in ancient times on that, and that was no "PC". – Andrew Henle Oct 19 '22 at 10:42
  • @AndrewHenle "PC" as in hosted systems. Which is a minority of all computers produced in the world. There are probably still more 8 bitters manufactured per year than the total of all hosted systems produced combined. – Lundin Oct 19 '22 at 10:48

At the first output, c is uninitialized, so it holds an indeterminate (seemingly random) address.

After c = a;, c points to the first element of a, so when you change c[0], c[1], *(c + 2) and 3[c], the values in a change accordingly.
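
A minimal sketch of why those spellings all reach the same element: c[i] is defined as *(c + i), and since addition commutes, i[c] means the same thing. The function names below are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if c[i], *(c + i) and i[c] all name
 * the same object, which the definition of [] guarantees. */
int same_element(int *c, size_t i)
{
    return &c[i] == c + i && &i[c] == c + i;
}

/* Write through the alias c, then read the result back through a. */
int write_through_alias(void)
{
    int a[4] = {100, 101, 102, 103};
    int *c = a;   /* c now points at a[0] */
    3[c] = 302;   /* same as c[3] = 302 and *(c + 3) = 302 */
    return a[3];  /* the write is visible through a: 302 */
}
```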

At the following line:

    c = c + 1;

c now points to a[1], whose address is 0x7ffd3cd02c94.

Now for the line you are asking about, c = (int *) ((char *) c + 1);. It does the following:

  • Converts c to a char pointer, which still points to the same address, 0x7ffd3cd02c94.
  • Increments that pointer by 1, so the address is now 0x7ffd3cd02c95.
  • Converts the new address back to int * and assigns it to c.
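
The difference between the two kinds of pointer arithmetic can be sketched by measuring how many bytes each one moves. The function names are hypothetical, and the caller is assumed to pass a pointer into an array so that c + 1 is still valid.

```c
#include <stddef.h>

/* Bytes moved by ordinary int-pointer arithmetic: c + 1 advances by
 * one whole element, that is sizeof(int) bytes. */
ptrdiff_t int_step_bytes(int *c)
{
    return (char *)(c + 1) - (char *)c;
}

/* Bytes moved by the cast-and-add from the question: (char *)c + 1
 * advances by exactly one byte. */
ptrdiff_t char_step_bytes(int *c)
{
    return ((char *)c + 1) - (char *)c;
}
```

With 4-byte int, int_step_bytes(&a[1]) gives 4 (0x...c94 to 0x...c98), while char_step_bytes(&a[1]) gives 1 (0x...c94 to 0x...c95). The sketch deliberately stays in char * and does not convert the odd address back to int *.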

Before that statement, c pointed to the four bytes at addresses 0x7ffd3cd02c94-0x7ffd3cd02c97. After it, c points to 0x7ffd3cd02c95-0x7ffd3cd02c98, so *c = 500 overwrites the last three bytes of a[1] and the first byte of a[2]. That is the reason for the values in output 5: [![enter image description here][1]][1]

Now it is clear why the value changed as you observed.

NOTE: This holds for a little-endian system. On a big-endian system the result would be different. On some embedded platforms that do not allow unaligned access, you would get an exception at that line.

[1]: https://i.stack.imgur.com/eU0Tb.png
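
To see which byte order a machine uses, one possible sketch is to copy an integer's bytes out in memory order and look at them. The names host_is_little_endian and bytes_in_memory are hypothetical, and uint32_t is used on the assumption of a 4-byte integer.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the least significant byte is stored first in memory
 * (little-endian), 0 otherwise. Only the first byte is examined. */
int host_is_little_endian(void)
{
    uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Copy the four bytes of `value` into out[0..3] in memory order. */
void bytes_in_memory(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, 4);
}
```

On a little-endian host, 500 (0x000001F4) comes out as F4 01 00 00, which is why the misaligned store puts F4 and 01 into a[1] and a zero byte into a[2]. On a big-endian host the same value comes out as 00 00 01 F4.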

ThongDT

This is a result of undefined behavior. The behavior is undefined because c = (int *) ((char *) c + 1) produces an int pointer that is not correctly aligned for int, and *c = 500 then writes through it, straddling a[1] and a[2]. A compiler is not required to diagnose this, although building with -Wall -pedantic -std=c99 is still a good habit.

To answer your question about (char *) c and ((char *) c + 1).

(char *) c: c has type int * (pointer to int), and at this point it holds the address of a[1], the second element of the array a. The cast produces a value of type char * that holds the same address; the declared type of c itself does not change.

((char *) c + 1): arithmetic on a char * moves in units of one byte, so this is the address one byte past the start of a[1]. It is not &c[1], which would be sizeof(int) bytes further on. (int *) then converts that address back to int * so it can be assigned to c.