#include <stdio.h>

int main() {
  int b=1, c=3, e=4;
  int *a=&b;
  a[1]=c;
  a[2]=e;
  printf("%d, %d, %d\n", a[0], a[1], a[2]);
  return 0;
}

The above code prints 1, 3, 4. Why don't I get a compiler error? The variables don't have to be stored contiguously in memory, so why can a pointer that holds only the address of b also reach the others?

Richard
  • **UB** (*Undefined Behaviour*) does not necessarily manifest itself as a compiler warning or a program crash. It may appear to do something useful... until your boss is watching, or you change compilers, or upgrade your OS, ... – pmg Oct 07 '21 at 20:04
  • [Undefined, unspecified and implementation-defined behavior](https://stackoverflow.com/q/2397984) – 001 Oct 07 '21 at 20:06
  • `a` is a pointer to `int`. That's the only thing the compiler knows about `a`, so it accepts using `a` as if it were an array, as the C language specification allows (see section 6 of the [comp.lang.c faq](http://c-faq.com/)). It's your responsibility, as the programmer, to make sure the "array" is real and safe to use. – pmg Oct 07 '21 at 20:07
  • Because that's just how the C language is specified. To conform to the standard, a compiler has no obligation to emit an error or even a warning for every instance of undefined behaviour. There are different reasons why that is the case, but one may argue it keeps the language and the compiler simple at the cost of potentially having such runtime issues (compare with a language like Rust, which builds many of those checks into the language and its compile-time checking). – kaylum Oct 07 '21 at 20:09

2 Answers


It's called undefined behavior.

With undefined behavior anything may happen. It may print 1, 3, 4, but it could also print 42, 42, 42, or the program could crash, or your computer could turn off, or... anything.

The only valid access is a[0]; the others are invalid (that is, undefined behavior).

A pointer in C points to one element that you can access using either *pointer or pointer[0].

The C language also allows you to access *(pointer + 1), or equivalently pointer[1]. In that case the compiler expects that you have an array of elements. And if you don't, it's your problem: the compiler trusts that you know what you are doing and just generates the equivalent code.

So if you do it wrong (like in your posted code), the compiler isn't required to notice. You just end up with a program that has undefined behavior.

The reason your code is wrong is that a[1] accesses the memory just after a[0] (which is b) and expects another int to be located there. That expectation doesn't hold for your program. Maybe c really is located in the memory just after b, but there is no guarantee of that. The compiler may place the variables in memory in any order, so we can't tell what the memory after b (aka a[0]) contains. Accessing it through a[1] is therefore undefined behavior... we can't know what will happen.

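To make things even more "strange"... perhaps c and e aren't present in memory at all. In fact, that's quite likely for your code once optimizations are enabled, because the compiler can keep such variables in registers or simply use their constant values.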

Support Ukraine
  • Do I understand it correctly that `a[1]=c` can be understood as "assign the value of c to the location that a+1 points to"? – Richard Oct 07 '21 at 20:09
  • @Leibniz The compiler understands it and generates code for it, **but**... as `a[1]` is not a legal memory address, it is undefined behavior when executed. – Support Ukraine Oct 07 '21 at 20:11
  • So as long as `a[1]` is within the memory your program is allowed to use (which you can never know), everything should work? – Richard Oct 07 '21 at 20:14
  • @Leibniz well... hmmm.... the problem is that you can't know where/what `a + 1` is pointing to, so accessing `a[1]` can give any result (including a program crash). – Support Ukraine Oct 07 '21 at 20:21
  • The reason this likely appears to work is that on your current platform, compiler, optimization settings, day of week, angle of sun and phase of the moon, the first three integers were allocated consecutively on the stack in the order you defined them, so the memory addresses just happened to be valid. – Mad Physicist Oct 07 '21 at 20:25
  • But if the address of `pointer+1` is still in the accessible memory of the current program, `a[1]=c` will just overwrite the contents that have been stored at this memory address? – Richard Oct 07 '21 at 20:41
  • @Leibniz To be clear, the thing that's *undefined* here is how the compiler arranges things in memory. Assignments `a[1]=c;` and `a[2]=e;` write values into memory locations that are a mystery to you, and that can *easily* cause a crash or worse. So `a` might be located such that `a[1]` and `a[2]` aren't used for anything else, and you get lucky and your program works. But if you use a different compiler, or even the same compiler under slightly different conditions, you'll get different results. There aren't a lot of guard rails in C to protect you from yourself, so you have to be careful. – Caleb Oct 07 '21 at 20:43
  • @Leibniz Yes, if the address the compiler calculates for `a[1]` (or `a[1000]` for that matter) is in memory that your program is allowed to read and write, you can read and write to it, and if something important was there previously, your new value will replace it. – Caleb Oct 07 '21 at 20:46
  • Accessing `a[1]` is UB, meaning anything may happen. Yes, it might access something unrelated, maybe even with a different type, with potentially fatal consequences. Or the compiler recognizes the path it is on always executes UB and collapses it: Poof, and it vanishes without trace from consideration. – Deduplicator Oct 07 '21 at 21:05

Pointers and arrays in C - difference

A pointer in C is an address in memory. An array is a contiguous list of values of some type. An array will have enough memory reserved for it to contain all those values; a pointer by itself doesn't have any memory allocated to it beyond that needed to store the address it refers to. As a C programmer, it's your responsibility to make sure that pointers you use refer to valid blocks of memory, and that you don't read from or write to memory locations outside those blocks.

Why don't I get a compiler error?

Because C lets you use array syntax with pointers, and doing so is often valid. Consider:

int b[3]={1, 3, 4};
int *a=b;    /* same as &b[0]; &b would have type int (*)[3] */

Now b is an array of int, and a points to the beginning of that array, so it's entirely legitimate to read and write a[1] and a[2]. But b only holds 3 integers, so reading or writing a[3] or a[10] would be undefined behavior.

Caleb