C arrays allow for negative indexing, but I can't think of a use for that, seeing that you'll never have an element at a negative index.

Sure, you can do this:

#include <iostream>

struct Foo {
   int arr1[3] = { 1, 2, 3 };
   int arr2[3] = { 4, 5, 6 };
};

int main() {
   Foo foo;
   std::cout << foo.arr2[-2] << std::endl; //output is 2
}

But why would you ever want to do something like that? When is negative indexing of an array actually necessary, and in what domains would you use it?

TwistedBlizzard
  • I'm not sure about C++, but I believe that in C, the expression `foo.arr2[-2]` will invoke undefined behavior according to [§6.5.6 ¶8 of the ISO C11 standard](http://port70.net/~nsz/c/c11/n1570.html#6.5.6p8), even if `arr1` is stored immediately before `arr2` (which you cannot rely upon). – Andreas Wenzel Nov 04 '22 at 03:32
  • _"output is 2"_ is incorrect. The program has undefined behavior. The C++ standard is quite clear about memory access beyond the bounds of the object being pointed to. Even in this trivial case, you don't know how the individual struct members will be aligned. – paddy Nov 04 '22 at 03:32
  • @paddy I ran this on MSVC and tried negative indices -1 through -3, and they all outputted the elements of arr1. I guess it might be different on other compilers though. – TwistedBlizzard Nov 04 '22 at 03:35
  • That's the sneaky thing about undefined behaviour. In the infinite set of possible behaviours is the exact behaviour you expect. In many cases the result is logical and understandable. And, if you know enough and have sufficient guarantees from outside the C++ Standard, exploitable. The C++ Standard simply cannot (or will not) define a particular behaviour. For example, it's only recently that C++ guarantees two's complement, something pretty much taken for granted for decades. – user4581301 Nov 04 '22 at 03:39
  • @AndreasWenzel You can absolutely rely on the arrays to be adjacent, because they are of the same element type: same alignment requirements. There cannot be padding after an array of `int` due to being followed by an `int` object or array. However, the behavior of the negative indexing is undefined regardless, even though it's obvious what element it is getting at. – Kaz Nov 04 '22 at 03:41
  • @Kaz: According to [§6.7.2.1 ¶15 of the ISO C11 standard](http://port70.net/~nsz/c/c11/n1570.html#6.7.2.1p15), there may be "unnamed padding" within the structure object. According to ¶17, there may also be such padding at the end of the structure object. The standard does not specify that inserting padding is only permissible in order to satisfy alignment requirements, which you appear to be implying. However, I do admit that I cannot think of any other reason why a compiler would want to insert padding. – Andreas Wenzel Nov 04 '22 at 03:59
  • @AndreasWenzel: A compiler could insert padding as a debugging feature. Changes in program behavior when extra padding is inserted could provide clues about bugs, and the padding could be filled with sentinel data that is monitored for changes to reveal bugs. – Eric Postpischil Nov 04 '22 at 11:25

2 Answers

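Remember that when you index an array with N, you get the element N elements past the start: in pointer arithmetic its address is simply array + N, which is N * sizeof(array[0]) bytes past &array[0]. Nothing requires N to be positive, as long as the pointer you index from has valid elements of the same array before it.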

Suppose you have this code:

#include <stdio.h>
int main()
{ 
    int a[5] = {5, 2, 7, 4, 3};
    int* b = &a[2];
    printf("%d", b[-1]); // prints 2
   
}
Cedric Martens
  • So it is useful when we are assigning an array to another array, but we want to choose what element starts at index 0? – TwistedBlizzard Nov 04 '22 at 03:31
  • @beangod You can't assign arrays, unfortunately, but you can *point* at somewhere inside the array. This is what Cedric has done here. – user4581301 Nov 04 '22 at 03:35
  • It's useful when you have a pointer to somewhere inside an object, and you want a relative offset to that pointer that is _still within that object_. – paddy Nov 04 '22 at 03:35
  • `array[0] + sizeof(array[0]) * N` does not make sense to me, because `array[0]` is the value (not the address) of the first element. – Andreas Wenzel Nov 04 '22 at 03:41
  • Thanks @AndreasWenzel, I have updated it – Cedric Martens Nov 04 '22 at 03:48
  • Still not correct. With pointer arithmetic it's simply `array + N`. I know what you're trying to say, but if you're calculating an address this way, you need to do it via something like `char*` or `uintptr_t`. _i.e._ `(int*)((char*)array + sizeof(array[0]) * N)` – paddy Nov 04 '22 at 03:54
  • I think it would be better to describe `&array[0] + sizeof(array[0]) * N` in words, because that expression is wrong. Multiplying with `sizeof(array[0])` is already implied by using pointer arithmetic, so performing an additional (explicit) multiplication would make the result wrong. – Andreas Wenzel Nov 04 '22 at 04:12
  • @paddy I see what you mean. In that section I was trying to show a valid mathematical equation that results in the correct address, not valid C code. The point was to be understood by beangod, but yes, you are right: to make this valid C I would need some casts. – Cedric Martens Nov 04 '22 at 04:13

C arrays allow for negative indexing

Not really. Array indexing boils down to pointer arithmetic with the additive operators (+ and -). From C17 6.5.6/8:

If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. ...
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

The reason - is allowed in the first place is that it is useful for calculating a relative offset from a given element. But the resulting pointer still has to point at an element inside the same array, or you invoke undefined behavior.

Your struct example invokes exactly that undefined behavior. There is no guarantee that the two arrays are allocated adjacently (even though it is very likely), because the compiler is free to insert padding between them. The compiler is also free to assume that a write such as foo.arr2[-2]=3; never modifies arr1.

This code could in theory print 2 2 2:

std::cout << foo.arr2[-2] << std::endl;
std::cout << foo.arr1[1] << std::endl; // store foo.arr1[1] in a register
foo.arr2[-2]=3;
std::cout << foo.arr1[1] << std::endl; // print once more using that same register

Lundin