161

I was just reading some code and found that the person was using arr[-2] to access the 2nd element before the arr, like so:

|a|b|c|d|e|f|g|
       ^------------ arr[0]
         ^---------- arr[1]
   ^---------------- arr[-2]

Is that allowed?

I know that arr[x] is the same as *(arr + x). So arr[-2] is *(arr - 2), which seems OK. What do you think?

Ciro Santilli OurBigBook.com
  • 347,512
  • 102
  • 1,199
  • 985
bodacydo
  • 75,521
  • 93
  • 229
  • 319

9 Answers9

218

That is correct. From C99 §6.5.2.1/2:

The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).

There's no magic. It's a 1-1 equivalence. As always when dereferencing a pointer (*), you need to be sure it's pointing to a valid address.

Matthew Flaschen
  • 278,309
  • 50
  • 514
  • 539
  • 2
    Note also that you don't have to dereference the pointer to get UB. Merely computing `somearray-2` is undefined unless the result is in the range from the start of `somearray` to 1 past its end. – RBerteig Aug 13 '10 at 09:12
  • 45
    In older books the `[]` were referenced as a *syntax sugar* for pointer arithmetic. *Favorite* way to confuse beginners is to write `1[arr]` - instead of `arr[1]` - and watch them guessing what that supposed to mean. – Dummy00001 Aug 13 '10 at 14:24
  • 5
    What happens on 64 bit systems (LP64) when you have a 32 bit int index which is negative ? Should the index get promoted to a 64 bit signed int prior to the address calculation ? – Paul R Oct 11 '10 at 16:02
  • 4
    @Paul, from §6.5.6/8 (Additive operators), "When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression." So I think it will be promoted, and `((E1)+(E2))` will be a (64-bit) pointer with the expected value. – Matthew Flaschen Oct 11 '10 at 22:23
  • @Matthew: thanks for that - it sounds like it *should* work as one might reasonably expect. – Paul R Oct 11 '10 at 22:43
  • 1
    A curious side remark: Since the subscript operator [] is defined in such a way that E1[E2] is identical to (*((E1)+(E2))) (see the answer by Matthew Flaschen), it is actually valid C code to write `2[arr]` instead of `arr[2]`. I admit that this is deliberately obfuscating the code, though. – Christian Borgelt Feb 19 '13 at 17:36
85

This is only valid if arr is a pointer that points to the second element in an array or a later element. Otherwise, it is not valid, because you would be accessing memory outside the bounds of the array. So, for example, this would be wrong:

int arr[10];

int x = arr[-2]; // invalid; out of range

But this would be okay:

int arr[10];
int* p = &arr[2];

int x = p[-2]; // valid:  accesses arr[0]

It is, however, unusual to use a negative subscript.

James McNellis
  • 348,265
  • 75
  • 913
  • 977
  • I wouldn't go so far as to say it's invalid, just potentially messy – Matt Joiner Aug 13 '10 at 03:36
  • 21
    @Matt: The code in the first example yields undefined behavior. – James McNellis Aug 13 '10 at 03:40
  • BSTR is a good example in windows. Any debug allocator. Nothing wrong with it. – Hans Passant Aug 13 '10 at 04:10
  • 7
    It is invalid. By the C standard, it explicitly has undefined behavior. On the other hand, if `int arr[10];` were part of a structure with other elements before it, `arr[-2]` could potentially be well-defined, and you could determine if it is based on `offsetof`, etc. – R.. GitHub STOP HELPING ICE Aug 13 '10 at 06:35
  • BSTR and debug allocators both allocate space and return a pointer somewhere inside that space. This property makes a negative offset "safe" for some values of "safe". The example James gives here of `arr[-2]` is explicitly undefined behavior because it attempts to access a location before the beginning of `arr` itself. Note that even computing `arr-2` without accessing the location is undefined behavior. – RBerteig Aug 13 '10 at 09:09
  • Nice examples. By the way, I am wondering whether this issue has been addressed in K&R's Bible (The C Programming Language, 2nd Edition)? If yes, which section? – Qiang Xu May 04 '12 at 14:57
  • @QiangXu: I do not know. It has been a long time since I opened my copy of K&R and I do very little programming in C. – James McNellis May 04 '12 at 17:14
  • 7
    Found it in K&R Section 5.3, near the end: `If one is sure that the elements exist, it is also possible to index backwards in an array; p[-1], p[-2], and so on are syntactically legal, and refer to the elements that immediately precede p[0]. Of course, it is illegal to refer to objects that are not within the array bounds.` Still, your example is better in help me understand it. Thanks! – Qiang Xu May 09 '12 at 20:47
  • @R..: I also thought negative indicies are explicitly definded to provoke undefined behaviour. Now I tried to find a prove in the standard and didn't. Could you point me to it? – alk Jan 22 '13 at 18:24
  • [] operator is defined in terms of + and unary * operators. Negative index has defined behavior iff the offset is defined for +. – R.. GitHub STOP HELPING ICE Jan 22 '13 at 18:49
  • 4
    Sorry for the thread necromancy, but I just love how K&R are ambiguous as to what "illegal" means. The last sentence makes it sound like out-of-bounds accesses throw a compilation error. That book is poison for beginners. – Martin Feb 14 '17 at 21:02
  • 4
    @Martin To be fair, the book was written at an earlier time in our industry's history when it was still very reasonable to expect "illegal" to be interpreted as "do not do this, you are not allowed" rather than "you will be prevented from doing this". – mtraceur Dec 30 '19 at 08:30
  • @mtraceur I don't know whether that was the case. If it is, all the more reason to throw this book into the garbage. – Martin Dec 30 '19 at 22:31
16

Sounds fine to me. It would be a rare case that you would legitimately need it however.

Matt Joiner
  • 112,946
  • 110
  • 377
  • 526
  • 13
    It's not *that* rare - it's very useful in e.g. image processing with neighbourhood operators. – Paul R Oct 11 '10 at 15:56
  • I just needed to use this because I am creating a memory pool with a stack and heap [ structure / design ] . The stack growing towards higher memory addresses, the heap growing towards lower memory addresses. Meeting in the middle. – KANJICODER Jan 24 '20 at 16:12
9

What probably was that arr was pointing to the middle of the array, hence making arr[-2] pointing to something in the original array without going out of bounds.

Igor Zevaka
  • 74,528
  • 26
  • 112
  • 128
7

I'm not sure how reliable this is, but I just read the following caveat about negative array indices on 64-bit systems (LP64 presumably): http://www.devx.com/tips/Tip/41349

The author seems to be saying that 32 bit int array indices with 64 bit addressing can result in bad address calculations unless the array index is explicitly promoted to 64 bits (e.g. via a ptrdiff_t cast). I have actually seen a bug of his nature with the PowerPC version of gcc 4.1.0, but I don't know if it's a compiler bug (i.e. should work according to C99 standard) or correct behaviour (i.e. index needs a cast to 64 bits for correct behaviour) ?

Paul R
  • 208,748
  • 37
  • 389
  • 560
4

I know the question is answered, but I couldn't resist sharing this explanation.

I remember Principles of Compiler design: Let's assume a is an int array and size of int is 2, and the base address for a is 1000.

How will a[5] work ->

Base Address of your Array a + (index of array *size of(data type for array a))
Base Address of your Array a + (5*size of(data type for array a))
i.e. 1000 + (5*2) = 1010

This explanation is also the reason why negative indexes in arrays work in C; i.e., if I access a[-5] it will give me:

Base Address of your Array a + (index of array *size of(data type for array a))
Base Address of your Array a + (-5 * size of(data type for array a))
i.e. 1000 + (-5*2) = 990

It will return the object at location 990. So, by this logic, we can access negative indexes in arrays in C.

Adrian Mole
  • 49,934
  • 160
  • 51
  • 83
Ajinkya Patil
  • 741
  • 1
  • 6
  • 17
2

About why would someone want to use negative indexes, I have used them in two contexts:

  1. Having a table of combinatorial numbers that tells you comb[1][-1] = 0; you can always check indexes before accessing the table, but this way the code looks cleaner and executes faster.

  2. Putting a centinel at the beginning of a table. For instance, you want to use something like

     while (x < a[i]) i--;
    

but then you should also check that i is positive.
Solution: make it so that a[-1] is -DBLE_MAX, so that x&lt;a[-1] will always be false.

Maximilian Ast
  • 3,369
  • 12
  • 36
  • 47
1
#include <stdio.h>

int main() // negative index
{ 
    int i = 1, a[5] = {10, 20, 30, 40, 50};
    int* mid = &a[5]; //legal;address,not element there
    for(; i < 6; ++i)
    printf(" mid[ %d ] = %d;", -i, mid[-i]);
}
Ardent Coder
  • 3,777
  • 9
  • 27
  • 53
  • 2
    While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value. – β.εηοιτ.βε May 31 '20 at 14:39
  • Python groovy... have them. A simple use-case is one can access last element of an array without knowing the array size, a very real requirement in many Project situations. Also many DSLs benefit from this. – Rathinavelu Muthaliar Jun 02 '20 at 02:25
-3

I would like to share an example:

GNU C++ library basic_string.h

[notice: as someone points out that this is a "C++" example, it may not be fit for this topic of "C". I write a "C" code, which has same concept as the example. At least, GNU gcc compiler doesn't complain anything.]

It uses [-1] to move pointer back from user string to management information block. As it alloc memory once with enough room.

Said " * This approach has the enormous advantage that a string object * requires only one allocation. All the ugliness is confined * within a single %pair of inline functions, which each compile to * a single @a add instruction: _Rep::_M_data(), and * string::_M_rep(); and the allocation function which gets a * block of raw bytes and with room enough and constructs a _Rep * object at the front. "

Source code: https://gcc.gnu.org/onlinedocs/gcc-10.3.0/libstdc++/api/a00332_source.html

   struct _Rep_base
   {
     size_type               _M_length;
     size_type               _M_capacity;
     _Atomic_word            _M_refcount;
   };

   struct _Rep : _Rep_base
   {
      ...
   }

  _Rep*
   _M_rep() const _GLIBCXX_NOEXCEPT
   { return &((reinterpret_cast<_Rep*> (_M_data()))[-1]); }

It explained:

*  A string looks like this:
*
*  @code
*                                        [_Rep]
*                                        _M_length
*   [basic_string<char_type>]            _M_capacity
*   _M_dataplus                          _M_refcount
*   _M_p ---------------->               unnamed array of char_type
*  @endcode
*
*  Where the _M_p points to the first character in the string, and
*  you cast it to a pointer-to-_Rep and subtract 1 to get a
*  pointer to the header.
*
*  This approach has the enormous advantage that a string object
*  requires only one allocation.  All the ugliness is confined
*  within a single %pair of inline functions, which each compile to
*  a single @a add instruction: _Rep::_M_data(), and
*  string::_M_rep(); and the allocation function which gets a
*  block of raw bytes and with room enough and constructs a _Rep
*  object at the front.
*
*  The reason you want _M_data pointing to the character %array and
*  not the _Rep is so that the debugger can see the string
*  contents. (Probably we should add a non-inline member to get
*  the _Rep for the debugger to use, so users can check the actual
*  string length.)
*
*  Note that the _Rep object is a POD so that you can have a
*  static <em>empty string</em> _Rep object already @a constructed before
*  static constructors have run.  The reference-count encoding is
*  chosen so that a 0 indicates one reference, so you never try to
*  destroy the empty-string _Rep object.
*
*  All but the last paragraph is considered pretty conventional
*  for a C++ string implementation.

// use the concept before, to write a sample C code

#include "stdio.h"
#include "stdlib.h"
#include "string.h"

typedef struct HEAD {
    int f1;
    int f2;
}S_HEAD;

int main(int argc, char* argv[]) {
    int sz = sizeof(S_HEAD) + 20;

    S_HEAD* ha = (S_HEAD*)malloc(sz);
    if (ha == NULL)
      return -1;

    printf("&ha=0x%x\n", ha);

    memset(ha, 0, sz);

    ha[0].f1 = 100;
    ha[0].f2 = 200;

    // move to user data, can be converted to any type
    ha++;
    printf("&ha=0x%x\n", ha);

    *(int*)ha = 399;

    printf("head.f1=%i head.f2=%i user data=%i\n", ha[-1].f1, ha[-1].f2, *(int*)ha);

    --ha;
    printf("&ha=0x%x\n", ha);

    free(ha);

    return 0;
}



$ gcc c1.c -o c1.o -w
(no warning)
$ ./c1.o 
&ha=0x13ec010
&ha=0x13ec018
head.f1=100 head.f2=200 user data=399
&ha=0x13ec010

The library author uses it. May it be helpful.

Jian Wang
  • 65
  • 5