1794

As Joel points out in Stack Overflow podcast #34, in C Programming Language (aka: K & R), there is mention of this property of arrays in C: a[5] == 5[a]

Joel says that it's because of pointer arithmetic but I still don't understand. Why does a[5] == 5[a]?

Cœur
Dinah
  • 61
    would something like a[+] also work like *( a++) OR *(++a) ? – Egon May 13 '10 at 16:14
  • 56
    @Egon: That's very creative but unfortunately that's not how compilers work. The compiler interprets `a[1]` as a series of tokens, not strings: *({integer location of}a {operator}+ {integer}1) is the same as *({integer}1 {operator}+ {integer location of}a) but is not the same as *({integer location of}a {operator}+ {operator}+) – Dinah May 13 '10 at 17:24
  • The C language has chosen to implement array access purely as a [syntactic sugar](http://en.wikipedia.org/wiki/Syntactic_sugar). That is why the compiler cannot check that the left part is a pointer. Then, it somehow happens that pointer arithmetic makes the resulting program valid even when it is not. – Eldritch Conundrum Mar 23 '12 at 10:54
  • 1
    @EldritchConundrum: I disagree about it not being valid. Ritchie himself says that it is. It may be an unintended consequence but I believe it is still valid. – Dinah May 24 '12 at 00:23
  • 15
    An interesting compound variation on this is illustrated in [Illogical array access](http://stackoverflow.com/questions/8910837/why-does-this-work-illogical-array-access), where you have `char bar[]; int foo[];` and `foo[i][bar]` is used as an expression. – Jonathan Leffler Oct 17 '12 at 06:38
  • 7
    @EldritchConundrum, why do you think 'the compiler cannot check that the left part is a pointer'? Yes, it can. It's true that `a[b]` = `*(a + b)` for any given `a` and `b`, but it was the language designers' free choice for `+` to be defined commutative for all types. Nothing could prevent them from forbidding `i + p` while allowing `p + i`. – ach Mar 14 '14 at 19:46
  • 1
    @Andrey They could have forbidden `i+p`, but breaking commutativity hurts intuition. Forbidding `i[p]` would have made more sense, because brackets visually suggest accessing an array. – Eldritch Conundrum Mar 17 '14 at 13:11
  • 1
    @EldritchConundrum, to me, it is commutativity in this case that hurts intuition. With pointers, the `+` operator means offset, not addition; its arguments are of different nature and therefore there is no symmetry in them. We cannot write `i - p`, can we? – ach Mar 17 '14 at 14:58
  • 15
    @Andrey One usually expects `+` to be commutative, so maybe the real problem is choosing to make pointer operations resemble arithmetic, instead of designing a separate offset operator. – Eldritch Conundrum Mar 18 '14 at 10:36
  • 4
    @ach Re "We cannot write i - p": Are you suggesting that subtraction is normally commutative? ;-) – Peter - Reinstate Monica Oct 14 '17 at 20:15
  • Not only is `a[5] == 5[a]`, but even `&a[5] == &5[a]`, i.e. the two don't just have the same value, they are the very same object. – Peter - Reinstate Monica Oct 14 '17 at 20:17
  • 1
    @Peter, you're missing my point. It is not operation signs that are commutative, but operations denoted by them. Using `+` to denote offset is Ok in itself but offset, unlike addition, is not commutative. You can apply an offset of 7 steps northward to an old oak to find a treasure but you cannot apply an old oak to 7 steps northward. – ach Oct 14 '17 at 20:56
  • 1
    @ach of course you can; it's a simple vector addition in nature (you can walk the vector to the tree first, and then the offset, or first the offset, and then the same vector; it's completely commutative), in math, and in programming (if we consider the address space a one-dimensional vector). Subtraction, obviously, is not: Not in nature, not in math, and not in programming. Neither circumstance is surprising. – Peter - Reinstate Monica Oct 15 '17 at 07:08
  • 3
    Note: It isn't always fruitful to try to figure out why C does things a certain way unless you remember/consider its history. C was made to port Unix, Unix was made to run C--this helped spread Unix to many platforms. So the language was mostly designed around making an easy-to-implement/port compiler. These days most language syntax is designed with different goals (such as readability and consistency, speed of implementation, reduction of bugs, or all of the above), so you wouldn't find features like this making much sense. – Bill K Nov 27 '17 at 16:51

20 Answers

2120

The C standard defines the [] operator as follows:

a[b] == *(a + b)

Therefore a[5] will evaluate to:

*(a + 5)

and 5[a] will evaluate to:

*(5 + a)

a is a pointer to the first element of the array. a[5] is the value that's 5 elements further from a, which is the same as *(a + 5), and from elementary school math we know those are equal (addition is commutative).
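As a quick check of the equivalence described above, here is a minimal C sketch. The array contents and the helper name `subscript_forms_agree` are made up for illustration and are not part of the original answer.

```c
#include <assert.h>

/* Returns 1 if a[5] and 5[a] designate the same element
   (hypothetical helper, for illustration only). */
int subscript_forms_agree(void)
{
    int a[10] = {0, 10, 20, 30, 40, 50, 60, 70, 80, 90};

    /* a[5], 5[a], *(a + 5) and *(5 + a) all read the same value... */
    assert(a[5] == 50);
    assert(5[a] == 50);
    assert(*(a + 5) == 50);
    assert(*(5 + a) == 50);

    /* ...because they designate the very same object. */
    return &a[5] == &5[a];
}
```

Calling `subscript_forms_agree()` from any `main` should return 1 on a conforming compiler.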

Marco Bonelli
Mehrdad Afshari
  • 362
    I wonder if it isn't more like *((5 * sizeof(a)) + a). Great explanation though. – John MacIntyre Dec 19 '08 at 17:06
  • 2
    Why is sizeof() taken into account. I thought the pointer to 'a' is to the beginning of the array (ie: the 0 element). If this is true, you only need *(a + 5). My understanding must be incorrect. What's the correct reason? – Dinah Dec 19 '08 at 17:15
  • 4
    If you have an array of 4-byte integers, a[1] - a[0] = 4 (4 bytes difference between the two pointers). – Treb Dec 19 '08 at 17:17
  • 116
    @Dinah: From a C-compiler perspective, you are right. No sizeof is needed and those expressions I mentioned are THE SAME. However, the compiler will take sizeof into account when producing machine code. If a is an int array, `a[5]` will compile to something like `mov eax, [ebx+20]` instead of `[ebx+5]` – Mehrdad Afshari Dec 19 '08 at 17:18
  • 14
    @Dinah: a is an address, say 0x1230. If a were a 32-bit int array, then a[0] is at 0x1230, a[1] is at 0x1234, a[2] at 0x1238 ... a[5] at 0x1244, etc. If we just add 5 to 0x1230, we get 0x1235, which is wrong. – James Curran Dec 19 '08 at 17:21
  • 2
    @James: bingo. That's what I needed to see. I kept seeing sizeof() and thinking count() and getting mightily confused. Not my brightest moment. Thank you! – Dinah Dec 19 '08 at 17:27
  • @Dinah; the assignment operator comment was just a tongue-in-cheek comment about how anal I am. ;-) ... I knew what you meant, and I'm sure everybody else did as well. Great question btw, I was just listening to the SO podcast where they were talking about it. – John MacIntyre Dec 19 '08 at 18:25
  • 9
    So in the 5[a] case, the compiler is smart enough to use "*((5 * sizeof(a)) + a)" and not "*(5 + (a * sizeof(5)))"? Note: I guess so. I tried this in GCC and it worked. – Harvey Dec 22 '08 at 18:27
  • 46
    @sr105: That's a special case for the + operator, where one of the operands is a pointer and the other an integer. The standard says that the result will be of the type of the pointer. The compiler /has to be/ smart enough. – aib Dec 23 '08 at 02:08
  • 8
    When you add an integer to a pointer, the compiler knows what type the pointer points to (so if a is an int*, it's 4 bytes or whatever...) so can perform the arithmetic right. Basically if you do "p++" then p should be adjusted to point to the next object in memory. "p++" is basically equivalent to "p = p + 1", so the definition of pointer addition makes everything line up. Also note you can't do arithmetic with pointers of type `void*`. – araqnid Apr 18 '09 at 01:03
  • 1
    @litb: I understand your concern and potentially "misleading" people. However, I wanted to keep simplicity of the answer, as in this context, the array decays to a pointer. I changed "being a pointer" to "behaving as pointers." I hope that's OK. Thanks for the comment, btw. – Mehrdad Afshari Sep 21 '09 at 16:25
  • http://freeworld.thc.org/root/phun/unmaintain.html mentions this as a good tactic for obfuscation, giving the example `myfunc(6291, 8)[Array];` where `myfunc` is simply the modulo function (that's equivalent to `Array[3]`) – Fahad Sadah May 23 '10 at 12:16
  • @Mehrdad I think the main reason behind this post getting upvoted more than that exploit post (which definitely deserves to be on the top) is that this one addresses a relatively simpler problem and hence more people tend to understand this. The anatomy of the exploit is not this simple and most people will just skip it :) – Amarghosh Nov 12 '10 at 05:33
  • 61
    "from elementary school math we know those are equal" - I understand that you are simplifying, but I'm with those who feel like this is *over*simplifying. It's not elementary that `*(10 + (int *)13) != *((int *)10 + 13)`. In other words, there's more going on here than elementary school arithmetic. The commutativity relies critically on the compiler recognizing which operand is a pointer (and to what size of object). To put it another way, `(1 apple + 2 oranges) = (2 oranges + 1 apple)`, but `(1 apple + 2 oranges) != (1 orange + 2 apples)`. – LarsH Dec 01 '10 at 20:54
  • 8
    @LarsH: You're right. I'd say it's more analogous to `(10in + 10cm)` rather than apples and oranges (you can meaningfully convert one to another). – Mehrdad Afshari Dec 01 '10 at 21:53
  • 8
    @Mehrdad: Fair enough. Maybe a better analogy is a date vs. a time interval, as in `(May 1st 2010 + 3 weeks)`. – LarsH Dec 01 '10 at 23:37
  • "This is the direct artifact of arrays behaving as pointers": no, arrays do not behave as pointers at all. – Lightness Races in Orbit Aug 14 '11 at 15:14
  • 2
    '"a" is a memory address': no, no more than `x` is a memory address if you write `int x;`. The name of the array can _decay_ to a pointer to the first element of that array, though. – Lightness Races in Orbit Aug 14 '11 at 15:14
  • 2
    @Tomalak I understand. There are plenty of places that it was relevant and we've discussed it. However, while the question specifically asks about the *reason* why it works the way it does. I can't imagine this being the behavior of `5[a]` if in the original implementation of C, pointers weren't really binaries representing memory addresses directly understandable by the CPU. If we want to be too pedantic, the answer (to this question and many more) is: "Because the standard defines the behavior of `[]` operator on `int` types on one side and array or pointer types on another as such." – Mehrdad Afshari Aug 14 '11 at 22:21
  • 1
    @Jim: No, it's because the *types*, not the values, are the same. Furthermore, elementary school arithmetic cannot be applied blindly to arithmetic operators. Consider `INT_MAX - 5 + 1` vs `INT_MAX + 1 - 5`. – Ben Voigt Apr 05 '13 at 17:27
  • @Jim: Hardly. The type of `a` and the type of `99` are certainly not the same in this question. – Ben Voigt Apr 05 '13 at 21:44
  • @Jim: What is it called when you edit your comment in order to make my response look stupid? You just have to look up a few comments, to see that type DOES matter. `(10 + (int *)13) != ((int *)10 + 13)` and that was already pointed out. – Ben Voigt Apr 06 '13 at 00:38
  • 1
    Also, my claim that "elementary school arithmetic cannot be applied blindly to arithmetic operators" needs only one example to prove that further consideration, not blind application, is necessary. And I can provide several examples. Here's another case where type is important: `T a = 7.0; double x = a / 2.0;` Clearly whether `a` is `int` or `double` makes a huge difference in the answer. – Ben Voigt Apr 06 '13 at 00:41
  • More examples are possible, due to limited range and precision of floating-point types. The example I chose originally, I chose because it involves integer addition, same as the problem under discussion. – Ben Voigt Apr 06 '13 at 00:45
  • 2
    @BenVoigt Actually I think your example should be `double x = a / 2;`. If it's `2.0` the result will be `double`, regardless of whether `a` is an `int` or a `double`. – Bernhard Barker Jul 30 '13 at 10:36
  • 2
    What exactly in elementary school arithmetics says that adding values *of completely different types* must always be commutative? – hamstergene Jun 28 '14 at 20:36
  • @hamstergene Elementary school math does not talk about types. My answer to the OP question for you would be The One and Only True Answer: "because the C standard says so." – Mehrdad Afshari Jun 30 '14 at 01:53
  • 1
    @JohnMacIntyre Even if it isn't automatically incremented, shouldn't it be `*((5 * sizeof(*a)) + a)` instead of `*((5 * sizeof(a)) + a)`? – Bolun Zhang Jul 17 '14 at 17:35
  • *from elementary school math we know those are equal*, well it's true that we learn that addition is commutative, but in the case of values of the same type! So it is not obvious that adding a pointer and an integer is a commutative operation! But this is defined by the standard... This is no less obvious than adding 5 to an address does not give address+5, but address+5*sizeof(type)! So pointer arithmetic is not so obvious. – Jean-Baptiste Yunès Nov 18 '14 at 08:19
  • @Jean-BaptisteYunès Yes. The technical answer to the question is "because the language specification says `*(p+5)` is equal to `*(5+p)` and `a[b]` equals `*(a+b)`". However, the rationale for `*(p+5)` being equal to `*(5+p)` is indeed consistency with "elementary school math". – Mehrdad Afshari Nov 19 '14 at 03:09
  • 2
    For sure, but consistent with elementary math is not a requirement in pointer arithmetic. The sum is "typed" with the pointer's type so it is not so "natural", so why would you like it to be commutative ? Just because the code produced in assembly doesn't have type ? – Jean-Baptiste Yunès Nov 19 '14 at 06:58
  • @Jean-BaptisteYunès It's not a requirement. It is a design decision the C language designers made presumably to remain consistent with commutativity of the addition operator. Sure, nothing is _required_ in the strictest sense when you are designing a language. – Mehrdad Afshari Nov 19 '14 at 07:21
  • 1
    @Jean-BaptisteYunès & Mehrdad Afshari: Maybe it's worth mentioning that in assembly languages we sometimes use a constant base address of a table and a calculated offset to select an array's item, and sometimes we have a constant offset to a member of a dynamically allocated structure. And both types of access, const[var] and var[const], are translated to the same CPU instruction. Possibly C, as a quite low-level among high-level languages, deliberately inherits this equivalence. – CiaPan Apr 14 '16 at 09:28
  • 2
    A little history may help explain why this is the way that it is. As noted here: http://www.gotw.ca/conv/003.htm C and C++ have their origins in BCPL. BCPL used `!` (aka pling) as the indirection operator, and it took two forms, unary and binary. `!a` unary has the same meaning as `*a` does in C/C++, i.e. unary indirection. `a!b` binary is used for array lookup, equivalent to `a[b]` in C. Since binary `!` is commutative in BCPL, and has the same effect as `!(a + b)` I very strongly suspect this is why array indirection has the same commutative behavior in C/C++. – dgnuff Apr 18 '18 at 23:14
  • Why is it syntactically allowed to index integer literals by the standard? I cannot see how anyone would write this intentionally. The standard probably allows it because adding a check will make a compiler parser/lexer slightly more complex. But I think in today's world the speed impact on compilation will be minimal, while catching unintentional behaviour is very useful. Newer versions of GCC even warn about fall-through in switches, which has an actual intentional use. So IMHO compilers should at least warn about this. GCC 8.2 does not give a warning even with `-Wall`. – Jan Christoph Terasa Nov 08 '18 at 06:00
  • @JanChristophTerasa Sometimes it's not worth the extra steps required to restrict something artificially, just because you don't think someone should use it. It would take a lot of extra writing to take out a useless option. But maybe we can get a warning for the "goes to" operator, `while (0 <-- counter)` – Cort Ammon May 07 '20 at 01:59
  • 1
    @JohnMacIntyre Remember that `*(a + b)` is the same as `*(b + a)`, so `*(5 + a)` is `*(a + 5)`. `a` being a pointer is subject to *pointer arithmetic* (otherwise the `*` dereference is invalid). In summary: `*((5 * sizeof(a)) + a)` *is wrong*. – U. Windl Nov 04 '20 at 12:59
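To make the scaling discussed in these comments concrete, here is a small C sketch that measures how many bytes `a + n` is away from `a`. The helper name `byte_offset_of_element` and the array size are assumptions chosen for illustration; the point is only that the compiler multiplies by the element size for you, whichever side the integer is on.

```c
#include <assert.h>
#include <stddef.h>

/* Byte distance between a and a + n for an int array
   (hypothetical helper, for illustration only). */
size_t byte_offset_of_element(size_t n)
{
    int a[8] = {0};

    assert(n <= 8);          /* a + 8 (one past the end) is the last valid pointer */
    assert(a + n == n + a);  /* the integer is scaled the same way on either side */

    /* Converting to char * lets us measure the distance in bytes. */
    return (size_t)((char *)(a + n) - (char *)a);
}
```

With this sketch, `byte_offset_of_element(5)` is `5 * sizeof(int)`, not 5, which is why writing `sizeof` by hand in the source is both unnecessary and wrong.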
305

Because array access is defined in terms of pointers. a[i] is defined to mean *(a + i), which is commutative.

David Thornley
  • 56
    Arrays are not defined in terms of pointers, but _access_ to them is. – Lightness Races in Orbit May 12 '11 at 23:20
  • 10
    I would add "so it is equal to `*(i + a)`, which can be written as `i[a]`". – Jim Balter Apr 05 '13 at 22:11
  • 4
    I would suggest you include the quote from the standard, which is as follows: 6.5.2.1: 2 A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero). – Vality Feb 17 '15 at 21:41
  • To be more correct: Arrays decay into pointers when you access them. – 12431234123412341234123 May 14 '18 at 16:11
  • 2
    Nitpick: It doesn't make sense to say that "`*(a + i)` is commutative". However, `*(a + i) = *(i + a) = i[a]` because *addition* is commutative. – Andreas Rejbrand Oct 13 '19 at 22:18
  • 1
    @AndreasRejbrand OTOH `+` is the only binary operator in the expression, so it's rather clear what can be commutative at all. – U. Windl Nov 04 '20 at 13:03
  • It is not a given but it is an explicit language choice here that `+` is commutative even with a pointer and an offset. It's not at all clear in general that `a+b` is commutative if a and b are different types: When you overload the operator in C++, it's the programmer's choice. A lot could be said for making `operator+(int, float)` return an int, for example, and to forbid `operator+(int, int *)`. (Of course a lot can be said, on the other hand, for sticking with the abstract semantics, including commutativity; just sayin'.) – Peter - Reinstate Monica Sep 29 '22 at 14:46
285

I think something is being missed by the other answers.

Yes, p[i] is by definition equivalent to *(p+i), which (because addition is commutative) is equivalent to *(i+p), which (again, by the definition of the [] operator) is equivalent to i[p].

(And in array[i], the array name is implicitly converted to a pointer to the array's first element.)
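The implicit conversion mentioned above can be observed directly. This is a minimal sketch, assuming a ten-element `int` array; the helper name `array_decays_in_arithmetic` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the converted array name points at the first element
   (hypothetical helper, for illustration only). */
int array_decays_in_arithmetic(void)
{
    int array[10] = {0};

    /* As the operand of sizeof, the array is NOT converted:
       this is the size of all ten elements. */
    assert(sizeof array == 10 * sizeof(int));

    /* In arithmetic it is converted to int *, so array + 0 has pointer size. */
    assert(sizeof(array + 0) == sizeof(int *));

    /* The resulting pointer points at the first element. */
    return array + 0 == &array[0];
}
```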

But the commutativity of addition is not all that obvious in this case.

When both operands are of the same type, or even of different numeric types that are promoted to a common type, commutativity makes perfect sense: x + y == y + x.

But in this case we're talking specifically about pointer arithmetic, where one operand is a pointer and the other is an integer. (Integer + integer is a different operation, and pointer + pointer is nonsense.)

The C standard's description of the + operator (N1570 6.5.6) says:

For addition, either both operands shall have arithmetic type, or one operand shall be a pointer to a complete object type and the other shall have integer type.

It could just as easily have said:

For addition, either both operands shall have arithmetic type, or the left operand shall be a pointer to a complete object type and the right operand shall have integer type.

in which case both i + p and i[p] would be illegal.

In C++ terms, we really have two sets of overloaded + operators, which can be loosely described as:

pointer operator+(pointer p, integer i);

and

pointer operator+(integer i, pointer p);

of which only the first is really necessary.
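In plain C, both operand orders can be exercised like this. It is only a sketch: the names `arr`, `p`, `i` and `pointer_addition_commutes` are made up for illustration.

```c
#include <assert.h>

/* Returns 1 if p[i] and i[p] read the same element
   (hypothetical helper, for illustration only). */
int pointer_addition_commutes(void)
{
    int arr[4] = {1, 2, 3, 4};
    int *p = arr;
    int i = 3;

    int *left  = p + i;   /* pointer + integer */
    int *right = i + p;   /* integer + pointer: also accepted by C */

    assert(left == right);
    assert(*left == 4);

    /* Since i[p] means *(i + p), it reads the same element as p[i]. */
    return i[p] == p[i];
}
```

Under the hypothetical alternative wording quoted above, the line computing `right` and the expression `i[p]` would be constraint violations, and nothing else in the function would change.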

So why is it this way?

C++ inherited this definition from C, which got it from B (the commutativity of array indexing is explicitly mentioned in the 1972 Users' Reference to B), which got it from BCPL (manual dated 1967), which may well have gotten it from even earlier languages (CPL? Algol?).

So the idea that array indexing is defined in terms of addition, and that addition, even of a pointer and an integer, is commutative, goes back many decades, to C's ancestor languages.

Those languages were much less strongly typed than modern C is. In particular, the distinction between pointers and integers was often ignored. (Early C programmers sometimes used pointers as unsigned integers, before the unsigned keyword was added to the language.) So the idea of making addition non-commutative because the operands are of different types probably wouldn't have occurred to the designers of those languages. If a user wanted to add two "things", whether those "things" are integers, pointers, or something else, it wasn't up to the language to prevent it.

And over the years, any change to that rule would have broken existing code (though the 1989 ANSI C standard might have been a good opportunity).

Changing C and/or C++ to require putting the pointer on the left and the integer on the right might break some existing code, but there would be no loss of real expressive power.

So now we have arr[3] and 3[arr] meaning exactly the same thing, though the latter form should never appear outside the IOCCC.

Orace
Keith Thompson
  • 16
    Fantastic description of this property. From a high level view, I think `3[arr]` is an interesting artifact but should rarely if ever be used. The accepted answer to this question () which I asked a while back has changed the way I've thought about syntax. Although there's often technically not a right and wrong way to do these things, these kinds of features start you thinking in a way which is separate from the implementation details. There's benefit to this different way of thinking which is in part lost when you fixate on the implementation details. – Dinah Aug 24 '13 at 01:01
  • 3
    Addition is commutative. For the C standard to define it otherwise would be strange. That's why it could not just as easily have said "For addition, either both operands shall have arithmetic type, or the left operand shall be a pointer to a complete object type and the right operand shall have integer type." - That wouldn't make sense to most people who add things. – iheanyi Apr 21 '14 at 17:54
  • 13
    @iheanyi: Addition is usually commutative -- and it usually takes two operands of the same type. Pointer addition lets you add a pointer and an integer, but not two pointers. IMHO that's already a sufficiently odd special case that requiring the pointer to be the left operand wouldn't be a significant burden. (Some languages use "+" for string concatenation; that's certainly not commutative.) – Keith Thompson Apr 21 '14 at 18:13
  • True on the string example! In that light, this looks like a language decision that comes from the implementation side of things - rather than design. – iheanyi Apr 21 '14 at 19:53
  • 1
    @iheanyi: Addition of numbers is commutative, but that doesn't mean that addition must be commutative with things that are not numbers. It was not uncommon for assemblers to require that every address involving a relocatable symbol must be of the exact form "rel_symbol", "rel_symbol + number", or "rel_symbol - number", since the linker would expect a list of fix-ups, each of which identified a "base" symbol and the place where it was used (the pre-fixed-up code would hold the number to be added to the symbol). – supercat Oct 20 '14 at 16:08
  • @iheanyi: I think it's cleaner from a rules perspective to say that the second operand of an addition operator must be a number, and the result type will match the first operand, than to try to say that "at least one" operand must be a number. Incidentally, a lot of annoyances related to unsigned types could have been eliminated if the addition operator always returned the type of its left-hand operand, rather than saying that given `uint32_t x=0;` the value of `x-1` must on some implementations yield 4294967295 and on others yield -1. – supercat Oct 20 '14 at 16:18
  • 3
    @supercat, That's even worse. That would mean that sometimes x + 1 != 1 + x. That would completely violate the associative property of addition. – iheanyi Oct 21 '14 at 16:34
  • 4
    @iheanyi: I think you meant commutative property; addition is already not associative, since on most implementations (1LL+1U)-2 != 1LL+(1U-2). Indeed, the change would make some situations associative which presently aren't, e.g. 3U+(UINT_MAX-2L) would equal (3U+UINT_MAX)-2. What would be best, though, is for the language to add new distinct types for promotable integers and "wrapping" algebraic rings, so that adding 2 to a `ring16_t` which holds 65535 would yield a `ring16_t` with value 1, *independent of the size of `int`*. – supercat Oct 21 '14 at 16:46
  • @supercat - thanks for that response. That clarifies the issues at hand with a good example :) – iheanyi Oct 21 '14 at 16:59
  • 1
    Concerning C++, it should be mentioned that user-defined operator overloads aren't subject to the same rule: `vec[5]` is fine, whereas `5[vec]` is an error. – L. F. May 01 '19 at 14:04
  • @L.F.: I *think* it's possible to provide an overload such that `5[vec]` is valid (and might have a different meaning than `vec[5]`. (I'll have to check that.) But the question is tagged "c", so I didn't go into that. – Keith Thompson May 01 '19 at 17:58
  • @KeithThompson Well, theoretically you can provide an implicit conversion to `T*`, but that's against the idea of vectors. – L. F. May 01 '19 at 23:31
  • 2
    index[array] is actually very useful to avoid nesting brackets when indices are used in place of pointers. `head->next->prev` becomes `array[array[head].next].prev` which is better written `head[array].next[array].prev`. I expanded on this in another answer. It would be a shame for C to lose this feature. I'd rather go the other way and make function calls commutable as well so we could chain g(f(x)) into x(f)(g). – Samuel Danielson Sep 21 '21 at 21:58
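The index-chaining idea from the last comment can be tried with a small array-backed list. This is a sketch under assumed names: the `struct node` layout, the three-node ring, and `index_chain_matches` are all hypothetical.

```c
#include <assert.h>

/* A doubly linked list stored in an array, with indices in place of
   pointers (hypothetical layout, for illustration only). */
struct node {
    int next;
    int prev;
};

/* Returns 1 if the nested and the index-first spellings agree. */
int index_chain_matches(void)
{
    /* Three nodes forming a ring: 0 -> 1 -> 2 -> 0. */
    struct node array[3] = {
        { .next = 1, .prev = 2 },
        { .next = 2, .prev = 0 },
        { .next = 0, .prev = 1 },
    };
    int head = 0;

    int nested  = array[array[head].next].prev;   /* conventional form */
    int chained = head[array].next[array].prev;   /* index-first form  */

    assert(nested == 0);   /* node 1's prev is node 0, i.e. head */
    return nested == chained;
}
```

The chained form parses left to right as a sequence of postfix operators, which is what lets it avoid the nested brackets.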
212

And, of course

 ("ABCD"[2] == 2["ABCD"]) && (2["ABCD"] == 'C') && ("ABCD"[2] == 'C')
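For anyone who wants to try the expression above, here is a minimal C sketch. A string literal is an array of `char`, so it converts to a pointer like any other array; the helper name `literal_subscripts_agree` is hypothetical.

```c
#include <assert.h>

/* Returns 1 if index 4 of "ABCD" is the terminating null character
   (hypothetical helper, for illustration only). */
int literal_subscripts_agree(void)
{
    /* All three spellings read the third character of the literal. */
    assert("ABCD"[2] == 'C');
    assert(2["ABCD"] == 'C');
    assert(*("ABCD" + 2) == 'C');

    /* Index 4 is the terminating null character, in either spelling. */
    return 4["ABCD"] == '\0';
}
```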

The main reason for this was that back in the 70's when C was designed, computers didn't have much memory (64KB was a lot), so the C compiler didn't do much syntax checking. Hence "X[Y]" was rather blindly translated into "*(X+Y)"

This also explains the "+=" and "++" syntaxes. Everything in the form "A = B + C" had the same compiled form. But if B was the same object as A, an assembly-level optimization was available. The compiler wasn't bright enough to recognize it, so the developer had to request it explicitly (A += C). Similarly, if C was 1, a different assembly-level optimization was available, and again the developer had to make it explicit, because the compiler didn't recognize it. (More recent compilers do, so those syntaxes are largely unnecessary these days.)

franji1
James Curran
  • 21
    Isn't this a myth? I mean that the += and ++ operators were created to simplify for the compiler? Some code gets clearer with them, and it is useful syntax to have, no matter what the compiler does with it. – Thomas Padron-McCarthy Dec 19 '08 at 17:44
  • 7
    += and ++ have another significant benefit: if evaluating the left-hand side changes some variable, the change will only be done once. a = a + ...; will do it twice. – Johannes Schaub - litb Dec 19 '08 at 17:49
  • 3
    Heard that += reduces the odds for mistakes as you write variable names two times rather than three... – Liran Orevi Apr 21 '09 at 08:02
  • 1
    a = a + with objects often leads to unoptimized copies of the objects, because it has to make a copy of a. a += does not need a copy, it is evaluated directly. – jkeys Aug 12 '09 at 21:49
  • doesn’t "ABCD"[2] resolve to "CD"? if you want it to resolve to 'C' you’d have to use dereferencing, i.e. `*("ABCD"[2]) == 'C')` – knittl Sep 21 '09 at 10:05
  • 10
    No - "ABCD"[2] == *("ABCD" + 2) = *("CD") = 'C'. Dereferencing a string gives you a char, not a substring – MSalters Sep 21 '09 at 10:34
  • 4
    "It'll be easier to implement this way" makes a whole lot more sense than "mathematically it works, so even though it serves no practical purpose whatsoever, let's add it to the language" as a rationale. – Dennis Zickefoose Jun 19 '11 at 09:44
  • Algol68, as far as I recall, was the origin of combined arithmetic-and-assignation operators, as in `foo +:= bar`, pronounced 'foo plus-and-becomes bar'. I believe the rationale was that this more closely resembled what one wanted to do in the first place, namely 'add bar to foo' (though why we didn't get `bar =:+ foo` out of that logic, I don't know). – dave May 03 '12 at 02:26
  • 7
    @ThomasPadron-McCarthy: From [here](http://cm.bell-labs.com/cm/cs/who/dmr/chist.html): "During development, [Thompson] continually struggled against memory limitations: each language addition inflated the compiler so it could barely fit, but each rewrite taking advantage of the feature reduced its size. For example, B introduced generalized assignment operators, using x=+y to add y to x...Thompson went a step further by inventing the ++ and -- operators...a stronger motivation for the innovation was probably his observation that the translation of ++x was smaller than that of x=x+1." – John Bode May 03 '12 at 15:19
  • 3
    @dave: It's `x += 5;` rather than `x =+ 5;` because the latter would be parsed as `x = (+5);` – James Curran Jan 24 '13 at 14:14
  • 6
    @JamesCurran I am pretty sure it started out as `LHS =- RHS;` and was eventually swapped to use `-=`. – Vatine Apr 18 '13 at 15:58
  • 1
    ++ frequently mapped to a single machine instruction while x = x + 1 could be more than one. x += 3 maps to fewer machine instructions than x = x + 3, as the knowledge is that one will pick up x once, add three to it and drop it back down. register int x = 3 is from that same era, when compilers weren't as smart as they are today. – EvilTeach Oct 07 '13 at 02:30
  • @JamesCurran the unary `+` didn't exist in early C. – Miles Rout Jun 17 '14 at 15:57
  • 1
    @MilesRout : Perhaps not, but unary minus definitely did, leading to the same problem. – James Curran Jul 01 '14 at 19:22
  • 1
    The PDP11 mini computer (PDP were used for the first C and UNIX operating system) had assembly instructions for += -= ++ -- so while there may have been forerunners in Algol, there were a bit of 1-to-1 mapping between instruction set and language capabilities. – Soren Aug 27 '14 at 23:33
  • 2
    @Vatine is right, it was `=+` before `+=`. The B programming language (which I'm surprised to read is still used), ancestor of C, uses the `=+` form. IIRC, the main reason for changing it was that `i=-1;` was ambiguous. Not ambiguous to the compiler, but to human readers who had trouble understanding whether this was supposed to decrease `i` by 1 (and hence correctly written), or whether this was supposed to assign `-1` to `i` (and hence a bug in the code). Disclaimer: my recollection may be faulty. –  Nov 15 '14 at 12:33
  • @JohnBode The quoted sentence beginning 'a stronger motivation for the innovation ...' is just circular reasoning. He couldn't have noticed it before he innovated it. The fact is that the PDP-11 had both pre-increment and post-decrement instructions, or possibly the other way around, it's been 37 years. – user207421 Jan 25 '16 at 22:47
59

One thing no-one seems to have mentioned about Dinah's problem with sizeof:

You can only add an integer to a pointer, you can't add two pointers together. That way when adding a pointer to an integer, or an integer to a pointer, the compiler always knows which bit has a size that needs to be taken into account.

Dinah
user30364
  • 2
    There's a fairly exhaustive conversation about this in the comments of the accepted answer. I referenced said conversation in the edit to the original question but did not directly address your very valid concern of sizeof. Not sure how to best do this in SO. Should I make another edit to the orig. question? – Dinah Apr 21 '09 at 13:51
  • I'd like to note that you cannot *add* pointers, but you can *subtract* pointers (returning the number of items between). – U. Windl Nov 04 '20 at 13:10
53

To answer the question literally: it is not always true that x == x.

double zero = 0.0;
double a[] = { 0,0,0,0,0, zero/zero}; // NaN
cout << (a[5] == 5[a] ? "true" : "false") << endl;

prints

false
Peter Lawrey
  • 34
    Actually a "nan" is not equal to itself: `cout << (a[5] == a[5] ? "true" : "false") << endl;` is `false`. – TrueY Apr 23 '13 at 09:34
  • 11
    @TrueY: He did state that specifically for the NaN case (and specifically that `x == x` is not always true). I think that was his intention. So he is *technically* correct (and possibly, as they say, the best kind of correct!). – Tim Čas Feb 13 '15 at 01:04
  • 7
    The question is about C, your code is not C code. There is also a `NAN` in `<math.h>`, which is better than `0.0/0.0`, because `0.0/0.0` is UB when `__STDC_IEC_559__` is not defined (Most implementations do not define `__STDC_IEC_559__`, but on most implementations `0.0/0.0` will still work) – 12431234123412341234123 May 14 '18 at 16:02
37

I just found out that this ugly syntax could be "useful", or at least very fun to play with, when you want to deal with an array of indexes that refer to positions in the same array. It can replace nested square brackets and make the code more readable!

int a[] = { 2 , 3 , 3 , 2 , 4 };
int s = sizeof a / sizeof *a;  //  s == 5

for (int i = 0; i < s; ++i) {
    cout << a[a[a[i]]] << endl;
    // ... is equivalent to ...
    cout << i[a][a][a] << endl;  // but I prefer this one, it's easier to increase the level of indirection (without loop)
}

Of course, I'm quite sure that there is no use case for that in real code, but I found it interesting anyway :)

Good Night Nerd Pride
Frédéric Terrazzoni
30

Nice question/answers.

Just want to point out that C pointers and arrays are not the same, although in this case the difference is not essential.

Consider the following declarations:

int a[10];
int* p = a;

In a.out, the symbol a is at an address that's the beginning of the array, and symbol p is at an address where a pointer is stored, and the value of the pointer at that memory location is the beginning of the array.

NAND
PolyThinker
  • 2
    No, technically they are not the same. If you define some b as int*const and make it point to an array, it is still a pointer, meaning that in the symbol table, b refers to a memory location that stores an address, which in turn points to where the array is. – PolyThinker Dec 22 '08 at 05:42
  • 4
    Very good point. I remember having a very nasty bug when I defined a global symbol as char s[100] in one module and declared it as extern char *s; in another module. After linking it all together the program behaved very strangely, because the module using the extern declaration was using the initial bytes of the array as a pointer to char. – Giorgio May 02 '12 at 18:15
  • 1
    Originally, in C's grandparent BCPL, an array was a pointer. That is, what you got when you wrote (I have transliterated to C) `int a[10]` was a pointer called 'a', which pointed to enough store for 10 integers, elsewhere. Thus a+i and j+i had the same form: add the contents of a couple of memory locations. In fact, I think BCPL was typeless, so they were identical. And the sizeof-type scaling did not apply, since BCPL was purely word-oriented (on word-addressed machines also). – dave May 03 '12 at 02:33
  • I think the best way to understand the difference is to compare `int*p = a;` to `int b = 5;` In the latter, "b" and "5" are both integers, but "b" is a variable, while "5" is a fixed value. Similarly, "p" & "a" are both addresses of an int, but "a" is a fixed value. – James Curran Mar 12 '13 at 16:34
  • 1
    While this "answer" does not answer the question (and thus should be a comment, not an answer), you could summarize as "an array is not an lvalue, but a pointer is". – U. Windl Nov 04 '20 at 13:15
23

For pointers in C, we have

a[5] == *(a + 5)

and also

5[a] == *(5 + a)

Hence it is true that a[5] == 5[a].

user1055604
18

Not an answer, but just some food for thought. If a class has an overloaded index/subscript operator, the expression 0[x] will not work:

class Sub
{
public:
    int operator [](size_t nIndex)
    {
        return 0;
    }   
};

int main()
{
    Sub s;
    s[0];
    0[s]; // ERROR 
}

Since we don't have access to the int class, this cannot be done:

class int
{
   int operator[](const Sub&);
};
Ajay
  • 3
    `class Sub { public: int operator[](size_t nIndex) const { return 0; } friend int operator[](size_t nIndex, const Sub& This) { return 0; } };` – Ben Voigt Apr 05 '13 at 17:23
  • 1
    Have you actually tried compiling it? There are set of operators that cannot be implemented outside class (i.e. as non-static functions)! – Ajay Apr 05 '13 at 21:10
  • 4
    oops, you're right. "`operator[]` shall be a non-static member function with exactly one parameter." I was familiar with that restriction on `operator=`, didn't think it applied to `[]`. – Ben Voigt Apr 05 '13 at 21:21
  • 2
    Of course, if you change the definition of `[]` operator, it would never be equivalent again... if `a[b]` is equal to `*(a + b)` and you change this, you'll have to overload also `int::operator[](const Sub&);` and `int` is not a class... – Luis Colorado Sep 19 '14 at 13:18
  • 13
    This...isn't...C. – MD XF Dec 13 '16 at 07:13
14

There is a very good explanation of this in A TUTORIAL ON POINTERS AND ARRAYS IN C by Ted Jensen.

Ted Jensen explained it as:

In fact, this is true, i.e wherever one writes a[i] it can be replaced with *(a + i) without any problems. In fact, the compiler will create the same code in either case. Thus we see that pointer arithmetic is the same thing as array indexing. Either syntax produces the same result.

This is NOT saying that pointers and arrays are the same thing, they are not. We are only saying that to identify a given element of an array we have the choice of two syntaxes, one using array indexing and the other using pointer arithmetic, which yield identical results.

Now, looking at this last expression, part of it.. (a + i), is a simple addition using the + operator and the rules of C state that such an expression is commutative. That is (a + i) is identical to (i + a). Thus we could write *(i + a) just as easily as *(a + i). But *(i + a) could have come from i[a] ! From all of this comes the curious truth that if:

char a[20];

writing

a[3] = 'x';

is the same as writing

3[a] = 'x';
Right leg
A.s. Bhullar
  • 5
    a+i is NOT simple addition, because it's pointer arithmetic. if the size of the element of a is 1 (char), then yes, it's just like integer +. But if it's (e.g.) an integer, then it might be equivalent to a + 4*i. – Alex Brown Dec 04 '15 at 20:17
  • 1
    @AlexBrown Yes, it is pointer arithmetic, which is exactly why your last sentence is wrong, unless you first cast 'a' to be a (char*) (assuming that an int is 4 chars). I really don't understand why so many people are getting hung up on the actual value result of pointer arithmetic. Pointer arithmetic's entire purpose is to abstract away the underlying pointer values and let the programmer think about the objects being manipulated rather than address values. – jschultz410 Mar 21 '18 at 16:11
11

I know the question is answered, but I couldn't resist sharing this explanation.

Recalling Principles of Compiler Design: let's assume a is an int array, the size of int is 2 bytes, and the base address of a is 1000.

How a[5] is evaluated:

Base Address of your Array a + (5*size of(data type for array a))
i.e. 1000 + (5*2) = 1010

Similarly, when the C code is broken down into three-address code, 5[a] becomes:

Base Address of your Array a + (size of(data type for array a)*5)
i.e. 1000 + (2*5) = 1010 

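So basically both expressions refer to the same location in memory, and hence a[5] == 5[a].

This explanation is also the reason why negative indexes work in C, provided the resulting address still lies inside the same array object (for example, when indexing from a pointer into the middle of an array).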

i.e. if I access a[-5] it will give me

Base Address of your Array a + (-5 * size of(data type for array a))
i.e. 1000 + (-5*2) = 990

It returns the object at location 990.

Ajinkya Patil
10

A little bit of history now. Among other languages, BCPL had a fairly major influence on C's early development. If you declared an array in BCPL with something like:

let V = vec 10

that actually allocated 11 words of memory, not 10. Typically V was the first, and contained the address of the immediately following word. So unlike C, naming V went to that location and picked up the address of the zeroth element of the array. Therefore array indirection in BCPL, expressed as

let J = V!5

really did have to do J = !(V + 5) (using BCPL syntax) since it was necessary to fetch V to get the base address of the array. Thus V!5 and 5!V were synonymous. As an anecdotal observation, WAFL (Warwick Functional Language) was written in BCPL, and to the best of my memory tended to use the latter syntax rather than the former for accessing the nodes used as data storage. Granted this is from somewhere between 35 and 40 years ago, so my memory is a little rusty. :)

The innovation of dispensing with the extra word of storage and having the compiler insert the base address of the array when it was named came later. According to the C history paper this happened at about the time structures were added to C.

Note that ! in BCPL was both a unary prefix operator and a binary infix operator, in both cases doing indirection; the binary form simply added its two operands before doing the indirection. Given the word-oriented nature of BCPL (and B) this actually made a lot of sense. The restriction of "pointer and integer" was made necessary in C when it gained data types, and sizeof became a thing.

dgnuff
8

In C arrays, arr[3] and 3[arr] are the same, and their equivalent pointer notations are *(arr + 3) and *(3 + arr). By contrast, [arr]3 or [3]arr is not correct and will result in a syntax error, just as (arr + 3)* and (3 + arr)* are not valid expressions. The reason is that the dereference operator must be placed before the address yielded by the expression, not after it.

Machavity
Krishan
8

To a C compiler,

a[i]
i[a]
*(a+i)

are simply different ways to refer to the same element of an array! (NOT AT ALL WEIRD)

AVIK DUTTA
5

C was based on BCPL. BCPL directly exposed memory as a sequence of addressable words. The unary operator !X (also known as LV) gave you the contents of the address location X. For convenience there was also a binary operator X!Y equivalent to !(X+Y) which gave you the contents of the Y'th word of an array at location X, or equivalently, the X'th word of an array at location Y.

In C, X!Y became X[Y], but the original BCPL semantics of !(X+Y) show through, which accounts for why the operator is commutative.

Michael Kay
3

Well, this is a feature that is only possible because of the language support.

The compiler interprets a[i] as *(a+i) and the expression 5[a] evaluates to *(5+a). Since addition is commutative it turns out that both are equal. Hence the expression evaluates to true.

Harsha J K
3

Because the C compiler always converts array notation into pointer notation: a[5] = *(a + 5), and likewise 5[a] = *(5 + a) = *(a + 5). So both are equal.

3

Because it's useful to avoid confusing nesting.

Would you rather read this:

array[array[head].next].prev

or this:

head[array].next[array].prev

Incidentally, C++ has a similar commutative property for function calls. Rather than writing g(f(x)) as you must in C, you may use member functions to write x.f().g(). Replace f and g with lookup tables and you can write g[f[x]] (functional style) or (x[f])[g] (oop style). The latter gets really nice with structs containing indices: x[xs].y[ys].z[zs]. Using the more common notation that's zs[ys[xs[x].y].z].

Samuel Danielson
  • I've probably been doing too much reading in FP, but the second one seems to read more nicely to me: "head of array", "next of array". Of course this depends upon heavy editorial license in the reading. – luser droog Mar 11 '23 at 15:21
2

In C

 int a[]={10,20,30,40,50};
 int *p=a;
 printf("%d\n",*p++);//output will be 10
 printf("%d\n",*a++);//will give an error

Pointer p is a "variable", array name a is a "mnemonic" or "synonym", so p++ is valid but a++ is invalid.

a[2] is equal to 2[a] because both are evaluated with pointer arithmetic: internally they are computed as *(a+2) and *(2+a), which are the same.

U. Windl
JgWangdu