Why does the indexing start with zero in 'C'?

Question

Why does the indexing in an array start with zero in C and not with 1?

possible duplicate of [Defend zero-based arrays](http://stackoverflow.com/questions/393462/defend-zero-based-arrays) — dmckee --- ex-moderator kitten, Sep 06 '11 at 18:28
A pointer (array) is a memory address and an index is an offset from that address, so the first element of the pointer (array) is the one whose offset is 0. — D33pN16h7, Sep 07 '11 at 02:47
@drhirsch because when we count a set of objects, we begin by pointing at an object and saying "one". — phoog, Feb 21 '12 at 15:53
Americans count the floors (storeys) of a building from one on the ground floor; the British count from zero (ground floor), moving up to the first floor, then the second floor, etc. — Jonathan Leffler, Aug 05 '15 at 03:07
Think of it as an offset, not an index, and you'll understand. — axiac, Aug 10 '16 at 14:17
@JonathanLeffler I think we are just not used to counting from 0, since we have been taught to count from 1 in decimal notation since the beginning of our lives. But from a mathematical and rational point of view, it is logical to count from 0, since it is the first non-negative integer; 1 is just the second. — RobertS supports Monica Cellio, Jul 01 '20 at 14:20

score 133 · Answer 1 · edited Jul 02 '20 at 18:40


In C, the name of an array is essentially a pointer [but see the comments], a reference to a memory location, and so the expression array[n] refers to a memory location n elements away from the starting element. This means that the index is used as an offset. The first element of the array is contained exactly at the memory location that array refers to (0 elements away), so it should be denoted as array[0].
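A minimal C sketch of this offset view, assuming an array of `int`; the helper name `byte_offset_of` is hypothetical. It measures how many bytes `array[n]` lies from the start of the array, which works out to `n * sizeof *array`, so element 0 is 0 bytes away:

```c
#include <stddef.h>

/* Hypothetical helper: byte distance of element n from the start of the
   array. Since array[n] means *(array + n), this is n * sizeof *array,
   so element 0 sits exactly at the start (distance 0). */
size_t byte_offset_of(const int *array, size_t n)
{
    return (size_t)((const char *)&array[n] - (const char *)array);
}
```

For `int a[4]`, `byte_offset_of(a, 0)` is 0 and `byte_offset_of(a, 3)` is `3 * sizeof(int)`.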

For more info:

http://developeronline.blogspot.com/2008/04/why-array-index-should-start-from-0.html

edited Jul 02 '20 at 18:40

Keith Thompson


answered Sep 06 '11 at 13:29

Massimiliano Peluso



The name of an array is the name of the array; contrary to the common misconception, arrays are *not* pointers in any sense. An array expression (such as the name of an array object) is usually, *but not always*, converted to a pointer to the first element. Example: `sizeof arr` yields the size of the array object, not the size of a pointer. – Keith Thompson Sep 21 '11 at 07:44

Since you obviously didn't react to @KeithThompson's comment, I would like to take a more forceful approach: "*In C, the name of an array is essentially a pointer, a reference to a memory location*" - No, it is **not**. At least not from a general point of view. While your answer perfectly explains why 0 as the starting index is important, the first sentence is plain incorrect. An array does not always decay to a pointer to its first element. – RobertS supports Monica Cellio Jul 01 '20 at 14:14
Quote from the C standard, (C18), 6.3.2.1/3: "*Except when it is the operand of the `sizeof` operator, or the unary `&` operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.*" – RobertS supports Monica Cellio Jul 01 '20 at 14:14
Also, this decay happens in a more "implicit" or "formal" way than suggested here; no pointer object in memory is involved. This is the subject of this question: [Is the array to pointer decay changed to a pointer object?](https://stackoverflow.com/q/62345429/12139179) - Please edit your answer to be fully correct. – RobertS supports Monica Cellio Jul 01 '20 at 14:15
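To illustrate the point made in these comments, here is a small C sketch; the names `ARRAY_LEN` and `size_seen_by_callee` are made up for this example. `sizeof` applied to an array object yields the size of the whole array, because `sizeof` is one of the listed exceptions to the conversion, while a function parameter is already a plain pointer:

```c
#include <stddef.h>

/* Hypothetical macro: element count of a genuine array object. Only
   meaningful where the array has not been converted to a pointer,
   e.g. in the scope that declares it. */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

/* Hypothetical helper: an array passed to a function arrives as a
   pointer, so sizeof here measures the pointer, not the caller's array. */
size_t size_seen_by_callee(const int *p)
{
    return sizeof p;
}
```

With `int arr[5]`, `sizeof arr` is `5 * sizeof(int)`, while `size_seen_by_callee(arr)` is only `sizeof(const int *)`.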

score 112 · Answer 2 · edited Apr 21 '22 at 14:00

This question was posted over a year ago, but here goes...

About the above reasons

While Dijkstra's article (previously referenced in a now-deleted answer) makes sense from a mathematical perspective, it isn't as relevant when it comes to programming.

The decision taken by the language designers and compiler writers is based on the decision made by computer system designers to start counting at 0.

The probable reason

Quoting from "On Holy Wars and a Plea for Peace" by Danny Cohen:

For any base b, the first b^N non-negative integers are represented by exactly N digits (including leading zeros) only if numbering starts at 0.

This can be tested quite easily. In base 2, take 2^3 = 8. The 8th number is:

8 (binary: 1000) if we start counting at 1
7 (binary: 111) if we start counting at 0

111 can be represented using 3 bits, while 1000 will require an extra bit (4 bits).
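The same check can be done in C. This sketch uses a hypothetical helper `bits_needed` that counts how many binary digits a value requires; with 3 address bits, the 8th address is 7 when counting from 0 but 8 when counting from 1:

```c
/* Hypothetical helper: number of binary digits needed to write v.
   Returns at least 1, so 0 is treated as needing one bit. */
unsigned bits_needed(unsigned long v)
{
    unsigned bits = 1;
    while (v >>= 1)   /* drop one bit per iteration until nothing is left */
        bits++;
    return bits;
}
```

`bits_needed(7)` is 3 while `bits_needed(8)` is 4, matching the 111 vs 1000 comparison above; in general the last of 2^N addresses needs N bits when counting from 0 and N+1 bits when counting from 1.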

Why is this relevant

Computer memory addresses have 2^N cells addressed by N bits. Now if we start counting at 1, 2^N cells would need N+1 address lines. The extra bit is needed to access exactly 1 address (1000 in the above case). Another way to solve it would be to leave the last address inaccessible and use N address lines.

Both are suboptimal solutions compared to starting the count at 0, which keeps all addresses accessible using exactly N address lines!

Conclusion

The decision to start counting at 0 has since permeated all digital systems, including the software running on them, because it makes it simpler for the code to translate to what the underlying system can interpret. If it weren't so, there would be one unnecessary translation operation between the machine and the programmer for every array access. It makes compilation easier.

Quoting from the paper:

Who's on first? Zero or one?

People start counting from the number one. The very word first is abbreviated as 1st, which indicates one. This, however, is a very modern notation. The older concepts do not necessarily support this relationship.
In English and French the word first is not derived from the word one, but from an old word for prince, which means foremost. Similarly, the English word second is not derived from the number two but from an old word which means "to follow." Obviously, there is a close relation between third and three, fourth and four, and so on.
These relationships occur in other language families, also. In Hebrew, for example, first is derived from the word head, meaning "the foremost." The Hebrew word for second is derived from the word two; this relationship of ordinal and cardinal names holds for all the other numbers.
For a very long time, people have counted from one, not from zero. As a matter of fact, the inclusion of zero as a full-fledged member of the set of all numbers is a relatively modern concept, even though it is one of the most important numbers mathematically. It has many important properties, such as being a multiple of any integer.
A nice mathematical theorem states that for any basis b the first bⁿ positive integers are represented by exactly n digits (leading zeros included). This is true if and only if the count starts with zero (hence, 0 through bⁿ-1), not with one (for 1 through bⁿ).
This theorem is the basis of computer memory addressing. Typically, 2ⁿ cells are addressed by an n-bit addressing scheme. A count starting from one rather than zero would cause the loss of either one memory cell or an additional address line. Since either price is too expensive, computer engineers agree to use the mathematical notation that starts with zero. Good for them!
This is probably the reason why all memories start at address-0, even those of systems that count bits from B1 up.
The designers of the 1401 were probably ashamed to have address-0. They hid it from the users and pretended that the memory starts at address-1.
Communication engineers, like most people, start counting from one. They never have to suffer the loss of a memory cell, for example. Therefore, they happily count one-to-eight, not zero-to-seven, as computer people do.

What if they had just removed the bit 0.. then the 8th number would still be 111... — DanMatlin, Aug 17 '13 at 20:38
Are you actually suggesting modification of basic arithmetic to make it fit in? Don't you think what we have today is a far better solution? — Anirudh Ramanathan, Aug 17 '13 at 20:39
Years later, my two cents' worth. In my experience (~35 years of programming) the modulo or modular addition operation, in one form or another, comes up surprisingly often. With zero base the next in sequence is (i+1)%n, but with base 1 it becomes (i%n)+1, so I think 0-based is preferred. This crops up in math and programming quite often. Maybe it is just me or the field in which I work. — nyholku, Apr 12 '18 at 13:50
While all good reasons, I think it's much simpler: `a[b]` was implemented as `*(a+b)` in early compilers. Even today you can still write `2[a]` instead of `a[2]`. Now if indexes didn't start at 0, then `a[b]` would turn into `*(a+b-1)`. This would have required 2 additions on CPUs of the time instead of 1, meaning half the speed. Clearly not desirable. — Goswin von Brederlow, Jun 14 '18 at 15:07
The probable reason is totally f*** up. If you take 3 bits as example, you have exactly 8 pieces of information you can represent, and the first 8 non-negative integers are numbers from 0 to 7. That is, you can represent 8 states without ever representing the number 8 itself, and that is what that quote is about. The 1st number in the series is still 0. I'll say it again: 0 is the 1st. — Spyryto, Oct 02 '18 at 10:31
Just because you want 8 states, it does not mean you must have the number 8 in them. The light switches in my house are happy to represent "light on", "light off" states, without ever wondering why they don't represent the number 2. — Spyryto, Oct 02 '18 at 10:37
And BTW, when I say 0 is the 1st, I've started counting from 0. — Spyryto, Oct 02 '18 at 10:38
Does this explanation have anything to do with arrays? 3 bits represent 8 distinct values no matter whether you count from 1 or from 1000; it is always 8. But it does not explain why arrays count from 0. — Dexter, Jun 28 '20 at 14:06
It maps to the hardware more easily and requires one less translation, and why hardware is designed the way it is, is addressed in the above excerpt. Of course, this is conjecture until the point when the engineer that first mapped the start of an array to 0 comes and speaks their reasoning. — Anirudh Ramanathan, Jun 28 '20 at 19:02
"*While Dijkstra's article (previously referenced in a now-deleted answer) makes sense from a mathematical perspective*" It doesn't even make sense from a math perspective. LTE/GTE is no harder to parse when reading than LT/GT; Dijkstra was just being lazy, or was bad at reading comprehension, I suppose. — TylerH, Oct 14 '22 at 18:37

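nyholku's wrap-around point can be made concrete with two hypothetical C helpers that return the next position in a cycle of `n` slots, one for 0-based and one for 1-based positions:

```c
/* Hypothetical helpers: next position in a cycle of n slots (n > 0). */

/* 0-based positions 0..n-1: position n-1 wraps back to 0. */
unsigned next0(unsigned i, unsigned n)
{
    return (i + 1) % n;
}

/* 1-based positions 1..n: position n wraps back to 1. */
unsigned next1(unsigned i, unsigned n)
{
    return i % n + 1;
}
```

The 0-based form applies the modulo directly to the successor; the 1-based form has to shift into and out of the modulo's natural 0..n-1 range.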
Doug T. · Answer 3 · 2011-09-06T13:47:47.153

Because 0 is the distance from the start of the array to the array's first element.

Consider:

int foo[5] = {1,2,3,4,5};

To access element 0 we do:

foo[0]

But foo decays to a pointer, and the above access has an equivalent pointer-arithmetic form:

*(foo + 0)

These days pointer arithmetic isn't used as frequently. Way back when though, it was a convenient way to take an address and move X "ints" away from that starting point. Of course if you wanted to just stay where you are, you just add 0!
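A small C sketch of the equivalence, assuming `int` elements; the three function names are invented for illustration. Array subscripting is defined in terms of pointer addition, which is also why the unusual `2[a]` form mentioned in the comments on the previous answer compiles:

```c
#include <stddef.h>

/* Hypothetical helpers: three spellings of the same access. a[i] is
   defined as *(a + i); addition commutes, so i[a] names the same element.
   With i == 0 nothing is added to the starting address. */
int by_subscript(const int *a, size_t i)          { return a[i]; }
int by_pointer_arithmetic(const int *a, size_t i) { return *(a + i); }
int by_reversed_subscript(const int *a, size_t i) { return i[a]; }
```

With `int foo[5] = {1,2,3,4,5};`, all three functions return 1 for index 0.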

score 26 · Answer 4 · answered Oct 05 '11 at 23:55


Because 0-based index allows...

array[index]

...to be implemented as...

*(array + index)

If the index were 1-based, the compiler would need to generate *(array + index - 1), and this "-1" would hurt performance.

answered Oct 05 '11 at 23:55

Branko Dimitrijevic



You bring up an interesting point. It can hurt performance. But will the performance hit be significant enough to justify using 0 as the starting index? I doubt it. – FirstName LastName Jan 20 '13 at 02:26

@FirstNameLastName 1-based indexes offer no advantage over 0-based indexes yet they perform (slightly) worse. That justifies 0-based indexes no matter how "small" the gain is. Even if 1-based indexes offered some advantage, it's in the spirit of C++ to choose performance over convenience. C++ is sometimes used in contexts where every last bit of performance matters, and these "small" things can quickly add up. – Branko Dimitrijevic Jan 20 '13 at 20:11
Yes, I understand that small things can add up and sometimes become a big thing. For example, $1 per year is not much money. But if 2 billion people donate it, then we can do a lot of good for humanity. I am looking for a similar example in coding which could cause poor performance. – FirstName LastName Jan 21 '13 at 09:17

Rather than subtracting 1, you should use the address of the array minus 1 as the base address. That's what we did in a compiler I once worked on. That eliminates the runtime subtraction. When you're writing a compiler, those extra instructions matter a lot. The compiler will be used to generate thousands of programs, each of which may be used thousands of times, and that extra instruction may occur in several lines inside an n-squared loop. It can add up to billions of wasted cycles. – progrmr Mar 10 '13 at 18:19
No, it won't hurt performance once it is compiled; it will only add a little build time, because in the end it will be translated to machine code. It will only hurt the compiler designers. – Hassaan Akbar Jul 01 '18 at 23:47
@HassaanAkbar The result of the compilation will still include one more decrement instruction (barring optimizations mentioned above, which have their own trade-offs). – Branko Dimitrijevic Jul 02 '18 at 07:39
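A sketch of what a 1-based access costs when layered over an ordinary C array; `get_1based` is a hypothetical helper, and the subtraction is the translation step discussed above. Note that the "base address minus one" trick from the comments is something a compiler can do in its own generated code, but writing `a - 1` in portable C is undefined behavior, because pointer arithmetic is only defined within the array or one element past its end:

```c
#include <stddef.h>

/* Hypothetical 1-based accessor over a 0-based C array, valid for
   i in 1..len. The "- 1" is performed on every access unless the
   compiler manages to fold it into the base address. */
int get_1based(const int *a, size_t i)
{
    return a[i - 1];
}
```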

progrmr · Answer 5 · 2011-09-06T14:14:14.283


Because it made the compiler and linker simpler (easier to write).

Reference:

"...Referencing memory by an address and an offset is represented directly in hardware on virtually all computer architectures, so this design detail in C makes compilation easier"

and

"...this makes for a simpler implementation..."

edited Sep 06 '11 at 14:14

answered Sep 06 '11 at 13:42

progrmr



+1 Not sure why the down votes. While it doesn't directly answer the question, 0-based indexing is not natural for people or mathematicians - the only reason it's done is because the implementation is logically consistent (simple). – phkahler Sep 06 '11 at 17:58

@phkahler: the error is in authors and languages calling array offsets "indices"; if you think of it as an offset, then 0-based becomes natural for a lay person as well. Consider the clock: the first minute is written as 00:00, not 00:01, isn't it? – Lie Ryan Sep 06 '11 at 19:24

+1 -- this is probably the most correct answer. C predates Dijkstra's paper and was one of the earliest "start at 0" languages. C started life "as a high level assembler" and it's likely that K & R wanted to stick as closely as possible to the way it was done in assembler, where you would normally have a base address plus an offset starting at zero. – James Anderson Sep 07 '11 at 08:40
I thought the question was why 0 based was used, not which is better. – progrmr Sep 07 '11 at 12:43

I won't downvote, but as progrmr commented above, the base can be taken care of by adjusting the array's address, so regardless of base the execution time is the same, and this is trivial to implement in the compiler or interpreter, so it does not really make for a simpler implementation. Witness Pascal, where you can use any range for indexing IIRC; it has been 25 years ;) – nyholku Apr 12 '18 at 13:54

score 5 · Answer 6 · edited Aug 11 '16 at 10:44


Array indexing always starts at zero. Let's assume the base address is 2000. Now arr[i] = *(arr+i). If i = 0, then arr + 0 is 2000, the base address, i.e. the address of the first element of the array. The index is treated as an offset, so by default indexing starts from zero.

edited Aug 11 '16 at 10:44

maazza


answered Aug 10 '16 at 14:15

Amit Prakash


Devrath · Answer 7 · 2019-08-10T13:33:15.027

I am from a Java background. I have presented my answer to this question in the diagram below, which I drew on a piece of paper and which is self-explanatory.

Main Steps:

Creating Reference
Instantiation of Array
Allocation of Data to array

Also note that when an array is just instantiated, zero is assigned to all of its blocks by default until we assign values to them.
The array starts at zero because the first element's address is the address held by the reference (i.e. X102+0 in the image).

Note: The blocks shown in the image represent memory.

score 4 · Answer 8 · answered Jul 04 '20 at 01:45

It is because the address has to point to the right element in the array. Let us assume the below array:

let arr = [10, 20, 40, 60];

Let us now assume the array starts at address 12 and each element is 4 bytes.

address of arr[0] = 12 + (0 * 4) => 12
address of arr[1] = 12 + (1 * 4) => 16
address of arr[2] = 12 + (2 * 4) => 20
address of arr[3] = 12 + (3 * 4) => 24

If indexing were not zero-based, the same formula would give 16 as the address of our first element, which is wrong, since its location is 12.
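The same calculation in C, as a sketch that assumes a flat address space where converting a pointer to `uintptr_t` gives its numeric address (true on mainstream platforms, but not promised by the C standard); `address_of_element` is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: address of element i computed by hand as
   base + i * element size, the formula above, using the real base
   address instead of the example value 12. */
uintptr_t address_of_element(const int *arr, size_t i)
{
    return (uintptr_t)arr + i * sizeof *arr;
}
```

For index 0 the product is 0, so the result is the base address itself, which is exactly where the first element lives.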

score 4 · Answer 9 · edited Apr 10 '18 at 18:49


For the same reason that, when it's Wednesday and somebody asks you how many days til Wednesday, you say 0 rather than 1, and that when it's Wednesday and somebody asks you how many days until Thursday, you say 1 rather than 2.

edited Apr 10 '18 at 18:49

Waldir Leoncio


answered Sep 06 '11 at 14:02

R.. GitHub STOP HELPING ICE



Your answer seems just a matter of opinion. – heltonbiker Sep 07 '11 at 02:19

Well, it's what makes adding indices/offsets work. For example if "today" is 0 and "tomorrow" is 1, "tomorrow's tomorrow" is 1+1=2. But if "today" is 1 and "tomorrow" is 2, "tomorrow's tomorrow" is not 2+2. In arrays, this phenomenon happens whenever you want to consider a subrange of an array as an array in its own right. – R.. GitHub STOP HELPING ICE Sep 07 '11 at 02:37

Calling a collection of 3 things "3 things" and numbering them 1,2,3 is not a deficiency. Numbering them with an offset from the first one is not natural even in mathematics. The only time you index from zero in math is when you want to include something like the zero-th power (constant term) in a polynomial. – phkahler Sep 08 '11 at 17:57

Re: "Numbering arrays starting with 1 rather than 0 is for people with a severe deficiency of mathematical thinking." My edition of CLR's "Introduction to Algorithms" uses 1-based array indexing; I don't think the authors have a deficiency in mathematical thinking. – RexE Sep 16 '11 at 03:11
No, I would say the seventh one is at index 6, or 6 positions away from the first one. – R.. GitHub STOP HELPING ICE Apr 03 '14 at 22:58
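R..'s subrange point, sketched in C with a hypothetical `sum` helper: with 0-based indexing a subrange starting at offset `k` is just `arr + k`, and element `j` of it is element `k + j` of the original, so offsets simply add:

```c
#include <stddef.h>

/* Hypothetical helper: sum of the n elements starting at p. Any
   subrange arr + k can be passed in and indexed from 0 like a
   standalone array. */
long sum(const int *p, size_t n)
{
    long total = 0;
    for (size_t j = 0; j < n; j++)
        total += p[j];   /* p[j] is arr[k + j] when p == arr + k */
    return total;
}
```

For `int arr[6] = {1, 2, 3, 4, 5, 6}`, `sum(arr + 2, 3)` adds 3, 4 and 5 with no index adjustment.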

score 3 · Answer 10 · answered Aug 04 '15 at 21:43

The most elegant explanation I've read for zero-based numbering is an observation that values aren't stored at the marked places on the number line, but rather in the spaces between them. The first item is stored between zero and one, the next between one and two, etc. The Nth item is stored between N-1 and N. A range of items may be described using the numbers on either side. Individual items are by convention described using the number below them. If one is given a range (X,Y), identifying individual items using the number below means that one can identify the first item without using any arithmetic (it's item X) but one must subtract one from Y to identify the last item (Y-1). Identifying items using the number above would make it easier to identify the last item in a range (it would be item Y), but harder to identify the first (X+1).

Although it wouldn't be horrible to identify items based upon the number above them, defining the first item in the range (X,Y) as being the one above X generally works out more nicely than defining it as the one below (X+1).
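A C sketch of the half-open convention described here, using hypothetical helpers `range_count` and `sum_range` over a range `[lo, hi)`: the item count is `hi - lo`, the first item is `lo`, the last is `hi - 1`, and splitting at any `mid` yields two adjacent ranges with no overlap or gap:

```c
#include <stddef.h>

/* Hypothetical helper: number of items in the half-open range [lo, hi). */
size_t range_count(size_t lo, size_t hi)
{
    return hi - lo;
}

/* Hypothetical helper: sum of a[lo] .. a[hi - 1]; an empty range
   (lo == hi) sums to 0. */
long sum_range(const int *a, size_t lo, size_t hi)
{
    long total = 0;
    for (size_t i = lo; i < hi; i++)
        total += a[i];
    return total;
}
```

Splitting `[0, 5)` at 2 gives `[0, 2)` and `[2, 5)`, and the two partial sums add up to the whole.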

score 2 · Answer 11 · edited Oct 14 '22 at 18:38

Suppose we want to create an array of size 5

int array[5] = {2, 3, 5, 9, 8};

Let the 1st element of the array be located at address 100,

and let us assume indexing starts from 1, not from 0.

Now we have to find the location of the 1st element with the help of its index (remember, the location of the 1st element is 100).

Since the size of an integer is 4 bytes, with index 1 the offset would be index (1) * size of integer (4) = 4, so the position it gives us is

100 + 4 = 104

which is wrong, because the first element is located at 100, not at 104.

Now suppose indexing starts from 0. Then the offset of the 1st element is index (0) * size of integer (4) = 0,

so the location of the 1st element is 100 + 0 = 100,

which is the actual location of the element. This is why indexing starts at 0.

Gianluca Ghettini · Answer 12 · 2015-08-05T10:24:36.420

Try to access a screen pixel using X,Y coordinates in a 1-based matrix. The formula gets more complex. Why is it more complex? Because you end up converting the X,Y coords into one number, the offset. Why do you need to convert X,Y to an offset? Because that's how memory is organized inside computers: as a continuous stream of memory cells (arrays). How do computers deal with array cells? Using offsets (displacements from the first cell, a zero-based indexing model).

So at some point in the code you need (or the compiler needs) to convert the 1-based formula to a 0-based formula, because that's how computers deal with memory.
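A sketch of the pixel example in C, assuming a row-major buffer `width` pixels wide; `offset0` and `offset1` are hypothetical names. Both return the same 0-based memory offset, but the 1-based version has to subtract 1 from each coordinate first:

```c
#include <stddef.h>

/* Hypothetical helper: offset of pixel (x, y) in a row-major buffer,
   with x and y starting at 0. */
size_t offset0(size_t x, size_t y, size_t width)
{
    return y * width + x;
}

/* Hypothetical helper: same pixel with x and y starting at 1; the
   result is still a 0-based offset into memory. */
size_t offset1(size_t x, size_t y, size_t width)
{
    return (y - 1) * width + (x - 1);
}
```

For a 10-pixel-wide buffer, `offset0(3, 2, 10)` and `offset1(4, 3, 10)` both give 23.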

score 1 · Answer 13 · answered Jan 09 '20 at 09:35


In an array, the index tells the distance from the starting element. So the first element is at distance 0 from the starting element. That's why arrays start from 0.

answered Jan 09 '20 at 09:35

Rishi Raj Tandon


score 1 · Answer 14 · answered Sep 06 '11 at 13:32

The technical reason might derive from the fact that the pointer to an array holds the address of its first element. If the first element had index one, programs would normally add that one to the pointer to access the content, which would skip past the first element; not what you want, of course.

score 0 · Answer 15 · answered Aug 27 '19 at 13:15

First of all, you need to know that in most expressions an array is converted to a pointer, because "the name of the array evaluates to the address of the first element of the array".

ex. int arr[2] = {5,4};

Suppose the array starts at address 100, so the first element will be at address 100 and the second at 104. Now suppose array indexing started from 1, so

arr[1]:-

This can be written as a pointer expression (working in byte addresses) like this:

 arr[1] = *(arr + 1 * (size of single element of array));

Assume the size of int is 4 bytes:

arr[1] = *(arr + 1 * (4) );
arr[1] = *(arr + 4);

Since the array name gives the address of its first element, arr = 100:

arr[1] = *(100 + 4);
arr[1] = *(104);

which gives,

arr[1] = 4;

With this expression we can never access the element at address 100, which is the actual first element.

Now suppose array indexing starts from 0, so

arr[0]:-

this will be resolved as

arr[0] = *(arr + 0 * (size of single element of array));
arr[0] = *(arr + 0 * 4);
arr[0] = *(arr + 0);
arr[0] = *(arr);

Now, since the array name gives the address of its first element:

arr[0] = *(100);

which gives the correct result:

arr[0] = 5;

Therefore array indexing starts from 0 in C.

Reference: all the details are in the book The C Programming Language by Brian Kernighan and Dennis Ritchie.

score -2 · Answer 16 · answered Aug 03 '15 at 15:52

-2

Array name is a constant pointer pointing to the base address. When you use arr[i], the compiler treats it as *(arr+i). Since int range is -128 to 127, the compiler thinks that -128 to -1 are negative numbers and 0 to 127 are positive numbers. So array index always starts with zero.

answered Aug 03 '15 at 15:52

Niks



What do you mean by _'int range is -128 to 127'_? An `int` type is required to support at least a 16-bit range, and on most systems these days supports 32-bits. I think your logic is flawed, and your answer really doesn't improve on the other answers already provided by other people. I suggest deleting this. – Jonathan Leffler Aug 05 '15 at 00:55
