26

While reading about pointers and arrays in C, I got a little confused. On the one hand, an array can be seen as a data type. On the other hand, an array is an unmodifiable lvalue. I imagine that the compiler does something like replacing the array's identifier with a constant address and an expression that calculates the position given by the index at runtime.

myArray[3] -(compiler)-> AE8349F + 3 * sizeof(<type>)

When people say that an array is a data type, what exactly does this mean? I hope you can help me clarify my confused understanding of what an array really is and how it is treated by the compiler.

Amade
Sam
  • 2
    An array is an object that encapsulates potentially more than one object in a contiguous block of memory. Arrays are related to pointers, but I think that's really another question, which you should read about [here.](http://c-faq.com/aryptr/index.html) –  Jul 23 '13 at 20:50
  • 2
    Your imagination appears to be correct (at least for popular C implementations). What exactly is the question? – Carl Norum Jul 23 '13 at 20:55
  • Arrays in C/C++ are confusing, since they are not quite "real", at least not as "real" as individual "scalars" and structs and pointers. They're kind of like quantum particles and they tend to disintegrate when you touch them. Another way to put it is that "array" is really only a compiler fiction, and, once compiled, all "arrayness" effectively vanishes into "pointerness". – Hot Licks Jul 23 '13 at 20:59
  • Read this [answer](http://stackoverflow.com/a/17691191/2455888). It may help you to some extent.(read complete answer) – haccks Jul 23 '13 at 21:03
  • 6
    @HotLicks: Array *objects* are just as "real" as any other type of object. An array of N `foo`s has a size of `N * sizeof (foo)`, for example. Array *expressions*, on the other hand, behave oddly, in ways thoroughly explained by section 6 of the [comp.lang.c FAQ](http://www.c-faq.com/), which H2CO3 linked to above. – Keith Thompson Jul 23 '13 at 21:06
  • @KeithThompson - How do you get the size of `foo[]` at runtime? You can't. Once the program is compiled you can't really tell it from `foo*`. If `sizeof(foo[10])` occurs in the source, one cannot tell that from simply having specified a literal number when looking at the disassembly. – Hot Licks Jul 23 '13 at 21:11
  • 1
    @HotLicks: Given: `typedef <...> foo; foo arr[10];`, it's guaranteed that `sizeof arr == 10 * sizeof (foo)`. `foo*` is a pointer type; I didn't mention any pointers. If you have a `foo*` pointer that points to the first element of some array of `foo`, that's not enough information to tell you how many elements the array has -- but that's not what I'm talking about. – Keith Thompson Jul 23 '13 at 21:15
  • 3
    Your title says C++, your question and tags say only C. Are you asking about C++ or not? – Ben Voigt Jul 23 '13 at 21:25
  • @KeithThompson - Like I said, it's entirely compiler fiction -- arrays are not a runtime type that can in any way be distinguished from a pointer. And the generated code doesn't treat them any differently. – Hot Licks Jul 23 '13 at 21:25
  • 3
    @HotLicks: All type information is compiler fiction. The hardware doesn't have types. – Ben Voigt Jul 23 '13 at 21:26
  • @HotLicks There is no such thing as a type for anything once you've gone past the compiler. Bytes are bytes. – MGZero Jul 23 '13 at 21:27
  • 2
    @HotLicks: You are mistaken. Types exist in the semantics of a C program as defined by the C standard. Array types and pointer types are very different; we just happen to use pointers to manipulate arrays. I seriously suggest you read section 6 of the [comp.lang.c FAQ](http://www.c-faq.com/) before commenting further. – Keith Thompson Jul 23 '13 at 21:31
  • @Sam I wanted to answer, but the question is not clear to me :( Can you give some examples where an array name did not work as you expected? Yes, array names are constant. – Grijesh Chauhan Jul 23 '13 at 21:46
  • Thank you guys for all your helpful answers. It is difficult to be more exact about what my question really is. If I knew the !exact! question, I wouldn't be confused. But by reading through all these answers I am sure to find what I'm looking for. I think it is now getting more into philosophy =) @BenVoigt: I fixed the title – Sam Jul 23 '13 at 22:14
  • 2
    @GrijeshChauhan: I wouldn't say that array names are constant. *If* an array name (or any expression of array type) is implicitly converted to ("decays" to) a pointer to the array's first element, then that pointer is not an lvalue, so you can't modify it. The array itself may or may not be constant -- or more precisely `const`; that depends on how the array is declared. If all arrays were "constant", then you couldn't do this: `int arr[10]; arr[0] = 42;` – Keith Thompson Jul 23 '13 at 22:32
  • 1
    @KeithThompson; I think array names are constant in the sense that we cannot reassign `arr` to refer to a different array object. – haccks Jul 23 '13 at 22:55
  • 1
    @haccks That's not technically "constant". They're simply not modifiable lvalues. –  Jul 24 '13 at 05:46

2 Answers

21

When people say that an array is a data type, what exactly does this mean?

A data type is a set of data with values having predefined characteristics. Examples of data types are: integer, floating-point number, character, string, and pointer.

An array is a group of memory locations related by the fact that they all have the same name and the same type.
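
To make the "array is a data type" idea concrete, here is a minimal sketch in C. The names `int4`, `int4_size`, `int4_count`, and `step_of_array_pointer` are made up for illustration. It only shows that an array type can be named, that it has its own size, and that a pointer to the whole array is a different type from a pointer to one element.

```c
#include <assert.h>
#include <stddef.h>

/* An array type can be given a name like any other type. */
typedef int int4[4];

/* Size in bytes of an object of type int4: 4 * sizeof(int). */
size_t int4_size(void)
{
    int4 a = {1, 2, 3, 4};
    return sizeof a;
}

/* The element count can be recovered from the array type itself. */
size_t int4_count(void)
{
    int4 a = {0};
    return sizeof a / sizeof a[0];
}

/* A pointer to the whole array has type int (*)[4]; stepping it by
   one moves past the entire array, not past a single element. */
size_t step_of_array_pointer(void)
{
    int4 a = {0};
    int4 *whole = &a;
    return (size_t)((char *)(whole + 1) - (char *)whole);
}
```

Calling these from `main` should show that the size and the pointer step both equal `4 * sizeof(int)`, which is the kind of property a type carries and a bare address does not.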


If you are wondering why an array is not modifiable, the best explanation I have ever read is this:

C didn't spring fully formed from the mind of Dennis Ritchie; it was derived from an earlier language known as B (which was derived from BCPL).¹ B was a "typeless" language; it didn't have different types for integers, floats, text, records, etc. Instead, everything was simply a fixed length word or "cell" (essentially an unsigned integer). Memory was treated as a linear array of cells. When you allocated an array in B, such as

auto V[10];

the compiler allocated 11 cells; 10 contiguous cells for the array itself, plus a cell that was bound to V containing the location of the first cell:

    +----+
V:  |    | -----+
    +----+      |
     ...        |
    +----+      |
    |    | <----+
    +----+
    |    |
    +----+
    |    |      
    +----+
    |    |
    +----+
     ...

When Ritchie was adding struct types to C, he realized that this arrangement was causing him some problems. For example, he wanted to create a struct type to represent an entry in a file or directory table:

struct {
  int inumber;
  char name[14];
};

He wanted the structure to not just describe the entry in an abstract manner, but also to represent the bits in the actual file table entry, which didn't have an extra cell or word to store the location of the first element in the array. So he got rid of it - instead of setting aside a separate location to store the address of the first element, he wrote C such that the address of the first element would be computed when the array expression was evaluated.

This is why you can't do something like

int a[N], b[N];
a = b;

because both a and b evaluate to pointer values in that context; it's equivalent to writing 3 = 4. There's nothing in memory that actually stores the address of the first element in the array; the compiler simply computes it during the translation phase.
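
As a rough illustration of the consequences, the sketch below shows two common ways to copy an array even though `a = b` is rejected, and checks that an array inside a struct is stored inline with no separate pointer cell. The names `boxed`, `copy_with_memcpy`, `copy_with_struct`, and `array_is_inline` are hypothetical and not taken from the quoted text.

```c
#include <assert.h>
#include <string.h>

#define N 4

/* Wrapping the array in a struct makes it assignable as a unit. */
struct boxed { int items[N]; };

/* Copies b into a byte by byte and returns the last element of a. */
int copy_with_memcpy(void)
{
    int a[N] = {0};
    int b[N] = {10, 20, 30, 40};
    /* a = b;  would not compile: a is not a modifiable lvalue */
    memcpy(a, b, sizeof a);
    return a[N - 1];
}

/* Struct assignment copies the embedded array along with the struct. */
int copy_with_struct(void)
{
    struct boxed a = {{0}};
    struct boxed b = {{10, 20, 30, 40}};
    a = b;
    return a.items[N - 1];
}

/* The array is stored inline: it begins at the struct's own address,
   so there is no extra cell holding the address of the first element. */
int array_is_inline(void)
{
    struct boxed s = {{0}};
    return (void *)&s == (void *)s.items;
}
```

Both copy functions return 40, and `array_is_inline` returns 1, which matches the layout Ritchie wanted for the directory entry.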


1. This is all taken from the paper The Development of the C Language

For more detail you may like to read this answer.


EDIT: For more clarity, here is the difference between a modifiable l-value, a non-modifiable l-value, and an r-value (in short):

The difference among these kinds of expressions is this:

  • A modifiable l-value is addressable (can be the operand of unary &) and assignable (can be the left operand of =).
  • A non-modifiable l-value is addressable, but not assignable.
  • An r-value is neither addressable nor assignable.
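
The three categories can be sketched in C as follows. This is only an illustration under the informal definitions above (the function name `lvalue_kinds` is made up), and the lines that a compiler would reject are left as comments.

```c
#include <assert.h>

int lvalue_kinds(void)
{
    int n = 1;            /* modifiable lvalue */
    const int c = 2;      /* non-modifiable lvalue (const-qualified) */
    int arr[3] = {0};     /* non-modifiable lvalue (array type) */

    int *pn = &n;         /* addressable */
    const int *pc = &c;   /* addressable */
    int (*pa)[3] = &arr;  /* addressable: pointer to the whole array */

    n = 5;                /* assignable */
    /* c = 5;             rejected: c is const */
    /* arr = arr;         rejected: an array is not assignable */
    /* &(n + 1);          rejected: n + 1 is a value, not an lvalue */

    (*pa)[0] = *pn + *pc; /* the elements of arr are modifiable lvalues */
    return arr[0];        /* 5 + 2 */
}
```

The function returns 7: the array as a whole cannot be assigned, but each of its elements can.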
Community
haccks
  • 1
    @Sam To sort of tie this back to your original example, say you have ` foo[7]; myArray = foo;` then you'll get a compiler error on the second line (`myArray = foo;`). This is what they mean by an array being an unmodifiable lvalue*. Of course you can assign to `myArray[i]` (for any constant or variable `i`), but you cannot assign `myArray` directly. Think about what it means compiled... by your example, `myArray = foo;` would be `0xAE8349F = 0xDEADBEEF;` (assuming `foo` is located at address 0xDEADBEEF). A constant cannot be an lvalue for any assignment operator. – Dave Lillethun Jul 23 '13 at 21:58
  • As for the data type bit, I think that is essentially referring to the fact that you can declare variables of that type (e.g., `myArray` has the type "array of 3 " - or `[3]` for short). You can therefore also use arrays as parameter types to functions, and so forth. There is a subtle distinction between a parameter type `[3]` and a parameter type `*`. (Someone please double check my accuracy, and also see if I'm really answering the question that was asked here...) – Dave Lillethun Jul 23 '13 at 22:05
  • An "lvalue" is a bit more than just where it can go, and most simply it's an expression which refers to an object. In terms of operators, you would normally hear it described as something that can go on the left hand side of specifically the assignment operator, not any operator whatsoever, but obviously unmodifiable lvalues can't. – Crowman Jul 23 '13 at 22:11
  • @haccks: No, sorry, I was responding to DaveLillethun's "'lvalue' just means that it can appear to the left side of an operator". – Crowman Jul 23 '13 at 22:14
  • @PaulGriffiths; That's why I mentioned it in my answer. – haccks Jul 23 '13 at 22:15
  • @haccks: Oh, sorry, I didn't see that. I think it's basically close. I may be mistaken, but I don't think the C standard defines 'rvalue' at all, and just talks about 'the value of an expression'. I don't think either of your lvalue definitions are standard, but I think they both capture the essence of things, in the sense that only objects are addressable, and modifiable things by definition can be assigned to. – Crowman Jul 23 '13 at 22:21
  • Yes, I was just giving a quick and dirty explanation to allow comprehension of my comment immediately preceding it, in case the reader did not know the term at all. It wasn't intended to be a thorough explanation with all the implications thereof and so forth. – Dave Lillethun Jul 23 '13 at 22:23
  • @PaulGriffiths; I don't know where the term r-value comes from. Neither edition of the C Standard uses it, other than in a footnote stating "What is sometimes called 'r-value' is in this standard described as the 'value of an expression.'" And yes, I have not given the standard definition, which says: **an lvalue is an expression referring to an object** (The C Programming Language (Prentice-Hall, 1988)). Rather, I just differentiated these terms for more clarity. – haccks Jul 23 '13 at 22:31
  • 1
    @DaveLillethun: The names "lvalue" and "rvalue" refer specifically to the left and right sides of an *assignment*, not of any operator in general. Given `2 + 3`, neither `2` nor `3` is an lvalue. The exact definition of "lvalue" in the C standard has changed in each version, but basically it's an expression that (potentially) designates an object. – Keith Thompson Jul 23 '13 at 22:34
  • @DaveLillethun; Not only is every operand either an lvalue or an rvalue, but every operator yields either an lvalue or an rvalue as its result. For example, the binary + operator yields an rvalue. You are completely wrong with your statement: *So if you say a + b; then a is the lvalue and b is the rvalue* – haccks Jul 23 '13 at 22:44
  • Okay, I deleted that comment. Again, it was just meant to assist in understanding the comment preceding it in the simplest and most expedient way possible. Geez. – Dave Lillethun Jul 24 '13 at 06:46
  • Please leave a comment after downvoting. – haccks Oct 04 '13 at 20:25
0

An array is a contiguous block of memory. This means it's laid out in memory sequentially. Let's say we define an array like:

int x[4];

Assume sizeof(int) == 4, i.e. an int is 32 bits wide.

This will be laid out in memory like this (picking an arbitrary starting address, let's say 0x00000001)

0x00000001 - 0x00000004
[element 0]
0x00000005 - 0x00000008
[element 1]
0x00000009 - 0x0000000C
[element 2]
0x0000000D - 0x00000010
[element 3]

You're correct that the compiler replaces the identifier. Remember (if you've learned this. If not, then you're learning something new!) that an array is essentially a pointer. In C/C++, the array name is a pointer to the first element of the array (or a pointer pointing to address 0x00000001 in our example). By doing this:

std::cout << x[2];

You're telling the compiler to add 2 to that memory address, which is pointer arithmetic. Let's say instead you use a variable to index:

int i = 2;
std::cout << x[i];

The compiler sees this:

int i = 2;
std::cout << *(x + i);  // address used: base of x + i * sizeof(int) bytes

It multiplies the size of the data type by the given index and adds that to the base address of the array. In other words, the compiler converts the subscript operator [] into addition with a pointer.
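
A small C sketch of that scaling, with hypothetical function names `byte_offset_of` and `same_element`: the pointer arithmetic is written in units of elements, and the multiplication by the element size only shows up when the addresses are compared as bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Byte distance between x[0] and x[i] for a 4-element int array.
   Valid for i in the range 0..4. */
size_t byte_offset_of(size_t i)
{
    int x[4] = {0};
    return (size_t)((char *)&x[i] - (char *)&x[0]);
}

/* x[i] is defined as *(x + i); returns 1 when both name the same value.
   Valid for i in the range 0..3. */
int same_element(size_t i)
{
    int x[4] = {10, 20, 30, 40};
    return x[i] == *(x + i);
}
```

For example, `byte_offset_of(2)` gives `2 * sizeof(int)`, even though the source only ever adds 2.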

If you really want to spin your head around this, consider this code:

std::cout << 2[x];

This is completely valid. If you can figure out why, then you've got the concept down.
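
For readers who want to check their reasoning, here is a hedged sketch in C with made-up function names. It shows why `2[x]` works, and also the point raised in the comments on this answer: `sizeof` applied to the array gives the size of the whole array, not the size of a pointer, so the array is converted to a pointer in most contexts rather than being one.

```c
#include <assert.h>
#include <stddef.h>

/* 2[x] means *(2 + x), which is the same as *(x + 2). */
int commuted_index(void)
{
    int x[4] = {10, 20, 30, 40};
    return 2[x];
}

/* sizeof sees the array itself: 4 * sizeof(int). */
size_t array_size(void)
{
    int x[4] = {0};
    return sizeof x;
}

/* After conversion to a pointer, only the pointer's size is visible. */
size_t decayed_size(void)
{
    int x[4] = {0};
    int *p = x;
    return sizeof p;
}
```

`commuted_index` returns 30, `array_size` returns `4 * sizeof(int)`, and `decayed_size` returns `sizeof(int *)`, whatever that happens to be on the platform.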

haccks
MGZero
  • Please, there are lots of small inaccuracies in here which together kill the answer as far as reliability goes. For example, `x[2]` does not add 2 to the address of `x`. Secondly, an array is only functionally a contiguous block of memory, but that does not need to mean elements follow each other without gaps (think alignment). – user268396 Jul 23 '13 at 21:41
  • 3
    @user268396: The elements follow each other without gaps, the padding is part of the element. – Mooing Duck Jul 23 '13 at 21:44
  • You are of course correct. My typing isn't much better than the answer's but the point stands that the "picture" as described in the answer about memory addresses of the elements in the array (vs what you actually get) does not account for it. Not accounting for padding/alignment issues tends to introduce nasty bugs once people start doing clever things ... – user268396 Jul 23 '13 at 21:51
  • @BartekBanachewicz How about you read the rest of the answer which further elaborates? – MGZero Jul 23 '13 at 22:19
  • @user268396 Same for you. – MGZero Jul 23 '13 at 22:19
  • @MGZero the rest of the answer uses a phrase "C/C++" and has "an array is essentially a pointer", so no, it's not correct. – Bartek Banachewicz Jul 23 '13 at 22:21
  • @BartekBanachewicz " In C/C++, the array name is a pointer to the first element of the array" I stand by my answer. – MGZero Jul 23 '13 at 22:26
  • 5
    @MGZero: Then explain why `int arr[100]; sizeof arr` doesn't give you the size of a pointer. An array name is *converted* to a pointer to the first element *in most contexts*. Please read section 6 of the [comp.lang.c FAQ](http://www.c-faq.com/). – Keith Thompson Jul 23 '13 at 22:36
  • @KeithThompson Hmm, very interesting. I will admit, I don't normally use arrays in C++, but usually end up allocating them dynamically with a pointer, which would explain my misunderstanding here. – MGZero Jul 23 '13 at 22:40
  • @MGZero: Let me urge you again to clear up your misunderstanding by reading the FAQ section I cited. – Keith Thompson Jul 23 '13 at 22:43
  • @KeithThompson Will do – MGZero Jul 23 '13 at 22:44