3

Let's say I have an N*M flat array and I would like to use as a multidimensional array. But instead of manually indexing it as array[row*M + column] I thought I will point a pointer to a multidimensional array of it.

char flatArray[N*M];
char (*multidimensional)[M] = (char (*)[M])flatArray;

Then accessing the contents of flat array via the multidimensional pointer.

It works with and without optimization and compiles without warnings.

But my concern is that the compiler may want to be smart in the future and if the length of the row is not a power of 2 it may pad it so it can generate code that exploits the x86 addressing, so the array access would be a single mov eax, [4*ebx+ecx] instead of having to calculate the index first with a few instructions before each time the array is accessed.

The question is: Are standards compliant C compilers allowed pad the rows in a multidimensional array to generate faster code?

Calmarius
  • 18,570
  • 18
  • 110
  • 157
  • 1
    No. Arrays are contiguous. But aliasing is bad for other reasons. Where this `int` came from? – Eugene Sh. May 01 '18 at 21:52
  • https://stackoverflow.com/questions/6290956/one-dimensional-access-to-a-multidimensional-array-is-it-well-defined-behaviour – Oliver Charlesworth May 01 '18 at 22:01
  • It is not uncommon to use a flat memory segment to represent a multi-dimensional array of any number of dimensions. If you want to do that, you could easily write the functions to index into the memory segment based on the row, column, etc. values. For example, you frequently see a sparse matrix represented as a flat array. – bruceg May 01 '18 at 22:16
  • @EugeneSh. Sorry the original question was about int arrays then I changed it to char, but forgot to change the cast. – Calmarius May 02 '18 at 10:05

1 Answers1

4

The Standard requires that for any type Foo, the size of a Foo[N] be exactly N * sizeof (Foo). It does not, however, specify any circumstances in which an object of type Foo[M][N] (or even a Foo[N]) may be accessed using an lvalue of type Foo, unless Foo happens to be a character type. Obviously it would be absurd to suggest that arrays of non-character type should not be accessible using subscript expressions, but the way N1570 6.5p7 is written doesn't actually allow that.

When using character types, the Standard seems to intend that a pointer to an object of size N may be converted to character type and treated as though it were a char[N]; functions like memcpy and fwrite rely upon such abilities. Still, the Standard doesn't explicitly say such a thing.

If one assumes that the intention of the Standard was to allow the elements of a Foo[M][N] to be accessed using an lvalue of a non-character type Foo in at least some circumstances, defining what those circumstances are would likely answer the question of when one may use a "flattened" pointer to access a multi-dimensional array.

Personally, I think that an lvalue D which is derived from an lvalue of another type (e.g., given Foo myArray[10][12]; Foo *p = myArray[5];, I would regard myArray[5] as being an lvalue derived from myArray, myArray[5][0] as derived from myArray[0], and *p as derived from myArray[5][0]) should be usable to access storage within the object identified by the original, at least until either:

  1. The same storage is accessed via some means not derived from D.

  2. An lvalue is produced that isn't derived from D, that will be used to access that storage in conflicting fashion, or produce another lvalue that will be.

  3. Code enters a loop or function wherein either of the above occurs.

Applying that principle, and treating a cast to an array-element type as "laundering" the exact type of the array, casting a Foo[M][N] to a Foo* would yield a pointer which could access all elements of the array as long as the array isn't accessed via any other means. A compiler that makes any bona fide effort to stay out of a programmer's way should have no trouble with such code. Unfortunately, the Standard doesn't specify any situations where compilers are required to allow derived lvalues to access parent objects, and some compiler writers interpret that as an invitation to ignore the possibility that lvalues might access storage associated with other lvalues from which they are derived.

supercat
  • 77,689
  • 9
  • 166
  • 211
  • While `Foo[M][N]` and `Foo*` are different types, the actual access to memory happens via a `Foo*` in both cases: `foo[y][x]` is equivalent to `*(*(foo + y) + x)` which invokes array-pointer-decay two times to yield a single `Foo*` that is finally dereferenced. Doesn't that guarantee that the compiler has to assume aliasing anyway? – cmaster - reinstate monica May 01 '18 at 22:30
  • @cmaster: The Standard is absolutely horrible at handling anything even remotely resembling corner cases. The Standard makes very clear that given `FOO foo[M][N]`, a compiler would be entitled to assume that `foo[0][x]` and `foo[1][y]` are disjoint because `foo[0]` is an array object, and pointer arithmetic on a pointer to an array element is only allowed to yield a pointer to, or just past, an element of the same array, and "just past" pointers can't be dereferenced. It's not clear what needs to be done to convert a pointer to an array element into a pointer unaffiliated with the array. – supercat May 01 '18 at 22:40
  • @cmaster: On the other hand, the way 6.5p7 is written would cause evaluation of even something as simple as `foo[0][0]` to invoke Undefined Behavior unless `FOO` is a character type, since it does not allow objects of aggregate types to have their storage accessed via lvalues of their constituent types (though it does allow the reverse). – supercat May 01 '18 at 22:43
  • [C11 Standard - 6.5 Expressions (7, bullet 1)](http://port70.net/~nsz/c/c11/n1570.html#6.5p7) *"a type compatible with the effective type of the object"* muddies the waters still. I have still not seen a concise description of the boundaries of what *"a type compatible with the effective type of the object"* is limited to. I'm not saying there is anything incorrectly stated, but indexing a flat array with a pointer to type of size X would seem to be supported as type compatible with the effective type of the array. (the onus is on the programmer to stay in bounds (or 1 past it). – David C. Rankin May 02 '18 at 02:19
  • @DavidC.Rankin: The language in the Standard is absurdly vague. If a compiler recognizes the use of a derived lvalue to access the parent, stretching the bounds of what types are allowable isn't necessary. If a compiler can't recognize the use of derived lvalues, stretching the bounds of what types are allowed isn't sufficient. – supercat May 02 '18 at 02:49
  • Well, that's about as clear and concise a statement regarding 6.5(6)&(7) as I've ever seen... – David C. Rankin May 02 '18 at 02:57
  • Sorry the original question was about int arrays then I changed it to char, but forgot to change the cast. (because ints are usually aligned so the x86 alignment argument wouldn't apply) – Calmarius May 02 '18 at 10:06
  • @DavidC.Rankin: It irks me that ever since Defect Report #028 made it clear that the predecessor to 6.5p7 was insufficient, the Standard has added complexity without bothering to define the behavior of things as simple as `someStruct.member = 23;` for non-character-type members. – supercat May 02 '18 at 14:51
  • They have generated no-end of debate among those who care enough to read and try and stay within the defined standard. In the real world I'm both a professional engineer and attorney -- I'm good at discerning technical references. 6.5p7 might as well be goggeldygoop the way it is written. The gist of the language is clear, but it provides zero clarity or guidance for what is and what isn't a type compatible with the effective type on the margins. – David C. Rankin May 02 '18 at 19:13
  • @DavidC.Rankin: Actually, the lack of clarity is with what it means for an lvalue to be "used" to perform a given access. The fact that a `union u` contains a `struct s` does not imply blanket permission to use an lvalue of type `struct s` to modify the storage belonging to a `union u` nor any of its constituents, but a pointer formed from an lvalue or pointer of a different type should be usable to address and/or access storage in the context where it is formed, or in nested contexts where it is the exclusive means of addressing/accessing that storage. – supercat May 02 '18 at 19:23