Why is flattening a multidimensional array in C illegal?

Question

My book (Pointers on C by Kenneth Reek) says that the following is illegal although it works fine.

  int arr[5][5];
  int *p=&arr[2][2];
  p=p+3; // As array is stored in row major form I think this 
         //should make p point to arr[3][0]

The book says that moving from one row into the next this way is illegal, but I cannot understand why.

score 7 · Answer 1 · edited May 23 '17 at 12:03

The book says it's illegal because pointer arithmetic is only guaranteed to work on pointers to elements of the same array, or one past its end.

arr is an array of 5 elements, in which each element is an array of 5 integers. Thus, at least in theory, if you have a pointer to an element of arr[i], the only pointer arithmetic you can do is arithmetic that yields pointers in the range &arr[i][0..4] or arr[i]+5, keeping i constant.

For example, imagine arr were a one-dimensional array of 5 integers. Then a pointer p could only point to one of &arr[0..4] or to arr+5 (one past the end). Multidimensional arrays work the same way.

With int arr[5][5];, you can only do pointer arithmetic such that you always have a pointer in the range &arr[i][0..4] or arr[i]+5; that's what the rules say. It may be confusing because these are arrays inside arrays, but the rule is the same no matter what. Conceptually, arr[0] and arr[1] are different arrays, and even though you know they are contiguous in memory, it is illegal to do pointer arithmetic between elements of arr[0] and arr[1]. Remember that, conceptually, each arr[i] is a different array.
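The rule can be sketched in code. This assumes a 5x5 int array as in the question; the names `sum_row` and `COLS` are illustrative only, not from the book. The loop pointer only takes the values &arr[i][0] through &arr[i][4], plus arr[i]+5, and that one-past-the-end value is compared against but never dereferenced.

```c
#include <stddef.h>

#define COLS 5 /* assumed row length, matching int arr[5][5] */

/* Sums row i using only pointers into that row.
 * p ranges over &arr[i][0] .. &arr[i][4]; arr[i] + COLS is the
 * one-past-the-end pointer of the row and is only used in the comparison. */
int sum_row(int arr[][COLS], size_t i)
{
    int sum = 0;
    for (int *p = arr[i]; p != arr[i] + COLS; ++p)
        sum += *p;
    return sum;
}
```

Walking the whole array this way means an outer loop over i; the inner pointer never crosses from one row into the next.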

In your example, however, p+3 points one past the end of arr[2] (that is, it equals arr[2]+5), so it looks to me like the example is valid nonetheless, as long as the result is not dereferenced. It's a poor choice of example, because it makes p point precisely one past the end, which is still valid. Had the author chosen p+4, the example would be correct.
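A small sketch of the case in the question, assuming the same int arr[5][5] is passed as `int arr[][5]` (both function names are hypothetical). p + 3 compares equal to arr[2] + 5, the one-past-the-end pointer of row 2, and its distance from the start of that row is 5. Nothing here dereferences p + 3.

```c
#include <stddef.h>

/* Distance of p + 3 from the start of row 2, where p = &arr[2][2].
 * Forming p + 3 is allowed because it is one past the end of arr[2];
 * reading *(p + 3) would not be, so it is never done here. */
ptrdiff_t offset_of_p_plus_3(int arr[][5])
{
    int *p = &arr[2][2];
    int *end = p + 3;      /* one past the last element of arr[2] */
    return end - arr[2];   /* both pointers belong to the same row */
}

/* Nonzero when p + 3 is exactly the one-past-the-end pointer of arr[2]. */
int p_plus_3_is_row_end(int arr[][5])
{
    int *p = &arr[2][2];
    return p + 3 == arr[2] + 5;
}
```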

Either way, I have never had any problems with flattening multidimensional arrays in C using similar methods.
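If you want a flat view without relying on that, one conservative option (a sketch, not something from this answer) is to copy the whole object with memcpy into a one-dimensional buffer. It relies on the contiguity guarantee discussed in the comments below (C99 6.2.5p20), so flat[r * COLS + c] holds arr[r][c], and no int pointer ever steps across a row boundary. `flatten_copy`, `ROWS` and `COLS` are hypothetical names.

```c
#include <string.h>

#define ROWS 5 /* assumed dimensions, matching int arr[5][5] */
#define COLS 5

/* Copies a ROWS x COLS array into a one-dimensional buffer.
 * sizeof(int[ROWS][COLS]) is used because the parameter arr has
 * decayed to a pointer, so sizeof arr would give the pointer size. */
void flatten_copy(int flat[ROWS * COLS], int arr[ROWS][COLS])
{
    memcpy(flat, arr, sizeof(int[ROWS][COLS]));
}
```

The copy costs time and memory; for read-only access, indexing with nested loops avoids both.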

Also see this question, which has other useful information: One-dimensional access to a multidimensional array: well-defined C?

Does the C standard allow gaps between the subarrays of `arr[5][5]`? For example, could `arr[1][0]` not directly follow `arr[0][4]`? — Lee Duhem, Mar 03 '14 at 11:01
@leeduhem No, I don't think it does. The C standard states that array elements are contiguous in memory, and that they are stored in row-major order (last subscript varies fastest). See section 6.5.2.1, "Array subscripting". It follows from this that there can't be gaps in between. — Filipe Gonçalves, Mar 03 '14 at 12:22
It looks like this could guarantee defined behavior for an expression like `*(p+4)`, because if `p = &arr[2][2]`, `p+4` always points to `arr[3][1]`. — Lee Duhem, Mar 03 '14 at 12:30
@leeduhem No, it doesn't guarantee, because `p+4` goes out of bounds of the array it is pointing to. Yes, you know that after `arr[2]` there is another array, but even though it's contiguous, for whatever it's worth, it's **another** array, so you can't do that. This is a dark corner of C. Theoretically, it might not work, but I believe it always works. I guess it's just an inconsistency in the standard. — Filipe Gonçalves, Mar 03 '14 at 12:36

score 5 · Accepted Answer · edited May 23 '17 at 12:20

I mulled this over for a while, and I'll try my best to explain where I think he's coming from, though without having read the book, it is conjecture at best.

First, technically, the increment you propose (or he proposed) isn't illegal; dereferencing the result is. The standard allows you to advance a pointer to one past the last element of the array it points into, so the result can be evaluated (computed and compared), but not dereferenced. Change it to p = p + 4 and both are illegal.

That aside, the linear footprint of the array notwithstanding, ar[2] has a type, and it is int[5]. If you don't believe that, consider the following, all of which is correctly typed:

int ar[5][5];
int (*sub)[5] = ar+2;   // sub points to 3rd row
int *col = *sub + 2;    // col points to 3rd column of third row.
int *p = col + 3;       // p points one past the 5th (last) column of third row.
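One way to see that the type is int[5], sketched with two hypothetical helper functions: sizeof applied to *sub measures one row, while sizeof applied to ar measures all five rows.

```c
#include <stddef.h>

/* Number of ints in the object sub points to: *sub has type int[5]. */
size_t third_row_length(void)
{
    int ar[5][5] = {{0}};
    int (*sub)[5] = ar + 2;                  /* sub points to the 3rd row */
    return sizeof *sub / sizeof (*sub)[0];   /* counts one row only */
}

/* Number of ints in the whole 2D array, for contrast. */
size_t whole_array_length(void)
{
    int ar[5][5] = {{0}};
    return sizeof ar / sizeof ar[0][0];      /* counts all 5 rows */
}
```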

Whether this lands on ar[3][0] isn't relevant. You're exceeding the declared size of the dimension participating in the pointer math. The result cannot legally be dereferenced, and with an offset larger than 3 it could not even legally be evaluated.

Remember, the array being addressed is ar[2], not ar as a whole, and ar[2] is declared with 5 elements. That it sits right up against two other arrays of the same type isn't relevant to the addressing currently being done. I believe Christoph's answer to the question proposed as a duplicate should have been accepted as the solution. In particular, it cites C99 §6.5.6, p8, which, though wordy, is quoted below:

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
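The one-past-the-end and (Q)-1 clauses of that paragraph are what make a backwards walk over a row well defined. A sketch, assuming a 5-column int array; `reverse_row` and `COLS` are hypothetical names:

```c
#include <stddef.h>

#define COLS 5 /* assumed row length */

/* Copies row i of arr into out in reverse order.
 * q starts one past the end of the row; decrementing it ((Q)-1) yields
 * the last element, and the loop stops at the first element, so every
 * pointer formed stays inside arr[i] or one past its end. */
void reverse_row(int arr[][COLS], size_t i, int out[COLS])
{
    int *q = arr[i] + COLS;   /* one past the end: formed, never dereferenced */
    size_t k = 0;
    while (q != arr[i]) {
        --q;                  /* now points to an element of arr[i] */
        out[k++] = *q;
    }
}
```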

Sorry for the spam, but the sentences about the pointer operand and the result pointing into the same array object, or one past its last element, are what I believe is relevant to your question. By addressing as you are, you're leaving the array being addressed, and as such walking into undefined behavior. In short, it works (usually), but it isn't legal.
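For comparison, reaching the start of the next row can be written without leaving any array: do the arithmetic on the outer array, whose elements are the rows. A sketch, assuming a 5-column int array; `first_of_next_row` is a hypothetical helper, and the caller must make sure row r + 1 exists.

```c
#include <stddef.h>

#define COLS 5 /* assumed row length */

/* Returns arr[r + 1][0] by stepping a pointer-to-row, not a pointer-to-int.
 * row + 1 is ordinary arithmetic on the elements of the outer array. */
int first_of_next_row(int arr[][COLS], size_t r)
{
    int (*row)[COLS] = &arr[r];   /* pointer to the whole row arr[r] */
    int (*next)[COLS] = row + 1;  /* next element of the outer array */
    return (*next)[0];            /* same element as arr[r + 1][0] */
}
```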

score 1 · Answer 3 · haccks · 2014-03-03T10:46:34.370

Yes. It is illegal in C. In fact, by doing so you are lying to your compiler. p points to the element arr[2][2] (and has type pointer to int), i.e., the 3rd element of the third row. The statement p=p+3; increments the pointer p to arr[2][5], which is at the same address as arr[3][0].
But this will fail on architectures where memory is allocated in powers of 2 (2ⁿ). In that case the memory allocation would round up to 2ⁿ; i.e., in your case, each row would round up to 64 bytes.
Here is a test program in which the memory allocated is 5 allocations of 10 integers. On some machines, memory allocations are a multiple of 16 bytes, so the 40 bytes requested are rounded up to 48 bytes per allocation:

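#include <stdio.h>
#include <stdlib.h>

extern void print_numbers(int *num_ptr, int n, int m);
extern void print_numbers2(int **nums, int n, int m);

int main(void)
{
    int **nums;
    int n = 5;
    int m = 10;
    int count = 0;

    // Allocate the array of row pointers
    nums = malloc(n * sizeof(int *));
    if (nums == NULL)
        return EXIT_FAILURE;

    // Allocate each row separately; the rows need not be adjacent in memory
    for (int i = 0; i < n; i++)
    {
        nums[i] = malloc(m * sizeof(int));
        if (nums[i] == NULL)
            return EXIT_FAILURE;
        printf("%2d: %p\n", i, (void *)nums[i]);
    }

    // Populate table
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            nums[i][j] = ++count;

    // Print table. print_numbers deliberately pretends the separately
    // allocated rows form one contiguous block (undefined behavior);
    // print_numbers2 follows the row pointers and is correct.
    puts("print_numbers:");
    print_numbers(&nums[0][0], n, m);
    puts("print_numbers2:");
    print_numbers2(nums, n, m);

    // Release each row, then the array of row pointers
    for (int i = 0; i < n; i++)
        free(nums[i]);
    free(nums);
    return 0;
}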

void print_numbers(int *nums_ptr, int n, int m)
{
    int (*nums)[m] = (int (*)[m])nums_ptr;

    for (int i = 0; i < n; i++)
    {
        printf("%2d: %p\n", i, (void *)nums[i]);
        for (int j = 0; j < m; j++)
        {
            printf("%3d", nums[i][j]);
        }
        printf("\n");
    }
}


void print_numbers2(int **nums, int n, int m)
{
    for (int i = 0; i < n; i++)
    {
        printf("%2d: %p\n", i, (void *)nums[i]);
        for (int j = 0; j < m; j++)
            printf("%3d", nums[i][j]);
        printf("\n");
    }
}

Sample output on Mac OS X 10.8.5; GCC 4.8.1:

 0: 0x7f83a0403a50
 1: 0x7f83a0403a80
 2: 0x7f83a0403ab0
 3: 0x7f83a0403ae0
 4: 0x7f83a0403b10
print_numbers:
 0: 0x7f83a0403a50
  1  2  3  4  5  6  7  8  9 10
 1: 0x7f83a0403a78
  0  0 11 12 13 14 15 16 17 18
 2: 0x7f83a0403aa0
 19 20  0  0 21 22 23 24 25 26
 3: 0x7f83a0403ac8
 27 28 29 30  0  0 31 32 33 34
 4: 0x7f83a0403af0
 35 36 37 38 39 40  0  0 41 42
print_numbers2:
 0: 0x7f83a0403a50
  1  2  3  4  5  6  7  8  9 10
 1: 0x7f83a0403a80
 11 12 13 14 15 16 17 18 19 20
 2: 0x7f83a0403ab0
 21 22 23 24 25 26 27 28 29 30
 3: 0x7f83a0403ae0
 31 32 33 34 35 36 37 38 39 40
 4: 0x7f83a0403b10
 41 42 43 44 45 46 47 48 49 50

Sample output on Win7; GCC 4.8.1:

[Screenshot not reproduced]
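For comparison with the test program above, which makes a separate malloc call per row, a dynamically sized two-dimensional table can also be allocated as one contiguous object using a pointer to a variable-length row (a C99 feature). This is a sketch; `sum_row_of_contiguous_table` is a hypothetical name, and it fills the table the same way as the test program.

```c
#include <stdlib.h>

/* Allocates an n x m table in a single malloc call, fills it with
 * 1, 2, 3, ... in row-major order, and returns the sum of row r.
 * Returns -1 if the allocation fails. tab has type int (*)[m], so
 * tab[i][j] indexes the block as n rows of m ints with no gaps. */
long sum_row_of_contiguous_table(size_t n, size_t m, size_t r)
{
    int (*tab)[m] = malloc(n * sizeof *tab);   /* one allocation for all rows */
    if (tab == NULL)
        return -1;

    int count = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++)
            tab[i][j] = ++count;

    long sum = 0;
    for (size_t j = 0; j < m; j++)
        sum += tab[r][j];

    free(tab);                                 /* one free for the whole table */
    return sum;
}
```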

edited Mar 03 '14 at 10:46 · answered Mar 03 '14 at 10:01 by haccks

`p+3` points to the same element as `arr[3][0]`. Looks valid to me. – Filipe Gonçalves Mar 03 '14 at 10:04
@WhozCraig; Yes, I agree. You are right. In fact, I should increment the column rather than the row. – haccks Mar 03 '14 at 10:08
Aren't arrays allocated in contiguous memory? If so, p+3 should make it point to arr[2][5], since in the end the compiler only sees a memory address. – KARTHIK BHAT Mar 03 '14 at 10:12
@downvoters; Edited my answer. Take a look. – haccks Mar 03 '14 at 10:23
Could you give a source for this 2^n memory allocation? – Lee Duhem Mar 03 '14 at 10:24
@leeduhem; I do not have strong evidence about it :). But once I got stuck with some similar issue. I asked this question to my mentor and he replied with a test program which I am including in my answer. – haccks Mar 03 '14 at 10:33
I never downvoted, and will drop my comment, as the only thing it pointed out has been addressed. – WhozCraig Mar 03 '14 at 10:42
Does the C standard have any requirement about the layout of an array such as `arr[5][5]`? For example, that `arr[1][0]` must come right after `arr[0][4]`? – Lee Duhem Mar 03 '14 at 10:54
@leeduhem; I am not sure about that. But treating a 2D array as 1D is not a good practice. – haccks Mar 03 '14 at 10:57
I don't know why people are downvoting. If anything is wrong with this answer, please let me know and I will remove it as soon as possible. – haccks Mar 03 '14 at 11:10
@leeduhem It does. Section 6.5.2.1, paragraph 3 of C99: "[...] It follows from this that arrays are stored in row-major order (last subscript varies fastest)" together with 6.2.5 entry 20: "[...] An array type describes a contiguously allocated nonempty set of objects." – Filipe Gonçalves Mar 03 '14 at 12:26
Your test program is testing something completely different, thus adding to the confusion. A two-dimensional array is **not** the same as an array of pointers. – user4815162342 Mar 03 '14 at 15:19
@user4815162342; I know. But the issue is similar. No confusion here. – haccks Mar 03 '14 at 15:45
Not similar at all: you are confusing dynamic allocation with separate calls to `malloc` with the allocation of a contiguous (but multi-dimensional) array. Your test code is thus misleading and irrelevant to the question. – user4815162342 Mar 03 '14 at 17:12
