Safely passing arrays in C

Question

In C, passing an array gives no information about its length, since the array decays to a raw pointer. This leaves the passing of length information up to the programmer. I demonstrate a few methods for doing so below, and discuss some pros and cons in the comments.

// only works if array has not already decayed,
// passing a raw pointer produces incorrect results 
#define foo(arr) _foo(arr, sizeof arr / sizeof *arr)
// most straightforward option, but can be unsafe
void _foo(int *arr, size_t n)
{
    for (size_t i=0; i < n; i++) {
        // code
    }
}

// very dangerous if array not prepared properly
// simple usage and implementation, but requires sentinel value
void bar(int *arr /* arr must end in -1 */ ) 
{
    for (size_t i=0; arr[i] != -1; i++) {
        // code
    }
}

/* doesn't provide type safety, pointless
// simplifies usage, still hacky in implementation
#define baz(arr) _baz(sizeof arr / sizeof *arr, &arr)
// safest option, but complex usage and hacky implementation
void _baz(size_t n, int (*pa)[n])
{
    int *arr = *pa;
    for (size_t i=0; i < n; i++) {
        // code
    }
}
*/

Are there any other methods I didn't consider? Are there more pros and cons I missed? Overall, what method would you consider to be the best for general use? Which method do you use?

The most common approach seems to be the first option with no macro. Is using the third option with the macro considered to be bad practice? To me it seems the most robust.

I have seen an earlier question on this topic. However, it does not mention the third method, and given that it was asked nearly 12 years ago I wouldn't be surprised if new insights could be gained.

EDIT: Upon further inspection, it seems option 3 only provides pointer type safety when the function takes a fixed-size array. I incorrectly assumed the method from this answer would extend to variable length arrays and neglected to test it.
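To illustrate the fixed-size case mentioned in the edit, here is a minimal sketch in which the pointer-to-array type does carry the length. The names `sum7` and `N7` are hypothetical and chosen only for this illustration.

```c
#include <stddef.h>

#define N7 7  /* hypothetical fixed length, chosen for illustration */

/* The parameter is a pointer to an array of exactly N7 ints, so the
 * length is part of the type and sizeof *pa is the full array size. */
int sum7(int (*pa)[N7])
{
    int total = 0;
    for (size_t i = 0; i < sizeof *pa / sizeof **pa; i++) {
        total += (*pa)[i];
    }
    return total;
}
```

With `int a[7]`, the call `sum7(&a)` is accepted, while passing the address of an `int b[3]` is an incompatible pointer type that compilers typically diagnose. That check depends on the length being a compile-time constant in the parameter type.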

I'm not sure if the changes in C23 mentioned in chux's answer would fix this method, or if it could be simplified to baz(size_t n, int arr[n]). Reading through it, nothing in the linked paper seems to suggest int arr[n] would no longer decay to int *arr, but I may be wrong.

In general, this stuff boils down to opinion. I would say that more experienced C developers do not care about this. It's usually less experienced people complaining that the language allows them to do unsafe things. Pretty much always, a pointer plus a size is what people expect, _i.e._ option 1: `_foo`. This has the benefit of working on both arrays and any memory that is _not_ an array. It's the most flexible and intuitive, and is also the same paradigm used by standard library functions. As for the size macros, I hate those. They're useful in some cases and dangerous everywhere else. — paddy, Jul 20 '23 at 23:48
To be clear, I personally advocate _none_ of the proposed approaches, opting for the simple and clear 2 parameters specifying a pointer and a size. Macros that attempt to spare the programmer from explicitly dealing with this are detrimental. If you have a special array that you _really_ want to pass into functions as a single parameter, then define it as a struct with both a pointer and a size, then pass that struct. Don't use macros for this. — paddy, Jul 20 '23 at 23:58
"Pretty much always, a pointer plus a size is what people expect": Yes. Sometimes you need to pass &arr[x] not arr so size of arr is wrong, sometimes you are building a record of some kind or the header for some file format and want to pass (cast-to-type*) &binary-buffer[ offset ], etc. -- (pointer, size) covers many different use cases and is easy to understand. — Dave S, Jul 20 '23 at 23:58
"_Are there any other methods I didn't consider?_" -- You can wrap an array in a struct and pass the struct. In that case you can use `sizeof` to find the size of the array member. @DaveS -- "_Sometimes you need to pass &arr[x] not arr so size of arr is wrong_": wouldn't you agree that `sizeof arr` is always wrong when you pass an array to a function due to array decay (unless you need the size of the pointer)? — ad absurdum, Jul 21 '23 at 00:07
The 'obvious' one using VLA notation: `void foo_bar_baz(size_t n, int array[n]) { for (size_t i=0; i < n; i++) { …code… } }` —— This would be called as `int data[N] = { …initialization… }; foo_bar_baz(N, data);` —— This contrasts with your third option which requires you to call `_baz(N, &data);`. — Jonathan Leffler, Jul 21 '23 at 00:47
@JonathanLeffler that's just the same as "option 1" , since `int array[n]` is adjusted to `int *array` — M.M, Jul 21 '23 at 05:49
It is an invitation to an open dialogue rather than a question. There are no right and wrong answer, just opinions. As such it is off topic. — n. m. could be an AI, Jul 21 '23 at 07:25
@n.m.willseey'allonReddit "Are there any other methods I didn't consider? Are there more pros and cons I missed?" remain on topic. "what method would you consider to be the best for general use?" is OT. — chux - Reinstate Monica, Jul 21 '23 at 11:24
@M.M — no, it isn't the same as option 1. The order of the parameters is reversed, and the compiler knows the size of the array officially so it can optimize based on that and diagnose errors based on that. It is especially not the same when it comes to multi-dimensional arrays: `void baz_bar_foo(size_t m, size_t n, int data[m][n])`. — Jonathan Leffler, Jul 21 '23 at 14:41
@JonathanLeffler Array parameter *adjustment* takes away the innermost dimension. The compiler can't make assumptions based on that dimension. it's legal to access more than `n` elements if the array actually has that many — M.M, Jul 22 '23 at 05:46
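The two struct-based suggestions from the comments could be sketched roughly as follows. This is only an illustration: the type names `int_span` and `int_array8` and both helper functions are hypothetical and do not come from any commenter's code.

```c
#include <stddef.h>

/* Pointer plus size bundled in one struct (hypothetical name). */
struct int_span {
    int   *data;
    size_t len;
};

/* Array wrapped in a struct: the array member does not decay when the
 * struct is passed around, so sizeof still yields the array size. */
struct int_array8 {
    int elems[8];
};

/* Sums the elements described by the span. */
int span_sum(struct int_span s)
{
    int total = 0;
    for (size_t i = 0; i < s.len; i++) {
        total += s.data[i];
    }
    return total;
}

/* Element count recovered from the member's type, not from a parameter. */
size_t array8_count(const struct int_array8 *a)
{
    return sizeof a->elems / sizeof a->elems[0];
}
```

The span form also covers sub-ranges such as `&raw[1]` with a shorter length, which a `sizeof`-based macro cannot express. The wrapped form only works when the length is fixed at compile time.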

chux - Reinstate Monica · Answer 1 · 2023-07-21T16:16:05.153

3

Are there any other methods I didn't consider?

Let's make a deal and consider door #4: Use size and then pointer: (size_t n, int a[n]).

C23 is coming out with a change: Variably-Modified Types to allow "possibly at compile-time with stronger analysis".
This works with modern compilers to form better code and detect weaknesses.

void foo_vla(size_t n, int arr[n]);

Available in C23 even if VLA, as an object, is not supported.
Available in C11/C17 with VLA support.
Available in C99 as that always supports VLA.
Available in C89 as void foo_vla(size_t n, int arr[/*n*/]);, but then we reduce analyzability.

This is like the good answer to that old question not considered by OP and is even more relevant with C23.
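A minimal sketch of the size-then-pointer form, assuming a compiler that accepts variably modified parameter types (C99, or C11/C17 with VLA support). The function name `sum_vla` is hypothetical.

```c
#include <stddef.h>

/* Size first, then the array parameter that refers to it. The parameter
 * is still adjusted to int *, but the declared bound documents the
 * expected length for readers and for analysis tools. */
int sum_vla(size_t n, int arr[n])
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += arr[i];
    }
    return total;
}
```

A caller with `int data[5]` would write `sum_vla(5, data)` or `sum_vla(sizeof data / sizeof data[0], data)`. How much checking the declared bound actually buys depends on the compiler and its warning options.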

edited Jul 21 '23 at 16:16

answered Jul 21 '23 at 01:43

chux - Reinstate Monica


Reading through the paper, it doesn't seem to suggest `int arr[n]` will no longer decay to a pointer. It's my understanding that `int arr[n]` is completely equivalent to `int *arr` from the compiler's point of view. Still a useful insight, and perhaps that C23 change will fix my third proposal (which doesn't actually work -- I forgot to check it :P). – ealker Jul 21 '23 at 05:57
@ealker `arr` in `int arr[n]` as a function _parameter_ remains a pointer of type `int *`, so it is the same from a type point-of-view. It differs from `int *arr` from an analyzability point-of-view, as `int arr[n]` affords more information. And that is the point of improving "safely": increasing the ability to detect errors. Matching types is not the only way to improve things. – chux - Reinstate Monica Jul 21 '23 at 11:26

score 1 · Answer 2 · edited Jul 21 '23 at 14:39

You are wrong in some assumptions:

// only works if array has not already decayed,
// passing a raw pointer produces incorrect results

An array does not decay into a pointer at runtime. Array decay is a static conversion made at compilation time, in which an array parameter declaration is converted into a pointer declaration. So you are actually declaring a pointer, and no conversion happens at runtime. At the call site, the compiler interprets the array name as the address of the first array element and passes that as the required pointer.

Originally, pointers were declared by appending [] at the end, and * for pointer declarations was introduced later. Today, outside of parameter declarations (where it decays to a pointer), a declaration like

/* global file scope */
int ptr[];

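is a declaration with an incomplete type, which is completed at compilation time by giving the array just one element, and a warning is issued. (Completing the array with one element is standard behavior for a tentative definition that is still incomplete at the end of the translation unit. The warning itself is a compiler diagnostic.):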

pru.c: At top level:
pru.c:5:5: warning: array ‘ptr’ assumed to have one element
    5 | int ptr[];
      |     ^~~

A parameter declaration like:

void _foo(int *arr, size_t n)

never declares an array, but a pointer. Even if you do:

void _foo(int arr[], size_t n)

or

void _foo(int arr[7], size_t n)

that's an array declaration (incomplete or complete) that decays into the pointer declaration int *arr. (Indeed, a pointer was declared using [] in the original draft of C by K&R. The * notation was added later, and [] then took on the different meaning described above.)

A parameter declaration is the only case in which we can talk about decay of a declaration. It happens when you write a full array declaration (complete or incomplete) as in:

void _foo(int arr[7], size_t n)

or

void _foo(int arr[], size_t n)

In that case, the array declaration is said to decay to a pointer declaration, equivalent to:

void _foo(int *arr, size_t n)
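One way to check this equivalence is to store functions declared in all three styles in a single array of function pointers. The sketch below assumes nothing beyond standard C, and all names in it (`first_fixed`, `first_open`, `first_ptr`, `first_fn`, `first_fns`) are hypothetical.

```c
/* Three spellings of the same parameter type. After adjustment, each
 * function has the type int (int *). */
int first_fixed(int arr[7]) { return arr[0]; }
int first_open(int arr[])   { return arr[0]; }
int first_ptr(int *arr)     { return arr[0]; }

/* A single pointer-to-function type fits all three without a cast,
 * which would not be possible if the parameter types differed. */
typedef int (*first_fn)(int *);

const first_fn first_fns[3] = { first_fixed, first_open, first_ptr };
```

Calling `first_fns[i](a)` for any `int` array `a` with at least one element returns `a[0]` in all three cases. The `7` in `first_fixed` is not enforced by the type system.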

In any case, arrays decay to pointers because an array is not a usable object as a whole, and inside the function the compiler has no way to let you calculate the array size. For example, the program

#include <stdio.h>

/* sizeof yields a size_t, which is printed with %zu */
#define P(_lbl) printf("%s: size of %s == %zu\n", __func__, _lbl, sizeof arr)
void foo1(int arr[7])
{
   P("arr[7]");
}
void foo2(int arr[])
{
   P("arr[]");
}
void foo3(int *arr)
{
   P("*arr");
}
int main()
{
    int arr[3] = {0, 0, 0};
    P("arr[3]");
    foo1(arr);
    foo2(arr);
    foo3(arr);
}

will result in (showing the decayment in the array cases):

$ a.out
main: size of arr[3] == 12  -- in main, no array decayment happens
foo1: size of arr[7] == 8   -- in foo1, array decays into a pointer.
foo2: size of arr[] == 8    -- in foo2, array again decays into a pointer.
foo3: size of *arr == 8     -- in foo3, no decayment happens, but a pointer is declared.
$ _

Decay of arrays into pointers is a fully supported language feature that does not happen at runtime: the code is compiled to use pointers, and the array size is never known in the function body. As you have seen, in main() the array is declared as a local variable and that declaration does not decay into a pointer. The same holds for an array declared as a global variable.

In other languages that let you pass arrays as parameters and query the array size inside the function, the size (either the total size in bytes or the number of elements) must be passed as a hidden parameter, which also allows array bounds checking. Alternatively, arrays are passed as references to objects (in OOP), and the size is carried in the object's instance representation. No magic is expected here. In C, only the parameters actually declared are passed, and when an array is passed, the compiler converts the array accesses into pointer accesses. This happens only at the first array level, not recursively, so a complex declaration like:

int foo(int(*arr[7][3])(void));

will decay into:

int foo(int(*(*arr)[3])(void));

Here arr is a pointer to an array of three pointers to functions that take no parameters and return an integer.
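The same first-level-only adjustment can be observed with a simpler element type. The sketch below uses plain `int` instead of function pointers to keep it short, and the names `row_length` and `grid_sum` are hypothetical.

```c
#include <stddef.h>

/* Declared as int grid[7][3], adjusted to int (*grid)[3]: only the
 * outermost dimension is dropped, the inner [3] stays in the type. */
size_t row_length(int grid[7][3])
{
    return sizeof *grid / sizeof **grid;  /* elements per row */
}

/* The row count is not part of the parameter type, so the caller
 * still has to supply it explicitly. */
int grid_sum(size_t rows, int grid[][3])
{
    int total = 0;
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < sizeof *grid / sizeof **grid; c++) {
            total += grid[r][c];
        }
    }
    return total;
}
```

Because the inner dimension survives, `sizeof *grid` is the size of a whole row, while the `7` in `row_length` leaves no trace in the adjusted type.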
