Recently I learned from user "chux" that it is legal to add 1 to an address that doesn't represent an array element. Specifically, the following provision in the standard (C17 draft, 6.5.6 ¶7)

For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

makes it legal to write &var + 1 where var is not representable as arr[i] for some T arr[n] where 0 ≤ i < n.

What are use cases for doing this? I found an example by Aaron Ballman (on the SEI CERT C Coding Standard website) who mentions "allocation locality". Without quoting his entire example, the essence seems to be that one can allocate space for multiple objects using a single call to malloc, so that one can assign to them like this:

T1 *objptr1 = (T1 *)malloc(sizeof(T1) + sizeof(*objptr2));
*objptr1 = ...;
memcpy(objptr1 + 1, objptr2, sizeof(*objptr2));

Here is a toy example of mine:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    float obj2 = 432.1f;
    long *objptr1 = (long *)malloc(sizeof(*objptr1) + sizeof(obj2));
    if (objptr1 == NULL)
        return EXIT_FAILURE; // allocation failed
    *objptr1 = 123456789L;
    memcpy(objptr1 + 1, &obj2, sizeof(obj2));

    printf("%ld\n", *objptr1); // 123456789
    printf("%f\n", *(float *)(objptr1 + 1)); // 432.100006

    free(objptr1); // a single free releases both objects
    return 0;
}

I hope that this captures the essence of the idiom. (Perhaps it does not: As a commenter pointed out, my toy example assumes that the alignment of float is smaller than or equal to the alignment of long. The original example by Aaron Ballman had a string as the second object, and strings can be arbitrarily aligned. For a correct minimal (toy) version of Aaron Ballman's code stub see my own answer here.)

However, it seems that one could also simply use a (char *)-cast with sizeof instead:

    memcpy((char *)objptr1 + sizeof(*objptr1), &obj2, sizeof(obj2));

In the general case, &var + 1 is shorter than (char *)&var + sizeof var, so perhaps this is the advantage.

But is that all? What are use cases for writing (&var + 1) if var is not an array element?

Lover of Structure
  • As pointers in C behave like "array elements", or more accurately, arrays are really just fancy pointers, I think the distinction here is somewhat irrelevant. Any value can be "representable" as an array. Whether or not this access is valid depends on the memory layout of what you're accessing. – tadman May 17 '23 at 15:38
  • Sorry, but open-ended questions such as this aren't really appropriate for SO. – John Bollinger May 17 '23 at 15:38
  • Don't forget that when allocating space for different types of objects you *must* account for alignment issues. You're lucky here in that `float` is smaller than `long`, as otherwise you could have a misaligned `long`. – tadman May 17 '23 at 15:39
  • @tadman, arrays and pointers have a cozy relationship in C, but ***in no way*** are C arrays any kind of pointer. – John Bollinger May 17 '23 at 15:41
  • @JohnBollinger I get the impression that the idiom is either questionable or rarely used, hence I don't think there will be a large number of legitimate use cases. – Lover of Structure May 17 '23 at 15:42
  • @JohnBollinger I did say they were "fancy", there are differences, but compared to other languages, like C++ and `std::vector`, or even `std::array`, they are really little more than a pointer with some additional compile-time context. – tadman May 17 '23 at 15:42
  • @tadman, A valid pointer value is the address of an object. A valid array value is one or more objects themselves. These are fundamentally different, and people failing to understand that is among the common sources of C questions here. That the C++ standard library provides higher-level abstractions analogous to C arrays is not really relevant. – John Bollinger May 17 '23 at 15:53
  • The main use I can think of is if you have a function that takes a pointer to an element of an array and a pointer to the element after another element of the array (giving a range of elements from the array). In that case, you could pass the address of a non-array value, and the address that's one element past it. E.g. `func(&a, &a+1)`. – Tom Karzes May 17 '23 at 15:58
  • @tadman Good point about alignment. (The original example by Aaron Ballman had a string as the second object, and alignment isn't an issue for strings.) – Lover of Structure May 17 '23 at 16:49

4 Answers

What are use cases for writing (&var + 1) if var is not an array element?

Not everything that falls out of the language semantics has a specific use. Most computer languages are designed for consistency and sufficiency. Some also aim for simplicity. Few, however, expressly target minimality, and C is not one of them.

The primary reason that pointer arithmetic is defined for pointers to scalars is that it makes it easier to define pointer arithmetic. Pointers to scalars are not a special case, which is good, because it's not necessarily possible to distinguish them from pointers to array elements (alternatively: implementations don't need to make that possible). Furthermore, making pointers to scalars equivalent to pointers to the single element of a one-element array is unproblematic, because the pointer types are the same and the representation of a scalar is identical to the representation of a one-element array of the same data type.

Given that pointer arithmetic is defined for pointers to scalars by relying on a semantic equivalence between scalars and single-element arrays, the use cases for &scalar + 1 are exactly the same as those for &single_element_array[0] + 1, in contexts where one wants to lean on that semantic equivalence. In turn, those cases are pretty much the same as the ones for &n_element_array[n-1] + 1 generally.

Perhaps a better question, then, would be why the language allows computing a pointer to just past the end of an array, and what use that might have. As far as I am aware or have ever been able to determine, those are primarily a matter of convenience. For example, it is easier to iterate over an array via pointers if you are permitted to compute (but not dereference) a pointer to just past the end of the array. And it is desirable to be able to express sub-arrays via an [inclusive_start, exclusive_end) pointer pair. Neither of those things is essential, however.

John Bollinger

If you have a 'real' array, you might write:

enum { N = 10 };
int arr[N];
…set the values in arr…
int *end = arr + N;
for (int *cur = arr; cur < end; cur++)
{
    …use *cur…
}
 

You can do the same with a single variable:

int var;
int *end = &var + 1;
for (int *cur = &var; cur < end; cur++)
{
    …use *cur…
}

You would probably have the loop hidden in a function, possibly a function that is passed the start of the array and one beyond the end of an array:

some_func(&arr[0], &arr[N]);
some_func(&var, &var + 1);

The same code can be used for both the ordinary variable and the normal array. You could also pass the function the start of the array and the length, and the function could do the arithmetic:

another_func(arr, N);
another_func(&var, 1);

with:

void another_func(int *base, size_t size)
{
    for (int *end = base + size; base < end; base++)
        …process *base…aka base[0]…
}

All the code using var depends on being able to create the address &var + 1 though none of it accesses the data at that address.

Lover of Structure
Jonathan Leffler

The reason is to make full pointer arithmetic valid for individual variables that are not arrays, so that they can be used in place wherever an array is expected.

For example, suppose we want to read bytes from stdin, issuing one read() call per character. read() expects a pointer to a buffer (an array of char), but you would not want to define a one-element array of char just to be able to call it. In that case:

    /* indentation used to indicate local, automatic scope */
    char the_char;
    int res = read(0, &the_char, 1);

lets read() advance its pointer to the end of the buffer internally, without knowing that you actually passed the address of a single char variable. If the standard did not say this explicitly, you would have to write:

    char the_char[1];
    int res = read(0, the_char, 1);

but then you would have to write the_char[0] everywhere to refer to the character that was read, instead of just the_char (degrading the readability of your code).

Internally, read() can treat the buffer as a pointer and create a loop based on pointer positions:

    for (char *p = buffer, * const end = buffer + len;
            p < end;
            p++)
    {
        /* something applying to *p */
    }

or

    for (int i = 0; i < len; i++) {
        /* something applying to each buffer[i] */
    }

In the first case, the loop body operates on the character that the moving pointer currently points to. In the second, an auxiliary variable i is used to access the array elements sequentially.

As written, the first version looks more efficient: the pointer advances on each iteration, and each element is accessed by simply dereferencing it. In the second version, an auxiliary variable is introduced (for better readability), and each access has to be computed as a variable offset from the beginning of the array on every iteration. Once the compiler's optimizer has run, both versions normally reduce to the same assembly code, so which version you use usually makes no practical difference.

Luis Colorado

Before flexible array members were introduced in C99, one way to emulate a struct containing a string of indeterminate length was to have a pointer within the struct point to a string allocated directly after the struct.

A correct minimal (toy) version of Aaron Ballman's code stub illustrates the use of incrementing a pointer to something that is not an array element:

#include <stdlib.h>
#include <string.h>
#include <stdio.h>

struct rec {
    int a; /* dummy member */
    char *varstr;
    int b; /* dummy member */
};

struct rec *create_rec(const char *s) {
    struct rec *r;
    size_t len = strlen(s) + 1;

    r = malloc(sizeof(*r) + len);  /* implicit conversion from  void *  to  struct rec *  is okay */
    if (r == NULL)
        return NULL;  /* allocation failed */
    r->varstr = (char*)(r + 1);  /* casting from  struct rec *  to  char *  is okay */
    memcpy(r->varstr, s, len);
    return r;
}

int main(void) {
    struct rec *my_r;

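    my_r = create_rec("this is a test");
    if (my_r == NULL)
        return EXIT_FAILURE;  /* could not allocate the record */
    my_r->a = 9;
    my_r->b = 321;

    printf("%d\n", my_r->a); /* 9 */
    puts(my_r->varstr); /* this is a test */
    printf("%d\n", my_r->b); /* 321 */

    free(my_r);  /* a single free releases the struct and the string */
    return 0;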
}

The statement r->varstr = (char*)(r + 1); is what illustrates this: it treats the object that r points to as if it were the sole element of an array struct rec[1], so that r + 1 points just past it. (Of course, one could instead write r->varstr = (char*)r + sizeof(*r);, which doesn't rely on this trick and works equally well.)

Lover of Structure