
I've been accessing N-dimensional arrays with one-dimensional pointers in C for decades now. But now, reading this other SO question, I learnt that's UB.

To be honest, I'm quite disappointed to read that: it works the way you'd expect in every compiler, yet it's still UB.

One area where I find this most worrying is when writing serialization functions.

For example, let's assume you have several float arrays such as:

    float a[5];
    float b[4][4];
    float c[7][2][4][5];

According to the C specification, should I write three different functions if I want to serialize these three arrays?

The following function (which is how I'd do this) is UB according to the question above:

    void serializeNfloatarray(FILE *stream, float *ptr, size_t ndims, size_t *dims) {

       size_t numitems=1;

       if(ndims==0) return;

       fprintf(stream, "%zu ", ndims);

       for(size_t f=0; f<ndims; f++) {
          fprintf(stream, "%zu ", dims[f]);
          numitems *= dims[f];
       }

       for(size_t i=0; i<numitems; i++)
          fprintf(stream,"%f ", ptr[i]); /* <- UB !!! */
    }

Is it impossible to write one C function valid for all types of float arrays without going into UB?

cesss
  • The highest-voted answer to the question you linked gives a way to access the array which is well-defined. – interjay Oct 31 '21 at 09:09
  • @interjay: No, it's not well defined, because it casts a `char` pointer back to an `int` pointer, and that's UB. You can do the access only with the `char` pointer, not the `int` pointer. – cesss Oct 31 '21 at 09:12
  • This is separate from the question, but I think the function definition accepting a `float *` forces you to explicitly cast to `(float *)` by the caller in case of multidimensional arrays. If this is what you want, that is fine. Otherwise, you probably have to change the argument to `void *` to hush the compiler, but that would obviously be slightly susceptible to human error. – Cheatah Oct 31 '21 at 09:30

1 Answer

The bytes representing any object may be accessed using the character types; the C standard defines the behavior of this. Therefore, the elements of any compound array may be accessed by copying their bytes. That can be done using your own code that copies bytes with a character type, or you can use memcpy, which is specified to copy bytes:

    for (size_t i = 0; i < numitems; i++)
    {
        //  Create a temporary object.
        float t;

        //  Copy bytes from the desired element into the intermediary object.
        memcpy(&t, (const unsigned char *) ptr + i * sizeof t, sizeof t);

        //  Print the value.
        fprintf(stream, "%f ", t);
    }
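
The other option mentioned above, copying the bytes yourself through a character type, could look roughly like the sketch below. The helper name `load_float` is made up for illustration; it is not part of any library.

```c
#include <stddef.h>

/* Hypothetical helper: fetch element i of a float array of any shape
   by copying its bytes through unsigned char lvalues. */
static float load_float(const void *base, size_t i)
{
    /* Address the whole object as an array of bytes. */
    const unsigned char *src = (const unsigned char *) base + i * sizeof(float);

    /* Temporary object that receives the bytes. */
    float t;
    unsigned char *dst = (unsigned char *) &t;

    for (size_t k = 0; k < sizeof t; k++)
        dst[k] = src[k];

    return t;
}
```

With `float b[4][4];`, a call such as `load_float(b, 5)` would yield the same value as `b[1][1]`.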

Note that %f is insufficient to record the values of float numbers, as it does not guarantee enough digits. %a is designed for this.
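
Putting both points together, one function for all three arrays might be sketched as follows. The name `serialize_float_array` and the `const void *` parameter are choices made for this illustration, not something taken from the question; the parameter type simply lets callers pass `a`, `b`, or `c` without a cast.

```c
#include <stdio.h>
#include <string.h>

/* Sketch: serialize a float array of any dimensionality.
   data points to the first byte of the whole array object. */
void serialize_float_array(FILE *stream, const void *data,
                           size_t ndims, const size_t *dims)
{
    if (ndims == 0)
        return;

    /* Write the number of dimensions, then each extent. */
    size_t numitems = 1;
    fprintf(stream, "%zu ", ndims);
    for (size_t d = 0; d < ndims; d++) {
        fprintf(stream, "%zu ", dims[d]);
        numitems *= dims[d];
    }

    /* Walk the object as bytes and copy each element out. */
    const unsigned char *bytes = data;
    for (size_t i = 0; i < numitems; i++) {
        float t;
        memcpy(&t, bytes + i * sizeof t, sizeof t);

        /* %a prints a hexadecimal floating-point representation. */
        fprintf(stream, "%a ", t);
    }
}
```

For the arrays in the question, a call could look like `serialize_float_array(stream, c, 4, (size_t[]){7, 2, 4, 5});`.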

Eric Postpischil
  • I do believe you can avoid this copying by just accepting a `const void *arg` argument and then implicitly cast this as `const float *ptr` and use `ptr[i]` like OP did. That is: I don't think the actual copying is necessary, just the pointing at the correct offset. So basically I think what OP did is fine, it's just that the warning could be avoided by the cast. I may be totally wrong though, in which case I would happily be corrected. – Cheatah Oct 31 '21 at 09:23
  • @Cheatah: For an object declared as, say, `float A[10][20]`, the C standard does not define the behavior of accessing its elements via a single `float *p`. There are two reasons for this. One, pointer arithmetic is defined only within array bounds (including a notional end position). So, if `p` points to `A[0][0]`, `p+i` is defined only for 0 ≤ i ≤ 20. The fact that `A[0]` is embedded inside a larger array does not affect this, because the clause defining pointer arithmetic makes no provision for it. – Eric Postpischil Oct 31 '21 at 10:52
  • Two, the standard does not clearly say an object of type `float [10][20]` may be aliased as an object of type `float [200]`. The rules on this are unclear, and the standard ought to be updated with clearer specifications of the aliasing rules. Nonetheless, the rules are what they are and do not tell us we can do this. In contrast, any object may be accessed as an array of characters, so the byte-by-byte copy is the way to do it. A good compiler will eliminate the `memcpy` in optimization and simply directly load the relevant data into a register to pass to `fprintf`. – Eric Postpischil Oct 31 '21 at 10:55
  • It's really unfortunate that 1) this copy is necessary, 2) every C programmer would write this without this copy, and 3) the C23 spec is now in feature freeze, with tons of new additions that come with new syntax constructions but that don't fix very obvious problems like this one (unless I read the docs too fast). I'm leaving the question open for a couple of days just in case somebody finds a standard-compliant way to avoid the copy, but I'm afraid it's unlikely... – cesss Nov 01 '21 at 10:40
  • @cesss: Re “every C programmer would write this without this copy”: No, not every C programmer would write this without this copy. There are people who pay attention to specifications and seek to engineer well-defined code. When writing code for maximum portability, they would write the copy. In other circumstances, they might omit the copy but document the requirement that the program be compiled with a compiler that supports reshaping arrays. – Eric Postpischil Nov 01 '21 at 10:43