11

If I have a simple tensor class like this

struct Tensor
{
    double XX, XY, XZ;
    double YX, YY, YZ;
    double ZX, ZY, ZZ;
}

Is it undefined behavior to use pointer-arithmetic (see below) to access its elements?

 double& Tensor::operator[](int i) 
{ 
    assert(i < 9); 
    return (&XX)[i]; 
}
Touloudou
  • 2,079
  • 1
  • 17
  • 28

4 Answers4

13

Yes, it is undefined behavior.

The data members are not in an array, and thus are NOT guaranteed to be stored back-to-back in contiguous memory, as pointer arithmetic would require. There may be indeterminate padding generated between them.

The correct way would be to access the members individually, eg:

double& Tensor::operator[](int i)
{
    switch (i)
    {
        case 0: return XX;
        case 1: return XY;
        case 2: return XZ;
        case 3: return YX;
        case 4: return YY;
        case 5: return YZ;
        case 6: return ZX;
        case 7: return ZY;
        case 8: return ZZ;
        default: throw std::out_of_range("invalid index");
    }
}

Alternatively, if you really want to use array syntax:

double& Tensor::operator[](int i)
{
    if ((i < 0) || (i > 8))
        throw std::out_of_range("invalid index");

    double* arr[] = {
        &XX, &XY, &XZ,
        &YX, &YY, &YZ, 
        &ZX, &ZY, &ZZ
    };

    return *(arr[i]);
}

Or

double& Tensor::operator[](int i)
{
    if ((i < 0) || (i > 8))
        throw std::out_of_range("invalid index");

    static double Tensor::* arr[] = {
        &Tensor::XX, &Tensor::XY, &Tensor::XZ,
        &Tensor::YX, &Tensor::YY, &Tensor::YZ, 
        &Tensor::ZX, &Tensor::ZY, &Tensor::ZZ
    };

    return this->*(arr[i]);
}

Otherwise, use an actual array for the data, and define methods to access the elements:

struct Tensor
{
    double data[9];

    double& XX() { return data[0]; }
    double& XY() { return data[1]; }
    double& XZ() { return data[2]; }
    double& YX() { return data[3]; }
    double& YY() { return data[4]; }
    double& YZ() { return data[5]; }
    double& ZX() { return data[6]; }
    double& ZY() { return data[7]; }
    double& ZZ() { return data[8]; }

    double& operator[](int i)
    {
        if ((i < 0) || (i > 8))
            throw std::out_of_range("invalid index");
        return data[i];
    }
};
Remy Lebeau
  • 555,201
  • 31
  • 458
  • 770
  • can there be padding inserted if all these variables are of the same type? There could be padding before XX, but between XX and XY, etc. ? – Touloudou Oct 17 '19 at 09:07
  • 2
    @Touloudou not before XX, but between the other members, yes, if the alignment of `Tenser` as a whole is set larger than the alignment of the individual members – Remy Lebeau Oct 17 '19 at 09:08
  • I see. That makes sense, and definitely makes this code UB. Follow up question: is there a non-UB way to map indices to these variables then? – Touloudou Oct 17 '19 at 09:11
  • 1
    @Touloudou Do the opposite. Create an array member variable and getter member functions named `XX()`, `XY()`,... that return references to particular array elements. (Of course, there is a way how to map indices to member varialbes, but that would involve `if`s or `switch` with a runtime performance penalty.) – Daniel Langr Oct 17 '19 at 09:13
  • @Touloudou Note that the performance penalty might be reduced by optimizations: https://godbolt.org/z/50Lxq5. Though, there is still one additional `jmp` and the generated code is larger. – Daniel Langr Oct 17 '19 at 09:21
  • @DanielLangr That's what I had in mind. However, I don't see why there would be any performance penalty. The compiler should be able to inline such basics functions completely, no? – Touloudou Oct 17 '19 at 09:24
  • @DanielLangr That's not quite what I had in mind. I was considering completely getting rid of XY, XY, XZ, etc and replace them with std::array. Then, I can write the bracket operator efficiently (without UB) and also add a bunch of methods Tensor::XY(), etc. for convenience. – Touloudou Oct 17 '19 at 09:50
  • 1
    @DanielLangr Just saw your newly updated answer. Probably gonna go with that. There shouldn't be any performance penalty and UB is gone. – Touloudou Oct 17 '19 at 09:51
  • @GeckoGeorge My plan was to include range checking in the form of an assert, as seen in the original question. – Touloudou Oct 17 '19 at 10:01
  • Oh, sorry, I'll remove that comment – GeckoGeorge Oct 17 '19 at 10:25
8

There's a cppcon talk that mentions this!

So yes, it's undefined behaviour, because classes and arrays don't share a common initial sequence.

Edit: Miro Knejp introduces that slide at around 3:44 if you want more context for all the non-c++ on the slide, but the question and answer is really the only part of the talk that goes into your question.

GeckoGeorge
  • 446
  • 1
  • 3
  • 11
  • 1
    Oh, god.. he says "I dunno why"... it's of course because method of storing and addressing array and single variable depends on platform's ISA. Likely you won't have problems on x86, but on SPARC? on MIPS64? on something else? – Swift - Friday Pie Oct 17 '19 at 09:00
  • I must have a short memory. I just watched that last week. Yet here I am looking for the answer. Did I mention I have a short memory? – Adrian McCarthy Jun 24 '21 at 19:18
3

It is undefined behavior.

In general, pointer arithmetic is properly defined only for the members of an array (and maybe one element after, as described in section §8.5.6 of the standard).

For classes/structures, this can't work, because the compiler can add padding or other data between the members. cppreference has a brief description of the class layout.

Now, moving to solutions to your problem, the first one would be to simply use something made for this, such as Eigen. It is a mature library for linear algebra, with well tested code and good optimizations.

If you are not interested in adding a new library, you would have to more or less implement manually either member access or operator[].

Paul92
  • 8,827
  • 1
  • 23
  • 37
3

Just another possible solution: wrap references in a class and have an array of wrappers.

struct Tensor
{
    double XX, XY, XZ;
    double YX, YY, YZ;
    double ZX, ZY, ZZ;

    class DoubleRefenceWrapper
    {
        double & ref;
    public:
        DoubleRefenceWrapper(double & r) : ref(r) {}
        double & get() { return ref; }
    } elements[9];

    Tensor() : elements{XX, XY, XZ, YX, YY, YZ, ZX, ZY, ZZ} {}

    double& operator[](unsigned int i)
    {
        if(i < 9)
        {
            return elements[i].get();
        }
        throw std::out_of_range("Tensor index must be in [0-8] range");
    }
};
p-a-o-l-o
  • 9,807
  • 2
  • 22
  • 35