Inquiry about class variable declarations in C++

Question

I have a class to represent a 3D vector of floats:

class Vector3D
{
    public:

    float x, y, z;
    float * const data;

    Vector3D() : x(0.0), y(0.0), z(0.0), data(&x) {}
};

My question is: are x, y, and z going to be allocated sequentially in memory such that I can assign the address of x to data and later use the subscript operator on data to access the vector components as an array?

For example, sometimes I may want to access the vector components directly:

Vector3D vec;
vec.x = 42.0;
vec.y = 42.0;
vec.z = 42.0;

And sometimes I may want to access them by offset:

Vector3D vec;
for (int i = 3; i--; )
    vec.data[i] = 42.0;

Will the second example have the same effect as the first one, or do I run the risk of overwriting memory other than the x, y, and z floats?

An interesting idea. I don't think it would be a good idea to actually do, but a very interesting question. — JHSaunders, Jun 07 '11 at 23:39
Instead of getting the address of x explicitly, have you tried offsetof(Vector3D, x) + this? You'll have to make sure all members of the class are aligned properly. — James, Jun 07 '11 at 23:47
Just curious, why have data at all? You could just implement an operator[] on Vector3D and/or cast the Vector3D * 'this' pointer to a "float *". You achieve the same effect, with less memory usage and syntax. — MerickOWA, Jun 08 '11 at 00:15
Or implement a union of a double[3] array with an x,y,z structure — MerickOWA, Jun 08 '11 at 00:18
@MerickOWA: You make a good point about memory savings (I will have 1000's of these objects). Overloading operator[] is something I will consider. — milesleft, Jun 08 '11 at 00:31
@MerickOWA : Casting a `Vector3D*` to a `float*` invokes UB, as does writing to one union member and reading from another. — ildjarn, Jun 08 '11 at 00:37
@ildjarn it may be undefined, but if we assume data(&x) works, so will the other alternatives I suggested — MerickOWA, Jun 08 '11 at 01:40
@MerickOWA : `data(&x)` works, as long as `data` is treated as a pointer to a singular `float` rather than a pointer to an array of `float`s. Neither alternative you suggested is legal C++. — ildjarn, Jun 08 '11 at 02:01
@ildjarn: Casting a `Vector3D*` to a `float*` is specifically allowed. 9.2p20 "A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa." — Ben Voigt, Jun 08 '11 at 02:10
@BenVoigt : Duly noted. Personally, I'm not quite ready yet to apply C++0x wording to questions not tagged C++0x, but I'll at least acknowledge that it will _probably_ work in practice for any reasonably-recent C++03 compiler too. — ildjarn, Jun 08 '11 at 02:52
@ildjarn: Considering the attention that the standards committee pays to existing implementations when they consider new rules, I'd say that's *almost-surely* the case. — Ben Voigt, Jun 08 '11 at 14:47

Oliver Charlesworth · Answer 1 · 2011-06-08T00:42:33.617

6

No, this is undefined behaviour, for two reasons:

Firstly for the padding issues that everyone else has mentioned.
Secondly, even if things are padded correctly, it is not valid to dereference a pointer with an offset that would take it beyond the bounds of what it's pointing to. The compiler is free to assume this, and make optimisations that would lead to undefined behaviour if you violate it.

However, the following would be valid:

class Vector3D
{
public:
    std::array<float,3> data;
    float &x, &y, &z;

    Vector3D() : data(), x(data[0]), y(data[1]), z(data[2]) { }
    Vector3D& operator =(Vector3D const& rhs) { data = rhs.data; return *this; }
};

std::array is new to C++0x and is basically equivalent to boost::array. If you don't want C++0x or Boost, you could use a std::vector (and change the initializer to data(3)), although that's a much more heavyweight solution: its size could be modified from the outside world, and if it is, the result would be UB.
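One detail the class above leaves implicit is copy construction. With reference members, the compiler-generated copy constructor binds the new object's x, y, and z to the source object's array, not to its own. The following is a minimal sketch of one way to handle that, assuming C++11; the names Vec3Ref and vec3ref_copy_is_independent are hypothetical and only serve the illustration.

```cpp
#include <array>
#include <cassert>

// Hypothetical variant of the class above with an explicit copy constructor.
class Vec3Ref
{
public:
    std::array<float, 3> data;
    float &x, &y, &z;

    Vec3Ref() : data(), x(data[0]), y(data[1]), z(data[2]) {}

    // Bind the references to this object's own array. The implicit copy
    // constructor would bind them to rhs.data instead.
    Vec3Ref(Vec3Ref const& rhs)
        : data(rhs.data), x(data[0]), y(data[1]), z(data[2]) {}

    // Copy only the values; references cannot be reseated.
    Vec3Ref& operator=(Vec3Ref const& rhs) { data = rhs.data; return *this; }
};

// Returns true when a copy owns its storage and both access styles agree.
inline bool vec3ref_copy_is_independent()
{
    Vec3Ref a;
    a.x = 1.0f;
    Vec3Ref b(a);        // copy, then modify the copy by index
    b.data[0] = 2.0f;
    return a.x == 1.0f && b.x == 2.0f;
}
```

Named access (a.x) and indexed access (a.data[0]) reach the same float here, and no pointer arithmetic across separate members is involved.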

edited Jun 08 '11 at 00:42

answered Jun 07 '11 at 23:37

Oliver Charlesworth


1

This would be valid but would force the user to write a custom assignment operator since one won't be defined automatically. And adding `data()` to the constructor's initialization list will zero-initialize its contents. – ildjarn Jun 07 '11 at 23:54
@ildjarn: You are correct. I've now modified my suggestion; `vector` isn't great, but `array` should be better! – Oliver Charlesworth Jun 08 '11 at 00:01
@OliCharlesworth : I wasn't suggesting that you switch away from using a C-array, only that you should add `data` to the initialization list and write an appropriate copy-assignment operator. That said, I agree that `std::array<>`/`boost::array<>` is ideal here. – ildjarn Jun 08 '11 at 00:04
@ildjarn: I know. But the idea of having to write a copy-assignment operator for such a simple class bugs me! I'll put both options in... – Oliver Charlesworth Jun 08 '11 at 00:07
@OliCharlesworth : Also, the code as you currently have it very much runs the risk of invoking UB since `data` is public -- anyone can resize it or otherwise cause it to reallocate, at which point `x`, `y`, and `z` are bound to invalid memory locations. – ildjarn Jun 08 '11 at 00:07
@ildjarn: Indeed. Hopefully all caveats are now addressed in my answer. Thanks for your ideas! – Oliver Charlesworth Jun 08 '11 at 00:10
@OliCharlesworth : `data` needs to be in the initialization list for your first sample as well -- `std::array<>` is a POD type, so it will remain uninitialized by default. Also, both versions still need copy-assignment operators since both versions have data members that are references. Aside from that, I like these both better than the `vector` version. :-] – ildjarn Jun 08 '11 at 00:12
Sorry, I meant to use float * const data. Anyway, I like this solution. Is there any particular reason that you would use a boost::array or a std::vector over a regular array? – milesleft Jun 08 '11 at 00:13
@milesleft : If by "regular array" you mean C-array, then regular arrays have no place whatsoever in modern C++ code. They are a necessary relic of C that you shouldn't use directly if you can avoid it. – ildjarn Jun 08 '11 at 00:15
@OliCharlesworth : C-arrays get copied by implicit copy-assignment operators without issue; it's the references that are the problem. ;-] – ildjarn Jun 08 '11 at 00:16
@ildjarn: Dammit, I'm tired! You should really be writing your own answer instead of me, as most of this is your content by now! – Oliver Charlesworth Jun 08 '11 at 00:17
@OliCharlesworth : Haha. Your first version was _so close_ to perfect, I think you deserve the rep. I'll just edit your answer. :-P – ildjarn Jun 08 '11 at 00:18
@OliCharlesworth @ildjarn: Thanks a lot guys, it's been an enlightening discussion. I really don't think it's a big issue for me to write a copy-assignment operator :) – milesleft Jun 08 '11 at 00:24
The potential issue with this is that the class will be a lot larger than a simple `float[3]`, as you'll be storing 3 pointers in addition to the data. This may or may not matter depending on what you're using the vectors for. – Michael Anderson Jun 08 '11 at 00:58
@Michael: Yes indeed. Although that's only 2 pointers worse than the OP's original code! – Oliver Charlesworth Jun 08 '11 at 01:01
1st, what about named accessors like `x()` etc and compute the array access in there? 2nd, if you haven't already, [please see this question](http://stackoverflow.com/questions/6114067/how-to-emulate-c-array-initialization-int-arr-e1-e2-e3-behaviour) and provided links in the beginning on why C-arrays still have their place in modern C++ (though I'm trying to bring down that last stronghold :P). – Xeo Jun 08 '11 at 01:13
@Xeo : Yes, you seem to have stumbled upon the _one_ valid use-case for C-arrays, haha. – ildjarn Jun 08 '11 at 01:35
@Oli: Not sure what your second point is, but `x`, `y`, and `z` are all in the same object, so pointer relationships are defined. – Ben Voigt Jun 08 '11 at 01:56
@BenVoigt : [citation needed] – ildjarn Jun 08 '11 at 02:02
@ildjarn: 5.9p2: "If two pointers point to non-static data members of the same object, or to subobjects or array elements of such members, recursively, the pointer to the later declared member compares greater provided the two members have the same access control (Clause 11) and provided their class is not a union." – Ben Voigt Jun 08 '11 at 02:07
@BenVoigt : Saying that one comes after the other is not the same as saying that one comes _immediately_ after the other. – ildjarn Jun 08 '11 at 02:49
@Ildjarn: I know. I was going to say pointer *arithmetic* was allowed, but I checked the standard in time to weaken my comment. However, because the types are the same, there is no padding, one subobject will immediately follow the other. – Ben Voigt Jun 08 '11 at 03:02
@BenVoigt : I'll have to think about your `sizeof` rationale regarding padding between objects of the same type tomorrow when I'm more awake. It sounds sane on the surface, but I'm not ready to commit to it just yet. :-P – ildjarn Jun 08 '11 at 03:12
@ildjarn: That's fair. Of course, it's just a note which says padding is only inserted for alignment reasons, which strictly speaking isn't a binding part of the standard. 5.3.3p2 may be helpful: " When applied to a class, the result is the number of bytes in an object of that class including any padding required for placing objects of that type in an array." Basically, I'm arguing that a type has just one alignment requirement, which applies equally to array elements and struct members. – Ben Voigt Jun 08 '11 at 03:15

Ben Voigt · Answer 2 · 2011-06-08T01:45:24.093

2

Yes. This class is ~~layout-compatible~~ standard-layout, because:

You have no virtual functions.
All data members are in a single access specifier block (the public:)

Because of this, it's guaranteed to be laid out sequentially just like a C structure. This is what allows you to read and write file headers as structures.
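The claims above can at least be checked at compile time. This is a sketch, assuming C++11 and a compiler that accepts offsetof in a constant expression (the mainstream ones do). The first assertion follows from the class definition. The other two are not guaranteed by the standard and only record what a particular implementation does; they also do not settle whether indexing past x through data is defined behaviour.

```cpp
#include <cassert>
#include <cstddef>      // offsetof
#include <type_traits>  // std::is_standard_layout

// The class from the question, repeated so the checks are self-contained.
class Vector3D
{
public:
    float x, y, z;
    float * const data;

    Vector3D() : x(0.0f), y(0.0f), z(0.0f), data(&x) {}
};

// Holds for the class as written: no virtual functions, one access level.
static_assert(std::is_standard_layout<Vector3D>::value,
              "Vector3D is expected to be standard-layout");

// Implementation-specific: fails to compile if padding separates the floats.
static_assert(offsetof(Vector3D, y) == offsetof(Vector3D, x) + sizeof(float),
              "padding between x and y");
static_assert(offsetof(Vector3D, z) == offsetof(Vector3D, y) + sizeof(float),
              "padding between y and z");
```

If either layout assertion fires on some target, the build stops instead of silently overwriting unrelated memory.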

edited Jun 08 '11 at 01:45

answered Jun 07 '11 at 23:49

Ben Voigt


3

Is padding not a potential issue? I.e., I don't see what layout-compatibility has to do with whether `x`, `y`, and `z` are stored contiguously in memory. – ildjarn Jun 07 '11 at 23:56
Everything you've said sounds correct. But the OP's mechanism is still UB. – Oliver Charlesworth Jun 08 '11 at 00:00
Also, having looked at the standard, I'm not so sure that "layout-compatibility" is relevant here. – Oliver Charlesworth Jun 08 '11 at 00:27
Compilers are allowed to insert padding between two data members, but they're not allowed to re-order. A non-portable solution would be using a packing directive, otherwise, I'd go with [@Oli's solution](http://stackoverflow.com/questions/6272768/question-about-class-variable-declarations-in-c/6272781#6272781). – jweyrich Jun 08 '11 at 00:54
The question is, "is `struct { float x[3]; float * p;}` layout compatible with `struct { float x,y,z; float * p;}`. And I think nothing in that standard says that they are ( even though often the compiler will lay them out in the same way ). – Michael Anderson Jun 08 '11 at 00:54
@Michael : Every x64 compiler I've used will _not_ lay those out in the same way by default. That said, agreed that nothing in the standard says those are layout-compatible. – ildjarn Jun 08 '11 at 00:57
Oddly enough g++ 4.0.1 on OS X seems to lay them out the same way when using the `-arch x86_64` and with `-arch i386` (though of course the 64b pair is different to the 32b pair). – Michael Anderson Jun 08 '11 at 01:26
@jweyrich: I've never heard of padding being inserted between two data members of the same type, since alignment requirements are automatically satisfied. In fact, `sizeof (T)` is defined as the offset between elements of `T[]`. – Ben Voigt Jun 08 '11 at 01:44
@ildjarn, @Oli: My apologies for use of the wrong terminology. Now fixed. – Ben Voigt Jun 08 '11 at 01:46
@BenVoigt : Isn't 'standard-layout' a concept new to C++0x? I.e., isn't your answer C++0x-specific? In any case, I'm not clear on how that's at all relevant to the question. Both the C++03 standard and the C++0x FDIS specifically say "*Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other*" -- does that not allow a legal implementation to add padding between `x`, `y`, and `z` in the OP's code? – ildjarn Jun 08 '11 at 01:59
@ildjarn: I'd like to see an example of ANY compiler that doesn't lay those out the same way. By definition, the alignment requirement of an array is the same as the alignment of its element type. – Ben Voigt Jun 08 '11 at 02:01
@ildjarn: You can only have padding between subobjects of different types. The way sizeof is defined, in terms of array element spacing, guarantees that sizeof is a multiple of the alignment requirement. Therefore "implementation alignment requirements" couldn't require padding. And *standard-layout*, while it may be standardized in C++0x, is merely recognition of a de-facto requirement. – Ben Voigt Jun 08 '11 at 02:03
If your point is that your answer relates to expected behavior in practice, that's fine, but what I'm asking about is what is allowed by the standard, and I don't see any compelling evidence here that says the OP's code is legal in C++03. – ildjarn Jun 08 '11 at 02:09
@ildjarn: Care to quote the definition of POD from C++03? I don't have that version handy. I think having a non-trivial zero-argument constructor disqualifies this type, but without the definition I can't be sure. – Ben Voigt Jun 08 '11 at 02:14
Actually, this [has already been discussed](http://stackoverflow.com/questions/2226291/is-it-possible-to-create-and-initialize-an-array-of-values-using-template-metapro/2228298#2228298). – Ben Voigt Jun 08 '11 at 02:27
@BenVoigt : §3.9/10: "*Arithmetic types, enumeration types, pointer types, and pointer to member types, and cv-qualified versions of these types are collectively called scalar types. Scalar types, POD-struct types, POD-union types, arrays of such types and cv-qualified versions of these types are collectively called POD types.*" (cont'd) – ildjarn Jun 08 '11 at 03:02
@BenVoigt: (cont'd) §9/4: "*A POD-struct is an aggregate class that has no non-static data members of type non-POD-struct, non-POD-union (or array of such types) or reference, and has no user-defined copy assignment operator and no user-defined destructor. Similarly, a POD-union is an aggregate union that has no non-static data members of type non-POD-struct, non-POD-union (or array of such types) or reference, and has no user-defined copy assignment operator and no user-defined destructor. A POD class is a class that is either a POD-struct or a POD-union.*" – ildjarn Jun 08 '11 at 03:03
@ildjarn: Then the struct in the question is C++03 POD? I thought there were some restrictions on constructors as well... I guess the trivial default constructor requirement is new with C++0x? Oh wait, I think the trivial constructors requirement is part of *aggregate class*. – Ben Voigt Jun 08 '11 at 03:05
@BenVoigt : Sorry, missed one. §8.5.1/1: "*An aggregate is an array or a class with no user-declared constructors, no private or protected non-static data members, no base classes, and no virtual functions.*" So you're correct in thinking that any type with a user-defined constructor cannot be a POD type in C++03. – ildjarn Jun 08 '11 at 03:09
@ildjarn: Thanks. Yeah, in C++03 any user-defined constructor exempted a type from the POD layout requirements, even though there is no reason for a constructor to affect layout. I believe this is why POD was split into two classifications in C++0x: *trivial* and *standard-layout*. – Ben Voigt Jun 08 '11 at 03:13
@Ben: Even if we can assume that the compiler will lay them out equivalently, I believe that you will still get UB, due to aliasing-like considerations. – Oliver Charlesworth Jun 08 '11 at 10:44
@Oli: The array elements and struct members have the same type, so strict aliasing isn't an issue... for standard-layout types, accessing initial members with compatible layout is valid; the parent types don't have to be the same, see 9.2p19. Unfortunately the standard didn't come right out and say that an array is a standard-layout type. – Ben Voigt Jun 08 '11 at 13:17
@Ben: I mean "aliasing" in the more general sense. The problem is that the standard does not allow you to dereference beyond the end of an array (and a scalar is considered a length-one array in this respect); so the compiler is free to optimise based on that assumption. So code such as `blah.y = 1; blah.data[1] = 2; std::cout << blah.y` might output "1" depending on how the compiler arranges accesses. – Oliver Charlesworth Jun 08 '11 at 13:51
@Ben: I believe "layout-compatibility" applies to structs, not to fields within a struct. – Oliver Charlesworth Jun 08 '11 at 13:59
@Oli: Do the aliasing rules depend on "validly derived pointers"? Because `blah.data[1]` is `*(blah.data + 1)` and `blah.data + 1` has type `float*` and contains the address of `blah.y`, using such a pointer to access the variable causes no issues with aliasing. – Ben Voigt Jun 08 '11 at 14:44
@Ben: This is, by definition an alias (http://en.wikipedia.org/wiki/Aliasing_(computing)). The question is, is it one that the compiler has to consider when it's generating optimised code? My understanding is that it's not, because you're not supposed to dereference "out-of-bounds". – Oliver Charlesworth Jun 08 '11 at 15:02
@Oli: Yes it's an alias, but is it illegal aliasing? I don't believe so. `blah.data + 1` is a pointer to `blah.y` (assuming the standard note holds that padding is only for alignment purposes). Does aliasing require the pointer to be "validly derived", or does it only look at the type of pointers (and in C99, `restrict`)? – Ben Voigt Jun 08 '11 at 18:48
@Ben: As a starting point, the standard explicitly says that evaluating anything other than `blah.data+0` and `blah.data+1` is undefined, so that rules out, for instance accessing `blah.z` with this mechanism. As for whether it's valid to dereference `blah.data+1`, well the C99 standard explicitly prohibits it. I'm still looking for the equivalent wording in the C++ standard(s). See also: http://gcc.gnu.org/onlinedocs/libstdc++/manual/bk01pt08ch19s02.html. – Oliver Charlesworth Jun 08 '11 at 19:16
@Oli: section 5.7. But that only affects whether the resulting pointer is *safely-derived*. "It is implementation-defined whether an implementation has relaxed or strict pointer safety." – Ben Voigt Jun 08 '11 at 20:12
But it's 3.10p11 that defined when the compiler can assume no aliasing, and those are completely type-based. – Ben Voigt Jun 08 '11 at 20:16
@Ben: 5.7 explicitly says that, for instance, `blah.data+2` is undefined. There is no wording in 3.10/11 that overrides/overrules that; it simply says that anything not included in that list is undefined. The stuff on safely-derived pointers is a new idea in C++0x, and isn't really relevant; it was introduced to help with identifying reachability of dynamic memory for implementations that included garbage collection. – Oliver Charlesworth Jun 08 '11 at 21:26
@Oli: I guess you might have to first cast to a layout-compatible structure as suggested by Michael in one of the early comments, in order to make the pointer arithmetic defined. But the mere possibility of doing so effectively requires implementations to make `blah.data + 2` work. – Ben Voigt Jun 08 '11 at 22:43

Mr Fooz · Answer 3 · 2011-06-08T01:16:37.653

1

The compiler has some flexibility in how it lays out the memory within a struct. The struct will never overlap another data structure, but it can inject unused space between elements. In the struct you give, some compilers might choose to add 4 bytes of extra space between z and data so that the data pointer can be aligned. Most compilers provide a way of packing everything tightly.

EDIT: There's no guarantee that the compiler will choose to pack x, y, and z tightly, but in practice they will be packed well because they are the first elements of the struct and because they're a power of two in size.
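To see where padding lands on a given compiler, the gap between z and data can be measured directly. A small sketch, assuming C++11; Vec3WithPointer and gap_before_data are hypothetical names, and the struct is a constructor-free stand-in for the question's class.

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t

// Hypothetical stand-in for the question's class, without the constructor.
struct Vec3WithPointer
{
    float x, y, z;
    float* data;
};

// Bytes left unused between the end of z and the start of data.
inline std::size_t gap_before_data()
{
    return offsetof(Vec3WithPointer, data)
         - (offsetof(Vec3WithPointer, z) + sizeof(float));
}
```

On typical targets with 4-byte floats and 8-byte pointers this gap is usually 4 bytes, and usually 0 where pointers are 4 bytes, but the value is up to the implementation.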

edited Jun 08 '11 at 01:16

answered Jun 08 '11 at 00:12

Mr Fooz


1

The problem is not the gap between `z` and `data`, it's the gaps between `x`, `y` and `z`. There are other issues as well. – Oliver Charlesworth Jun 08 '11 at 00:36

score 1 · Answer 4 · answered Jun 08 '11 at 00:19

1

or you can have an operator[] overload

float operator[](int idx)
{
    switch (idx)
    {
    case 0:
        return x;
    case 1:
        return y;
    case 2:
        return z;
    }
    assert(false);
}
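For a version that can also be assigned through, the operator can return a reference and come with a const overload. This is a sketch only: Vec3Switch is a hypothetical name, and throwing std::out_of_range for a bad index is one choice among several (the snippet above uses assert instead).

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical class: named members plus a switch-based operator[].
class Vec3Switch
{
public:
    float x, y, z;

    Vec3Switch() : x(0.0f), y(0.0f), z(0.0f) {}

    // Returns a reference so that v[i] = value writes to the member.
    float& operator[](int idx)
    {
        switch (idx)
        {
        case 0: return x;
        case 1: return y;
        case 2: return z;
        }
        throw std::out_of_range("Vec3Switch index must be 0, 1, or 2");
    }

    // Read-only access for const objects.
    const float& operator[](int idx) const
    {
        switch (idx)
        {
        case 0: return x;
        case 1: return y;
        case 2: return z;
        }
        throw std::out_of_range("Vec3Switch index must be 0, 1, or 2");
    }
};
```

With this, the loop from the question becomes for (int i = 3; i--; ) vec[i] = 42.0f; and it never relies on how x, y, and z are laid out.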

answered Jun 08 '11 at 00:19

pm100


2

That should be `float &`. And you'd need to provide a `const` version as well... – Oliver Charlesworth Jun 08 '11 at 00:20
Thanks, this is a reasonable option. – milesleft Jun 08 '11 at 00:26

Michael Anderson · Answer 5 · 2011-06-08T01:52:55.527

1

Your solution is not valid, but if you can ensure (or know) that your compiler will "do the right thing" (in particular by controlling padding between the x, y and z elements) you will be ok. In this case though I'd remove the data member altogether and use operator[].

I've seen something like this used on occasion. It runs into exactly the same issues, but does save you storing that data pointer, and allows for a nicer v[0] syntax rather than v.data[0].

class Vector3D
{
    public:

    float x, y, z;
    float& operator[](int i) { return *(&x+i); }
    const float& operator[](int i) const { return *(&x+i); }

    Vector3D() : x(0.0), y(0.0), z(0.0) {}
};

EDIT: Prompted by ildjarn, here's a similar, compliant version that uses accessors rather than members.

class Vector3D
{
    public:
      float& operator[](int i) { return v[i]; }
      const float& operator[](int i) const { return v[i]; }

      float& x() { return v[0]; }
      float  x() const { return v[0]; }
      float& y() { return v[1]; }
      float  y() const { return v[1]; }
      float& z() { return v[2]; }
      float  z() const { return v[2]; }

      Vector3D() : v() {}
    private:    
      float v[3];
};

edited Jun 08 '11 at 01:52

answered Jun 08 '11 at 00:27

Michael Anderson


1

This suffers from the exact same UB as the OP's code -- there is no guarantee that `x`, `y`, and `z` are stored contiguously in memory. – ildjarn Jun 08 '11 at 00:34
Indeed, hence the "runs into exactly the same issues." – Michael Anderson Jun 08 '11 at 00:51
You posted an answer to demonstrate a nicer syntax for invoking UB? I.e., why post this instead of a valid, conformant implementation of `Vector3D` that demonstrates how to write an `operator[]`? – ildjarn Jun 08 '11 at 00:56
I'm posting something that I've seen used in production code, where performance, size, readability and conformance issues have been considered (and documented!). If you know what your compiler is doing with padding, or can control it using pragmas etc, then this solution may be appropriate. For this code accessing a Vector3D through `v[0]` and `v.x` produce the same machine code, and each Vector3D is not carrying around extra "cruft" that enlarges its size. Both of these are critical if you're using Vector3D inside core loops. – Michael Anderson Jun 08 '11 at 01:05
But you're right, that's not an answer to the explicit question. I'll update slightly. – Michael Anderson Jun 08 '11 at 01:06
That sort of code is the reason so many companies have such a nightmare trying to release 64-bit versions of their applications. ;-] It would be trivial to change this code such that it's conformant, performance and size remain identical, and readability isn't too much worse -- so why not post that? – ildjarn Jun 08 '11 at 01:09
I've put a similar version up. I have a feeling that there was a good (performance based) reason to go down the other route - and I think it may have been _old_ compilers (for embedded systems I think) not doing a good job on the default constructor - which killed some core loops - but that's probably no longer valid, especially with compilers less than about 5 years old. – Michael Anderson Jun 08 '11 at 01:38
Also note that `Vector3D() : v() { }` is shorthand for `Vector3D() { v[0]=0; v[1]=0; v[2]=0; }`, and possibly more efficient. – ildjarn Jun 08 '11 at 01:44
Noted and updated - in fact the default constructor should do the right thing - but usually a Vector3 will have non-default constructors so it's clearer to be explicit. – Michael Anderson Jun 08 '11 at 01:52
@Michael : No, the implicit default constructor definitely wouldn't do the right thing -- `float` and `float[3]` are POD types, so no automatic initialization is performed (whereas `v()` forces value-initialization). – ildjarn Jun 08 '11 at 02:06
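The value-initialization point in these comments can be observed without reading indeterminate values by constructing the object over a buffer that has been filled with non-zero bytes. A sketch, assuming C++11; ZeroedVec3 and value_init_zeroes_storage are hypothetical names.

```cpp
#include <cassert>
#include <cstring>  // std::memset
#include <new>      // placement new

// Hypothetical type whose constructor value-initializes the array via v().
struct ZeroedVec3
{
    float v[3];
    ZeroedVec3() : v() {}
};

// Constructs a ZeroedVec3 on top of non-zero bytes and reports whether
// the constructor left all three elements equal to zero.
inline bool value_init_zeroes_storage()
{
    alignas(ZeroedVec3) unsigned char raw[sizeof(ZeroedVec3)];
    std::memset(raw, 0xFF, sizeof raw);

    ZeroedVec3* p = new (raw) ZeroedVec3;
    const bool ok = p->v[0] == 0.0f && p->v[1] == 0.0f && p->v[2] == 0.0f;
    p->~ZeroedVec3();
    return ok;
}
```

Without v() in the initializer list the elements would be left uninitialized, and reading them would not be a valid test.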

score -1 · Answer 6 · answered Jun 07 '11 at 23:44

-1

Do something like this:

float data[3];
float &x, &y, &z;

Vector3D() : x(data[0]), y(data[1]), z(data[2]) { data[0] = data[1] = data[2] = 0; }

answered Jun 07 '11 at 23:44

Rajivji

