3

Let's suppose I'm writing a Vector template class to represent points and vectors in an N-dimensional space. Something like the following:

template <typename T, int N>
struct Vector
{
    T data[N];
    // ...
};

Let's further assume that, for whatever reason, I want the user to be able to access data with meaningful names in the case of smaller vectors, e.g. by using v.xor v.y instead of v.data[0] and v.data[1].

I have two additional constraints.

  1. Access to the x or y component of a vector should not be written as a function call (e.g. it should be v.x, not v.x()).
  2. The following equality must hold sizeof(Vector<T, N>) == N * sizeof(T).

I looked at different possible approaches, including member variable references, tag dispatch and even CRTP but none of them satisfied all of my requirements.

Is it even possible to create such aliases? And if yes, how could one do it?

J-M. Gorius
  • 606
  • 4
  • 15
  • 1
    What's wrong with `v.x()`? – François Andrieux Jan 03 '19 at 18:49
  • 1
    Objects must be uniquely addressable, members never share an address. That means non-static data members, even stateless ones, have an effective non-zero impact on the class' size. So insisting on adding data member `(Vector::x)` and insisting on not increasing the size of your class seems contradictory to me. The exception to this are `union`s but even in that case only one member can be active at a given time. It's not appropriate for an alias. – François Andrieux Jan 03 '19 at 18:51
  • 1
    C++ doesn't have variable aliases. It does have references but those can/will come with a space cost. As a probable zero cost abstraction would could write `int& x = my_vector.data[0];` to make your own alias in the use site and that will probably be optimized away. – NathanOliver Jan 03 '19 at 18:52
  • @FrançoisAndrieux There is nothing wrong with `v.x()`. This question is merely a pretext to try to solve a problem that I have been fiddling around for quite some time. – J-M. Gorius Jan 03 '19 at 18:54
  • @FrançoisAndrieux Like NathanOliver pointed out, the remark about not increasing the size of the class is related to one of the approaches I mentioned, namely member references. If I add references to each cell of `data`, effectively giving them names, I will increase the overall size of `Vector`. – J-M. Gorius Jan 03 '19 at 18:56
  • @J-M.Gorius I understand the motivation for the requirement, but I'm pointing out that it seems contradictory to your other requirement. – François Andrieux Jan 03 '19 at 18:57
  • @NathanOliver This might be a solution, but it puts additional burden on the class' user. – J-M. Gorius Jan 03 '19 at 18:58
  • @J-M.Gorius The usual solution is to add member functions like `x()`. If you explained your objection to using that solution, it might be easier to find something that works for you. I don't believe there exists a solution that fits all of the requirements you've put forward. – François Andrieux Jan 03 '19 at 19:03
  • @FrançoisAndrieux The main reason behind the `v.x` requirement is API compatibility. The `Vector` type is intended to be a replacement for a set of classes (e.g. `Vector2`, `Vector3`, etc.) in order to avoid code duplication. But there is a significant amount of code already using `Vector2` and using `v.x` all over the place. – J-M. Gorius Jan 03 '19 at 19:07
  • @J-M.Gorius I don't see how you could achieve that without cost. Might be worth considering to instead stick with what you have, and add a member or free function that can get/set by index if you need indexing. This goes to show why encapsulation is worth while. Had the vector types used functions instead, even if they appeared useless at the time, it would make changing the back-end very easy. You never know what your requirements might be in the future. – François Andrieux Jan 03 '19 at 19:12
  • @J-M.Gorius You can specialize the class for different sizes. So `Vector` would just have an `x` member, `Vector` would have an `x` and `y` member instead of an array and so on until you want the generic case to just be an array. Doesn't make life much easier but they would at least all come from the same class template. – NathanOliver Jan 03 '19 at 19:14
  • @FrançoisAndrieux I will certainly go for something like this, but I'd like to see if someone finds out another solution. – J-M. Gorius Jan 03 '19 at 19:23
  • If you are not constrained to C++, can you use the D programming language? It has template support that can do what you are looking for. – Eljay Jan 03 '19 at 19:34
  • @Eljay It has to be done in C++. – J-M. Gorius Jan 03 '19 at 19:36
  • An interesting problem. The solution may end up being a fairly large header file in C++, with N being constrained by however deep you make the specializations, and lots of repeated boilerplate. Shouldn't impact efficiency. – Eljay Jan 03 '19 at 19:40
  • Why do you want the second constraint? This constraint is hard to guarantee among different compilers because of different padding and alignment policies. – xskxzr Jan 06 '19 at 04:48
  • @J-M.Gorius those kind of shenanigans can sometimes be solved using [X Macros](https://en.wikipedia.org/wiki/X_Macro). Don't have time to give a full answer, but in short, they let you have indexed access to named members, and from there it's not a long way to the solution (I think - haven't thought this through). – Eran Jan 08 '19 at 20:13

3 Answers3

2

(This is not an answer, it is a comment with a code example, which doesn't fit as a comment, and doesn't format well if it could be stuffed into a comment.)

Can you go the other direction, and express the vector as a bunch of fields, then map an index getter/setter to each of those fields?

Taking out the N template parameter to simplify the problem:

#include <iostream>
#include <stdexcept>

template <typename T>
struct Vector3
{
    T x;
    T y;
    T z;
    T operator[](int i) const
    {
        switch(i)
        {
            case 0:
                return x;
            case 1:
                return y;
            case 2:
                return z;
            default:
                throw std::out_of_range("out of range");
        }
    }
    T& operator[](int i)
    {
        switch(i)
        {
            case 0:
                return x;
            case 1:
                return y;
            case 2:
                return z;
            default:
                throw std::out_of_range("out of range");
        }
    }
};

int main()
{
    Vector3<float> v;
    v.x = 1.0f;
    v[1] = 2.0f;
    v.z = 3.0f;
    std::cout << v[0] << " " << v.y << " " << v[2] << '\n';
}
Eljay
  • 4,648
  • 3
  • 16
  • 27
  • The `N` template parameter is important, as it is the one responsible for my problem in the first place. I already have a fine-working `Vector3` class, but I want to factor the code and introduce arbitrary size vectors. – J-M. Gorius Jan 03 '19 at 19:31
  • Hmm, but each one of the named fields would require a partial specialization for each supported arbitrary size. Which sort of defeats the purpose, unless the purpose it to be able to say `Vector` rather than `Vector3`. – Eljay Jan 03 '19 at 19:33
  • 1
    The big challenge would be disabling the members based on `N`. Perhaps you could conditionally inherit from base types that provide `x`, `y` and `z`? – François Andrieux Jan 03 '19 at 19:34
  • @Eljay The goal is to have something like `using Vector3 = Vector` with `Vector` being a specialized version of `Vector` which would incorporate `x`, `y`, and `z` accessors. – J-M. Gorius Jan 03 '19 at 19:35
  • @FrançoisAndrieux This is akin to tag dispatch, but for member variables. Unfortunately, I did not find any way to apply those techniques to variables, only to member functions. – J-M. Gorius Jan 03 '19 at 19:37
  • @J-M.Gorius I don't see the relation between inheriting for your data members and tag dispatch. I mean to introduce a base type which you specialize and which strictly provides the data members. You can then inherit from it and implement your member functions in the common `Vector` class like this answer suggests. Though I don't see how you want to handle dimensions above 3 or 4, so it's difficult to provide an example. – François Andrieux Jan 03 '19 at 19:43
  • @FrançoisAndrieux I wrongly understood what you meant in the first place. Actually, the only dimensions for which there has to be a named member instead of just an indexing operator are 2, 3 and 4. – J-M. Gorius Jan 03 '19 at 19:45
  • Trying to come up with an example, it seems it boils down to simply implementing `Vector2`, `Vector3` and `Vector4` and providing a templated alias for them, which brings you back to square one. The only saving grave might be that it lets you get away with writing bare-bones implementations and then implementing common functionalities as free functions or a common type. – François Andrieux Jan 03 '19 at 19:53
  • @FrançoisAndrieux I just found [this GitHub repository](https://github.com/valentingalea/vml), which seems to partially answer my question on firts sight. I will have a look at it more precisely as soon as possible. – J-M. Gorius Jan 03 '19 at 19:59
  • @J-M.Gorius That solution actually just implements each of vec2, vec3, and vec4 separately and also relies on compiler extensions that support accessing inactive union members. Heres [vec3](https://github.com/valentingalea/vml/blob/master/vml/detail/vector_base.h#L151)'s base type, it's the same as having `Vector3` but it also relies on undefined behavior. If this solution is acceptable for you, then you might as well keep what you have now, and add a `Vector` that aliases each concrete vector type. – François Andrieux Jan 03 '19 at 20:02
  • @J-M.Gorius as Francois correctly points out, they are using exactly the solution which I proposed. Just keep in mind it is UB. – Georgi Gerganov Jan 03 '19 at 20:06
  • @FrançoisAndrieux actually, it turns out that this particular case is not UB. See here for more info - [link](https://stackoverflow.com/questions/34677343/accessing-same-type-inactive-member-in-unions). I updated my answer to reflect this. – Georgi Gerganov Jan 03 '19 at 20:48
0

If macros are allowed then it seems doable.

First attempt (nice but not perfect...)

int main() {
    Vector<int, 4> vec;
    vec[0] = 1; // same as: vec.t1 = 1;
    vec[1] = 2; // same as: vec.t2 = 2;
    vec[2] = 3; // same as: vec.t3 = 3;
    vec[3] = 4; // same as: vec.t4 = 4;
    std::cout << vec.t1 + vec.t2 + vec.t3 + vec.t4; // 10
}

To achieve the above:

#define VAR_NAME(num) t##num

#define DefineVector(num) \
    template<typename T> \
    struct Vector<T, num> : Vector<T, num-1> { \
        T VAR_NAME(num); \
        T& operator[](int index) { \
            if(index == num-1) return VAR_NAME(num); \
            return Vector<T, num-1>::operator[](index); \
        } \
    }

template<typename T, size_t N>
struct Vector;

template<typename T>
struct Vector<T, 1> {
    T t1;
    T& operator[](int index) {
        // in case index != 0 this is UB
        return t1;
    }
};

DefineVector(2);
DefineVector(3);
DefineVector(4);

// TODO:
// replace 3 declarations above with a single *DefineVectorsRecursively(4);*
// by using recursive macros
// see: https://stackoverflow.com/questions/12447557/can-we-have-recursive-macros
// leaving this as a further exercise...

http://coliru.stacked-crooked.com/a/42625e9c198e1e58

EDIT: Added operator[] to address the concern raised in comment.


Second attempt: with nicer field names

The OP requested the fields to have nicer names, like x, y, z.

This is a challenge. But macros come again to the rescue:

int main() {
    Vector<int, 3> vec;
    vec[0] = 1;
    vec[1] = 2;
    vec[2] = 3;
    std::cout << vec.x + vec.y + vec.z; // 6
}

With the following code:

#include <boost/preprocessor/variadic/size.hpp>

template<typename T, size_t DIMENSIONS>
struct Vector;

#define DefineVector(VAR, ...) \
    template<typename T> \
    struct Vector<T, BOOST_PP_VARIADIC_SIZE(__VA_ARGS__) + 1> \
      : Vector<T, BOOST_PP_VARIADIC_SIZE(__VA_ARGS__)> { \
        T VAR; \
        T& operator[](int index) { \
            if(index == BOOST_PP_VARIADIC_SIZE(__VA_ARGS__)) return VAR; \
            return Vector<T, BOOST_PP_VARIADIC_SIZE(__VA_ARGS__)>::operator[](index); \
        } \
    }

#define DefineVector1(VAR) \
    template<typename T> \
    struct Vector<T, 1> { \
        T VAR; \
        T& operator[](int index) { \
            /* in case index != 0 this is UB */ \
            return VAR; \
        } \
    }

DefineVector1(x);
DefineVector(y, x);
DefineVector(z, y, x);
// TODO: create recursive macro for DefineVector(z, y, x)
// that will create the two above recursively

Code: http://coliru.stacked-crooked.com/a/2550eede71dc9b5e


But wait, operator[] is not so efficient

I had a thought of having a more efficient operator[] in case T is a standard layout type, with the following implementation:

#define DefineVector(VAR, ...) \
    template<typename T> \
    struct Vector<T, BOOST_PP_VARIADIC_SIZE(__VA_ARGS__) + 1> \
      : Vector<T, BOOST_PP_VARIADIC_SIZE(__VA_ARGS__)> { \
        T VAR; \
    T& operator[](int index) { \
        if constexpr(std::is_standard_layout_v<T>) { \
            return *(&VAR - (BOOST_PP_VARIADIC_SIZE(__VA_ARGS__) - index)); \
        } else { \
            if(index == BOOST_PP_VARIADIC_SIZE(__VA_ARGS__)) return VAR; \
            return Vector<T, BOOST_PP_VARIADIC_SIZE(__VA_ARGS__)>::operator[](index); \
        } \
    } \
}

The (BAD) attempt: http://coliru.stacked-crooked.com/a/d367e770f107995f

Unfortunately - above optimization is illegal

For the presented implementation, any Vector with DIMENSIONS > 1 is not a standard layout class (even if T is) because it has members both in base class and in the derived class:

10.1 [class.prop]

[3] A class S is a standard-layout class if it: ...

[3.6] has all non-static data members and bit-fields in the class and its base classes first declared in the same class ...

So the optimization attempt above has undefined behavior - compiler is not obliged to keep the address of the members in the inheritance hierarchy in their order.

The initial solution is still valid.

Amir Kirsh
  • 12,564
  • 41
  • 74
  • 1
    This way the `T data` array is missing. OP wants to access it through aliases: "I want the user to be able to access data with meaningful names in the case of smaller vectors, e.g. by using v.x or v.y instead of v.data[0] and v.data[1]". Otherwise, they can just declare `... Vector { T x, y, z; };` – Georgi Gerganov Jan 06 '19 at 07:25
  • added operator [] – Amir Kirsh Jan 16 '19 at 13:35
-1

Here is possible solution (although I think this is bad practice, and not really sure if portable):

template <typename T, int N>
union Vector
{
    struct { T x, y, z; };
    T data[N];
};

Here is example what happens:

int main() {
    Vector<int, 10> vec;
    vec.x = 100;
    vec.y = 200;
    vec.z = 300;
    vec.data[3] = vec.data[2] + 100;

    printf("%d %d %d %d\n", vec.data[0], vec.data[1], vec.data[2], vec.data[3]);
    printf("size = %d\n", (int) sizeof(vec));

    return 0;
}

Output:
    100 200 300 400
    size = 40

Update: and to make this well defined you can do:

template <typename T, int N> union Vector;

template <typename T> union Vector<T, 1> {
    struct { T x; };
    T data[1];
};

template <typename T> union Vector<T, 2> {
    struct { T x, y; };
    T data[2];
};

template <typename T> union Vector<T, 3> {
    struct { T x, y, z; };
    T data[3];
};

template <typename T> union Vector<T, 4> {
    struct { T x, y, z, w; };
    T data[4];
};

Just make sure the struct is standard-layout (i.e. this works for T = int, float, double, etc.).


Update 2: note that the above might still be UB, because T x, y, z and T data[3] seem to not actually be layout-compatible (see here). Still, this pattern seems to be used in various libraries, for implementing simple vector types - example1 (GLM), example2 video, example3

Georgi Gerganov
  • 1,006
  • 9
  • 17
  • It's undefined behavior to read from a union member if it wasn't the last one to be assigned to. This is not a solution. – François Andrieux Jan 03 '19 at 18:59
  • This is UB for all non trivial types. – NathanOliver Jan 03 '19 at 18:59
  • Moreover, this solution would make `z` accessible even for `Vector` (?). – J-M. Gorius Jan 03 '19 at 19:01
  • @NathanOliver As far as I understand it, it's UB for all types. Can you link to a reference that explains why it's ok for trivial types? – François Andrieux Jan 03 '19 at 19:02
  • @FrançoisAndrieux If I'm reading [this](https://timsong-cpp.github.io/cppwp/class.union#5) correctly the assignment will end the lifetime of anonymous struct and start the lifetime for the array (as long as it is trivial) so the only UB is actually the `vec.data[2] + 100` since `vec.data[2]` has an indeterminate value. So it is UB in all cases, just not for the member switching reason. – NathanOliver Jan 03 '19 at 19:11
  • @J-M.Gorius you can enable the template only for N > 2 using std::enable_if and probably specialise it for N <= 2 – Georgi Gerganov Jan 03 '19 at 19:38
  • @J-M.Gorius here is another relatively popular library (GLM) using my proposed possible solution [link](https://github.com/g-truc/glm/blob/master/glm/detail/type_vec3.hpp#L47) – Georgi Gerganov Jan 03 '19 at 20:25
  • @NathanOliver I updated my answer to address your comment. – Georgi Gerganov Jan 03 '19 at 20:52
  • I'm not sure if `T x; T y;` is layout-compatible with `T data[2];` but I'll take a look. – François Andrieux Jan 03 '19 at 21:09
  • I can't find anything that says two `T`s is layout-compatible with a `T[2]`. Though I wouldn't be surprised if it was the case, until someone can show that it's true, I'll have to assume that this is still UB. – François Andrieux Jan 03 '19 at 21:15
  • 1
    @GeorgiGerganov This seems to be the common pattern used by all small math libraries providing vector types. – J-M. Gorius Jan 03 '19 at 21:16
  • Note that in the linked question, the two `struct` used in the `enum` are exactly the same, except for their names and the names of their members. – François Andrieux Jan 03 '19 at 21:16
  • Could you give an example of famous library that uses this pattern? As far as I know, at least Eigen does not use this pattern. – xskxzr Jan 06 '19 at 04:58
  • @xskxzr one could argue that GLM is famous. But anyway, couldn't find any other big/famous library using this trick. I found a CppCon 2018 talk where the presenter claims this union approach is completely safe for trivial types. The active member issue from the standard should not be a problem in real world. [Here is the talk](https://youtu.be/8FoAxasNssA?t=1317) – Georgi Gerganov Jan 06 '19 at 06:58