Implementing flexible array members with templates and base class

Question

In C99, you commonly see the following pattern:

struct Foo {
    int var1;
    int var2[];
};

struct Foo *f = malloc(sizeof(struct Foo) + sizeof(int) * n);
for (int i=0; i<n; ++i) {
    f->var2[i] = p;
}

But not only is this bad C++, it's also illegal.

You can achieve a similar effect in C++ like this:

struct FooBase {
    void dostuff();

    int var1;
    int var2[1];
};

template<size_t N>
struct Foo : public FooBase {
    int var2[N-1];
};

Although this will work (in the methods of FooBase you can access var2[2], var2[3], etc) it relies on Foo being standard layout, which isn't very pretty.

The benefit of this is that a non-templated function can receive any Foo* without conversion by taking a FooBase* and call methods that operate on var2, and the memory is all contiguous (which can be useful).

Is there a better way of achieving this (which is legal C++/C++11/C++14)?

I'm not interested in the two trivial solutions (including an extra pointer in the base class to the start of the array, and allocating the array on the heap).

That's one of the two trivial solutions, it puts the array on the heap. — jleahy, Jul 02 '13 at 11:47
That's the C++ way to do something that's different, but similar. — jleahy, Jul 02 '13 at 11:49
@jleahy: You shouldn't put too large-sized arrays on stack either. that means static arrays are not that flexible! — Nawaz, Jul 02 '13 at 11:50
Doesn't the C version put it on the heap too? The malloc allocates space for the array, doesn't it? — bennofs, Jul 02 '13 at 11:53
Why do you code it using `FooBase` and `Foo`, wouldn't templated `Foo` be enough? (with `var1` declared as well) — ondrejdee, Jul 02 '13 at 12:02
Why do you insist on `var2` being declared as `int[]`? Isn't it much cleaner to use just `int *`? — ondrejdee, Jul 02 '13 at 12:15
@ondav `var` being `int[]` implies that the storage for `var` is contiguous with the rest of the `Foo` structure. This matters less in this exact situation, because there is nothing there but the array and the size of the array: but in a more complex situation, `Foo` will contain a bunch of data, then it ends with a variable-length array. — Yakk - Adam Nevraumont, Jul 02 '13 at 12:35
Aha. (http://stackoverflow.com/questions/11733981/what-is-the-purpose-of-a-zero-length-array-in-a-struct) — ondrejdee, Jul 02 '13 at 14:01
As additional info and discussion, see this: [http://stackoverflow.com/questions/1887097/variable-length-arrays-in-c](http://stackoverflow.com/questions/1887097/variable-length-arrays-in-c "Variable length arrays in C++?") — ondrejdee, Jul 02 '13 at 14:09
"Although this will work" - I think it causes UB even if the class is standard layout; you exceed array bounds. — M.M, Nov 03 '14 at 20:39

Yakk - Adam Nevraumont · Answer 1 · 2013-07-02T13:53:50.030

What you want to do is possible, but not easy, in C++, and the interface to your struct is not a struct-style interface.

Just like how a std::vector takes a block of memory and reformats it into something very much like an array, then overloads operators to make itself look array-like, you can do the same.

Access to your data will be via accessors. You'll manually construct your members in the buffer.

You might start with a list of pairs of "tags" and data types.

struct tag1_t {} tag1;
struct tag2_t {} tag2;
typedef std::tuple< std::pair< tag1_t, int >, std::pair<tag2_t, double> > header_t;

then, some more types that we'll interpret as saying "after the header part, we have an array". I'd want to massively improve this syntax, but the important part for now is to build up compile time lists:

struct arr_t {} arr;
typedef std::tuple< header_t, std::pair< arr_t, std::string > > full_t;

You'd then have to write up some template mojo that figures out, given N at run time, how big a buffer you'd need to store the int and double followed by N copies of the std::string, everything properly aligned. This isn't easy.
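As a rough illustration of that size calculation, here is a minimal sketch for the specific case of an int and a double followed by n std::string objects. The names `align_up`, `layout` and `compute_layout` are hypothetical helpers made up for this example, and the sketch hard-codes the three parts rather than walking a tuple of tags.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Round `offset` up to the next multiple of `alignment` (a power of two).
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Byte offsets of each part inside one contiguous buffer (hypothetical type).
struct layout {
    std::size_t int_offset;
    std::size_t double_offset;
    std::size_t array_offset;
    std::size_t total_size;
};

// Work out where an int, a double and n std::string objects would live.
inline layout compute_layout(std::size_t n) {
    layout l{};
    l.int_offset = 0;
    l.double_offset = align_up(l.int_offset + sizeof(int), alignof(double));
    l.array_offset = align_up(l.double_offset + sizeof(double),
                              alignof(std::string));
    l.total_size = l.array_offset + n * sizeof(std::string);
    return l;
}
```

The buffer itself still has to start at an address aligned for the strictest member; memory from `operator new` normally satisfies that for types like these.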

Once you've done that, you'd also need to write code that constructs everything described above. If you wanted to get fancy, you'd even expose a perfect forwarding constructor and constructor wrappers allowing the objects to be constructed in a non-default state.

Finally, write up an interface that finds the memory offset of the constructed objects based on the tags I injected into the above tuples, reinterpret_casts the raw memory into a reference to the data type, and returns that reference (in both const and non-const versions).

For the array at the end, you'd return some temporary data structure that has overloaded operator[] which produces the references.
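The construction and accessor steps can be sketched for a much simpler case: a fixed header followed by a run-time number of std::string objects in one allocation. `Record`, `array_view`, `id()` and `items()` are hypothetical names for this example, not the tag-based interface itself. It assumes the usual pointer arithmetic past the header behaves as it does on mainstream compilers, and it ignores exception safety during construction.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical type: a header followed by `count_` strings in one block.
class Record {
public:
    typedef std::string element_type;

    // Temporary view handed out for the trailing array.
    struct array_view {
        element_type* first;
        std::size_t count;
        element_type& operator[](std::size_t i) { return first[i]; }
        std::size_t size() const { return count; }
    };

    static Record* Create(std::size_t n) {
        void* raw = ::operator new(array_offset() + n * sizeof(element_type));
        Record* r = new (raw) Record(n);
        // Construct each element in place, directly after the header.
        for (std::size_t i = 0; i < n; ++i)
            new (r->elements() + i) element_type();
        return r;
    }

    static void Destroy(Record* r) {
        // Destroy elements in reverse order, then the header, then free.
        for (std::size_t i = r->count_; i > 0; --i)
            (r->elements() + (i - 1))->~element_type();
        r->~Record();
        ::operator delete(r);
    }

    int& id() { return id_; }
    array_view items() { return array_view{elements(), count_}; }

private:
    explicit Record(std::size_t n) : id_(0), count_(n) {}
    ~Record() {}

    // Header size rounded up to the element's alignment.
    static std::size_t array_offset() {
        const std::size_t a = alignof(element_type);
        return (sizeof(Record) + a - 1) / a * a;
    }

    element_type* elements() {
        return reinterpret_cast<element_type*>(
            reinterpret_cast<char*>(this) + array_offset());
    }

    int id_;
    std::size_t count_;
};
```

Usage would look like `Record* r = Record::Create(7); r->items()[3] = "hello"; Record::Destroy(r);`. Objects must be released through `Destroy`, never plain `delete`, which the private destructor enforces.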

If you take a look at how std::vector turns blocks of memory into arrays, and mix that with how boost::mpl arranges tag-to-data maps, and then also mess around manually with keeping things properly aligned, every step is challenging but not impossible. The messy syntax I've used here can also be improved (to some extent).

The end interface might be

Foo* my_data = Foo::Create(7);
my_data->get<tag1_t>(); // returns an int
my_data->get<tag2_t>(); // returns a double
my_data->get<arr_t>()[3]; // access to the element at index 3

which could be improved with some overloading to:

Foo* my_data = Foo::Create(7);
int x = my_data^tag1; // returns an int
double y = my_data^tag2; // returns a double
std::string z = my_data^arr[3]; // access to the std::string at index 3

but the effort involved would be reasonably large to get this far, and many of the things required would be pretty horrible.

Basically, in order to solve your problem as described, I would have to rebuild the entire C++/C structure-layout system manually within C++, and once you have done that it isn't hard to inject "arbitrary length array at the end". It would even be possible to inject arbitrary length arrays in the middle (but that would mean that finding the address of structure members past that array is a runtime problem: however, as our operator^ is allowed to run arbitrary code, and your structure can store the length of arrays, we are able to do this).

I cannot, however, think of a simpler, portable way to do what you ask within C++, where the data types stored do not have to be standard-layout.

I'm sure you meant `std::pair<arr_t, std::string>` instead of `std::pair<arr, std::string>`. (note : `arr_t` vs `arr`). — Nawaz, Jul 02 '13 at 13:06

Some programmer dude · Answer 2 · 2013-07-02T11:56:14.433

With a little typecasting, you can use the C pattern in C++ as well.

Just make the array's initial size one, and allocate the structure pointer using new char[...]:

struct Foo {
    int var1;
    int var2[1];
};

Foo* foo_ptr = reinterpret_cast<Foo*>(new char[sizeof(Foo) + sizeof(int) * (n - 1)]);

Then you of course should cast it when freeing the structure as well:

delete[] reinterpret_cast<char*>(foo_ptr);

I don't really recommend this for general use though. The only acceptable (to me) place to use a scheme such as this is when transferring a structure somehow (network or files). And then I recommend marshaling it to/from a "proper" C++ object with a std::vector for the variable-length data.
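A minimal sketch of that marshaling step, assuming a made-up wire format of var1, then an element count, then the elements, all in host byte order. `Message`, `marshal` and `unmarshal` are hypothetical names for this example; a real protocol would also have to pin down endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// The "proper" C++ object: variable-length data lives in a std::vector.
struct Message {
    std::int32_t var1;
    std::vector<std::int32_t> var2;
};

// Pack as: var1, element count, then the elements (host byte order).
inline std::vector<unsigned char> marshal(const Message& m) {
    const std::uint32_t count = static_cast<std::uint32_t>(m.var2.size());
    std::vector<unsigned char> out(sizeof(m.var1) + sizeof(count) +
                                   count * sizeof(std::int32_t));
    unsigned char* p = out.data();
    std::memcpy(p, &m.var1, sizeof(m.var1));
    p += sizeof(m.var1);
    std::memcpy(p, &count, sizeof(count));
    p += sizeof(count);
    if (count != 0)
        std::memcpy(p, m.var2.data(), count * sizeof(std::int32_t));
    return out;
}

// Rebuild a Message, rejecting buffers that are too short.
inline Message unmarshal(const std::vector<unsigned char>& in) {
    Message m;
    std::uint32_t count = 0;
    const std::size_t header = sizeof(m.var1) + sizeof(count);
    if (in.size() < header)
        throw std::runtime_error("buffer too small for header");
    std::memcpy(&m.var1, in.data(), sizeof(m.var1));
    std::memcpy(&count, in.data() + sizeof(m.var1), sizeof(count));
    if ((in.size() - header) / sizeof(std::int32_t) < count)
        throw std::runtime_error("buffer too small for payload");
    m.var2.resize(count);
    if (count != 0)
        std::memcpy(m.var2.data(), in.data() + header,
                    count * sizeof(std::int32_t));
    return m;
}
```

The byte vector returned by `marshal` is one contiguous block that can be sent in one go, while the rest of the program only ever touches the `std::vector`-based object.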

edited Jul 02 '13 at 11:56

answered Jul 02 '13 at 11:49

Some programmer dude

Why not use `std::vector` instead? – Nawaz Jul 02 '13 at 11:52
@Nawaz Because then you can't use the structure to e.g. transfer it over a network connection in one go. – Some programmer dude Jul 02 '13 at 11:55
Oh then why not `foo.ptr = new int[N]`? I know the OP doesn't want it, but I don't know why he doesn't. – Nawaz Jul 02 '13 at 11:57
@Nawaz because then it wouldn't be one contiguous block of memory for the "top" of the `struct` and the variable length array. Start with `struct Foo { Type type; uint64_t checksum; int length; int buff[]; }` and being able to allocate both the first, and second, part of the structure in one block of memory is nice. – Yakk - Adam Nevraumont Jul 02 '13 at 12:38
@Yakk: That depends on your allocation strategy, doesn't it? – Nawaz Jul 02 '13 at 12:44

tp1 · Answer 3 · 2013-07-02T13:07:28.683

What you want to do is not possible at all in C++. The reason is that sizeof(T) is a compile-time constant, so placing an array inside a type gives that type a compile-time size. The proper C++ way of doing this therefore keeps the array outside of the type. Note that placing an array on the stack is only possible if it is inside some type, so everything stack-based is limited to a compile-time array size (alloca might get around that). Your original C version has a similar problem: types cannot deal with run-time sized arrays.

The same applies to variable-length arrays in C++. They are not supported because they break sizeof, and C++ classes rely on sizeof for data member access. Any solution that cannot be used together with C++ classes is no good. std::vector has no such problems.
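To make the sizeof point concrete, here is a small sketch with a hypothetical `Holder` struct and `make_holder` helper (neither is from the question). The object's size is the same whether the vector holds one element or a thousand, because the elements live outside the object.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical struct: the variable-length part is held by std::vector,
// so the elements live outside the object itself.
struct Holder {
    int var1;
    std::vector<int> var2;
};

// Build a Holder whose vector has n elements, where n is a run-time value.
inline Holder make_holder(std::size_t n) {
    Holder h;
    h.var1 = 0;
    h.var2.assign(n, 0);
    return h;
}

// sizeof(Holder) is fixed at compile time, whatever n is at run time.
static_assert(sizeof(Holder) >= sizeof(int) + sizeof(std::vector<int>),
              "Holder contains an int and a vector handle");
```

This is exactly the trade-off the question wants to avoid: sizeof stays constant, but the array is no longer contiguous with var1.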

Note that constexpr in C++11 makes offset calculation in your custom data types considerably simpler, but the compile-time restriction is still there.

score 0 · Answer 4 · edited Nov 28 '16 at 20:49

I know I'm kinda late here, but my suggestion would be:

template<size_t N>
struct Foo {
    int var1;
    std::array<int,N> var2;
};

std::array stores the data as int v[N]; (not on the heap), so there would be no problem converting it to a stream of bytes.
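A short sketch of what that byte conversion could look like. `to_bytes` is a hypothetical helper, and the checks assume a typical implementation where std::array<int, N> is trivially copyable.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

template <std::size_t N>
struct Foo {
    int var1;
    std::array<int, N> var2;
};

// Each N yields a distinct type whose size is fixed at compile time.
static_assert(sizeof(Foo<8>) > sizeof(Foo<4>), "size grows with N");
static_assert(std::is_trivially_copyable<Foo<4>>::value,
              "safe to copy as raw bytes");

// Copy the object representation into a byte array of the same size.
template <std::size_t N>
std::array<unsigned char, sizeof(Foo<N>)> to_bytes(const Foo<N>& f) {
    std::array<unsigned char, sizeof(Foo<N>)> out;
    std::memcpy(out.data(), &f, sizeof(f));
    return out;
}
```

Note what this gives up compared with the question: N has to be known at compile time, and Foo<4> and Foo<8> are unrelated types, so a non-templated function cannot accept both through a common base pointer.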

edited Nov 28 '16 at 20:49

Karol Gasienica

answered Nov 28 '16 at 19:12

Felipe Nardi Batista

hl037_ · Answer 5 · 2017-11-10T22:12:39.970

I'm also kinda late, but this solution is compatible with C's flexible arrays (if you play with the preprocessor, of course):

#include <cstdlib>
#include <iostream>

using namespace std;

template <typename T>
class Flexible 
{
public:
   Flexible(){}
   ~Flexible(){}
   inline T & operator[](size_t ind){
      return reinterpret_cast<T*>(this)[ind];
   }
   inline Flexible<T> * getThis() { return this; }
   inline operator T * () { return reinterpret_cast<T*>(this); }
};

struct test{
   int a;
   Flexible<char> b;
};

int main(int argc, char * argv[]){
   cout << sizeof(test) << endl;
   test t;
   cout << &t << endl;
   cout << &t.a << endl;
   cout << &t.b << endl;
   cout << t.b.getThis() << endl;
   cout << (void*)t.b << endl;
   test * t2 = static_cast<test*>(malloc(sizeof(test) + 5));
   t2->b[0] = 'a';
   t2->b[1] = 'b';
   t2->b[2] = 0;
   cout << t2->b << endl;
   free(t2); // release the block obtained from malloc
   return 0;
}

(tested on GCC, and clang with clang++ -fsanitize=undefined, I see no reason it wouldn't be standard, except the reinterpret_cast part...)

NOTE: you won't get an error if it's not the last field of the struct. Be particularly cautious about using this in objects that contain this struct as a sub-sub-...-sub-member, because you could unintentionally add another field after it and get some weird bugs. For example, I would not advise defining a struct/class with a member which itself contains a Flexible, such as this one:

class A{
  Flexible<char> a;
};

class B{
  A a;
};

because it's easy to make this mistake later:

class B{
  A a;
  int i;
};
