Handling type-erased data at runtime - how not to reinvent the wheel?

Question

I'm working on some code which gets data that looks like this:

enum data_type { INT16 = 0, INT32, UINT64, FLOAT, TIMESTAMP };
struct buffer {
    data_type element_type;
    size_t    size; // in elements of element_type, not bytes
    void*     data;
};

(this is simplified; in actuality there are quite a few more types, more fields in this struct etc.)

Now, I find myself writing a bunch of utility code to "convert" enum values to actual types and vice-versa, at compile time. Then I realize I need to do the same at run-time as well, and with a variable number of buffers... so now, in addition to type-traits-based lookup of values and enum-template-parameter-based lookup of types, I'm writing code which looks up std::type_infos. It's kind of a mess.

But really, I should not be doing this. It's repetitive, and I am absolutely sure I'm reinventing the wheel by implementing something that has already been written many times, in compilers, DBMSes, data file parsers, serialization libraries and so on.

What can I do to minimize my wasted effort on this endeavor?

Notes:

I get these buffers at run time, and cannot just un-erase the type at compile time (e.g. using type traits).
I can't change the API. Or rather, I could change whatever I wanted in my code, but I still get data in this layout in memory.
I don't just take such buffers as input, I also need to produce them as output.
I occasionally need to handle many buffers of different types at once, even a variable number of them (e.g. foo(buffer* buffers, int num_buffers);).
C++11 solutions are preferred over newer-standard-version ones.
I actually use gsl a lot, so you can use it in your answers if you like. As for Boost - that may be politically difficult to depend on, but for the purposes of a StackOverflow question, it's fine, I guess.

If you can use c++17, you might want to look at std::variant — Chemistree, Oct 31 '18 at 10:09
Yeah, I was gonna point out this can be reduced to a variant of vectors. Unless I'm missing something. — StoryTeller - Unslander Monica, Oct 31 '18 at 10:10
@Chemistree: This doesn't help, because I don't get to change the API (see edit). — einpoklum, Oct 31 '18 at 10:33
A possibly-related old question of mine, not about variants: [Idiom for simulating run-time numeric template parameters?](https://stackoverflow.com/questions/38914655/idiom-for-simulating-run-time-numeric-template-parameters) — einpoklum, Nov 07 '18 at 00:32

score 4 · Answer 1 · answered Oct 31 '18 at 10:54


The goal here should be to get back into the C++ type system as fast as possible. To do this, there should be one central function that switches based on the (runtime) data_type and then hands off each case to a (compile-time) template version.

You have not indicated what the associated functions look like, but here is an example:

#include <cstddef>
#include <cstdint>
#include <ctime>

template<typename T>
struct TypedBuffer
{
  TypedBuffer(void* data, size_t elementCount) { /* ... */ }
  // ...
};

template<typename T>
void handleBufferTyped(void* data, size_t elementCount)
{
  TypedBuffer<T> buf(data, elementCount);
  // Do whatever you want - you're back in the type system.
}

void handleBuffer(buffer buf)
{
  switch (buf.element_type)
  {
  case INT16:     handleBufferTyped<int16_t>(buf.data, buf.size); break;
  case INT32:     handleBufferTyped<int32_t>(buf.data, buf.size); break;
  case UINT64:    handleBufferTyped<uint64_t>(buf.data, buf.size); break;
  case FLOAT:     handleBufferTyped<float>(buf.data, buf.size); break;
  case TIMESTAMP: handleBufferTyped<std::time_t>(buf.data, buf.size); break;
  }
}

If needed, you can also have TypedBuffer inherit from a non-templated base class so you can return from handleBuffer polymorphically, but that's mixing a lot of paradigms and probably unnecessary.
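A minimal sketch of that polymorphic variant, assuming a hypothetical base class `TypedBufferBase` whose interface (`size()` and an illustrative `sum()`) stands in for whatever operations your code actually needs. The single switch still does the runtime-to-compile-time hand-off; the result can then be stored in one container regardless of element type.

```cpp
#include <cstddef>
#include <cstdint>
#include <ctime>
#include <memory>
#include <stdexcept>

enum data_type { INT16 = 0, INT32, UINT64, FLOAT, TIMESTAMP };
struct buffer {
    data_type   element_type;
    std::size_t size;   // in elements, not bytes
    void*       data;
};

// Hypothetical non-templated interface shared by all typed buffers.
struct TypedBufferBase
{
  virtual ~TypedBufferBase() {}
  virtual std::size_t size() const = 0;
  virtual double sum() const = 0;   // purely illustrative operation
};

template<typename T>
struct TypedBuffer : TypedBufferBase
{
  TypedBuffer(void* data, std::size_t elementCount)
    : data_(static_cast<T*>(data)), size_(elementCount) {}
  std::size_t size() const override { return size_; }
  double sum() const override
  {
    double s = 0;
    for (std::size_t i = 0; i < size_; ++i) s += static_cast<double>(data_[i]);
    return s;
  }
  T*          data_;   // non-owning: the buffer's memory is not freed here
  std::size_t size_;
};

// Hypothetical helper so each case below is a single line.
template<typename T>
std::unique_ptr<TypedBufferBase> makeTyped(void* data, std::size_t n)
{
  return std::unique_ptr<TypedBufferBase>(new TypedBuffer<T>(data, n));
}

// The one place that switches on the runtime enum.
std::unique_ptr<TypedBufferBase> makeTypedBuffer(buffer buf)
{
  switch (buf.element_type)
  {
  case INT16:     return makeTyped<std::int16_t>(buf.data, buf.size);
  case INT32:     return makeTyped<std::int32_t>(buf.data, buf.size);
  case UINT64:    return makeTyped<std::uint64_t>(buf.data, buf.size);
  case FLOAT:     return makeTyped<float>(buf.data, buf.size);
  case TIMESTAMP: return makeTyped<std::time_t>(buf.data, buf.size);
  }
  throw std::invalid_argument("unknown data_type");
}
```

For a variable number of buffers, calling `makeTypedBuffer` in a loop and pushing the results into a `std::vector<std::unique_ptr<TypedBufferBase>>` is one way to keep them together.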


Max Langhof


1. I was asking about how to avoid writing this kind of stuff... 2. I also need to translate back. 3. The actual buffers are more complicated than that, and a single switch won't do. 4. This works for a single buffer. But I have multiple functions which take several buffers, and if that's not enough - functions which take a variable number of buffers: `void foo(buffer* buffers, int num_buffers);`. – einpoklum Oct 31 '18 at 11:11
@einpoklum 1. Getting back into the type system requires exactly this kind of stuff. 2. With what you asked, it's trivial to give each `TypedBuffer` a conversion function to `buffer`. We can't give you good code/abstractions for that if we don't know what is different/same between those, though. 3. We can't give you code for an unspecific "it's more complicated". 4. Loop over the buffers and treat each one as above. Use polymorphism (as hinted in the answer) if you have to (e.g. for storing the resulting typed buffers in the same container). – Max Langhof Oct 31 '18 at 11:58
@einpoklum More generally: If your API is fully outside the C++ type system (as you show) then the only way to get back into C++'s type system is to bridge that gap in each API function somehow. The way to do this with minimal effort depends on what your API looks like. For example, you could have a type <-> number mapping somewhere and loop over that, or a variety of other things. It would be a lot easier to discuss this if the question had representative examples of it. It's probably too broad otherwise. – Max Langhof Oct 31 '18 at 12:02
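For the "translate back" direction raised in these comments, the type-to-number mapping can be a trait specialized per element type. The sketch below assumes a hypothetical trait `data_type_of` and a hypothetical `to_buffer` helper that wraps a `std::vector` in the C-style `buffer` for output. It also assumes `std::time_t` is a distinct type from the other listed integer types on your platform; otherwise two specializations would collide.

```cpp
#include <cstddef>
#include <cstdint>
#include <ctime>
#include <vector>

enum data_type { INT16 = 0, INT32, UINT64, FLOAT, TIMESTAMP };
struct buffer {
    data_type   element_type;
    std::size_t size;   // in elements, not bytes
    void*       data;
};

// Hypothetical type -> enum trait. The primary template is left undefined,
// so an unsupported element type is a compile-time error.
template<typename T> struct data_type_of;
template<> struct data_type_of<std::int16_t>  { static const data_type value = INT16; };
template<> struct data_type_of<std::int32_t>  { static const data_type value = INT32; };
template<> struct data_type_of<std::uint64_t> { static const data_type value = UINT64; };
template<> struct data_type_of<float>         { static const data_type value = FLOAT; };
template<> struct data_type_of<std::time_t>   { static const data_type value = TIMESTAMP; };

// Wrap typed storage in the layout the API expects.
// The buffer does not own the memory: `v` must outlive the returned buffer.
template<typename T>
buffer to_buffer(std::vector<T>& v)
{
  buffer b;
  b.element_type = data_type_of<T>::value;
  b.size = v.size();
  b.data = v.data();
  return b;
}
```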

Passer By · Answer 2 · 2018-10-31T12:52:01.877

how not to reinvent the wheel?

Simply, use std::variant along with conversions back and forth. It's in the standard library for a reason.
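A sketch of what "std::variant along with conversions back and forth" could look like. This needs C++17 (`std::variant`, `std::visit`, generic lambdas). `typed_view`, `any_view`, `to_variant` and `from_variant` are hypothetical names, and `TIMESTAMP` is left out to keep it short. The back-conversion relies on the variant's alternatives being listed in the same order as the enum.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <variant>

enum data_type { INT16 = 0, INT32, UINT64, FLOAT };   // TIMESTAMP omitted
struct buffer { data_type element_type; std::size_t size; void* data; };

// Hypothetical non-owning typed view; one variant alternative per element type.
template<typename T>
struct typed_view { T* data; std::size_t size; };

// Alternative order must match the enum order (used by from_variant).
using any_view = std::variant<typed_view<std::int16_t>, typed_view<std::int32_t>,
                              typed_view<std::uint64_t>, typed_view<float>>;

// buffer -> variant: the only place that switches on the enum.
any_view to_variant(buffer b)
{
  switch (b.element_type) {
  case INT16:  return typed_view<std::int16_t>{static_cast<std::int16_t*>(b.data), b.size};
  case INT32:  return typed_view<std::int32_t>{static_cast<std::int32_t*>(b.data), b.size};
  case UINT64: return typed_view<std::uint64_t>{static_cast<std::uint64_t*>(b.data), b.size};
  case FLOAT:  return typed_view<float>{static_cast<float*>(b.data), b.size};
  }
  throw std::invalid_argument("unknown data_type");
}

// variant -> buffer: the active index doubles as the enum value.
buffer from_variant(const any_view& v)
{
  return std::visit([&v](const auto& view) {
    return buffer{static_cast<data_type>(v.index()), view.size, view.data};
  }, v);
}
```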

On to reinventing the wheel: visiting is the simplest generic mechanism for handling type-erased data.

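#include <cstddef>
#include <cstdint>
#include <iostream>
#include <type_traits>
#include <utility>

enum data_type { INT16 = 0, INT32, UINT64, FLOAT, TIMESTAMP, size };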

template<data_type d>
struct data
{
    using type = void;
};
template<>
struct data<INT16>
{
    using type = int16_t;
};
// and so on

template<data_type d>
using data_t = typename data<d>::type;


template<typename F, typename T>
void indirect(void* f, void* t, int n)
{
    (*(F*)f)((T*)t, n);
}

template<typename F, size_t... Is>
void visit_(F&& f, buffer* bufs, int n, std::index_sequence<Is...>)
{
    using rF = typename std::remove_reference<F>::type;
    using f_t = void(*)(void*, void*, int);
    static constexpr f_t fs[] = {indirect<rF, data_t<data_type(Is)>>...};
    for(int i = 0; i < n; i++)
        fs[bufs[i].element_type](&f, bufs[i].data, bufs[i].size);
}

template<typename F>
void visit(F&& f, buffer* bufs, int n)
{
    visit_(std::forward<F>(f), bufs, n, std::make_index_sequence<data_type::size>{});
}

std::index_sequence and friends are C++14 additions, but they can be implemented relatively easily in C++11. Use as follows:

struct printer
{
    template<typename T>
    void operator()(T* t, int n)
    {
        for(int i = 0; i < n; i++)
            std::cout << t[i] << ' ';
        std::cout << '\n';
    }
};

void foo()
{
    visit(printer{}, nullptr, 0);
}
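The C++11 replacement for `std::index_sequence` / `std::make_index_sequence` mentioned above could look like the following sketch. The namespace `compat` is a hypothetical name; this linear recursion is fine for enum-sized N, though the standard library versions are usually implemented more efficiently.

```cpp
#include <cstddef>

// Hypothetical namespace holding C++11 stand-ins for the C++14
// std::index_sequence / std::make_index_sequence.
namespace compat {

template<std::size_t... Is>
struct index_sequence {};

// Each step prepends N-1 to the pack and recurses, until N reaches 0:
// impl<3> -> impl<2, 2> -> impl<1, 1, 2> -> impl<0, 0, 1, 2>
template<std::size_t N, std::size_t... Is>
struct make_index_sequence_impl : make_index_sequence_impl<N - 1, N - 1, Is...> {};

// Base case: the accumulated pack is the finished sequence.
template<std::size_t... Is>
struct make_index_sequence_impl<0, Is...>
{
  typedef index_sequence<Is...> type;
};

template<std::size_t N>
using make_index_sequence = typename make_index_sequence_impl<N>::type;

}  // namespace compat
```

In `visit_` and `visit` above, `std::index_sequence<Is...>` and `std::make_index_sequence<data_type::size>` would then become `compat::index_sequence<Is...>` and `compat::make_index_sequence<data_type::size>`.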

Matthieu Brucher · Answer 3 · 2018-10-31T10:56:00.000


This seems to be what type_traits are used for (https://en.cppreference.com/w/cpp/types).

Basically, you define a templated structure, empty by default, and you specialize it for each enum value you have. Then in your code you use MyTypeTraits<MyEnumValue>::type to get the type associated with the enum value you want.

And everything is defined at compile time. If you need runtime information, you can always dispatch on the value of the enum (for instance if you store the enum alongside the data).
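A minimal sketch of both halves of this answer, using the answer's `MyTypeTraits` name: an empty primary template specialized per enum value, and a runtime switch that dispatches on a stored enum before consulting the trait. `element_size` and `counter_t` are hypothetical illustrations, and `TIMESTAMP` is left out for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

enum data_type { INT16 = 0, INT32, UINT64, FLOAT };   // TIMESTAMP left out here

// Empty by default: using an unmapped enum value fails to compile.
template<data_type D> struct MyTypeTraits {};
template<> struct MyTypeTraits<INT16>  { typedef std::int16_t  type; };
template<> struct MyTypeTraits<INT32>  { typedef std::int32_t  type; };
template<> struct MyTypeTraits<UINT64> { typedef std::uint64_t type; };
template<> struct MyTypeTraits<FLOAT>  { typedef float         type; };

// Compile-time use: the enum value is a template argument.
typedef MyTypeTraits<UINT64>::type counter_t;   // std::uint64_t

// Runtime use: switch on the stored enum, then ask the trait.
// Here, as an example, to get the element size in bytes.
std::size_t element_size(data_type t)
{
  switch (t) {
  case INT16:  return sizeof(MyTypeTraits<INT16>::type);
  case INT32:  return sizeof(MyTypeTraits<INT32>::type);
  case UINT64: return sizeof(MyTypeTraits<UINT64>::type);
  case FLOAT:  return sizeof(MyTypeTraits<FLOAT>::type);
  }
  throw std::invalid_argument("unknown data_type");
}
```

As the comment below this answer points out, the trait alone only helps at compile time; the switch is what bridges from a value known only at run time.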

edited Oct 31 '18 at 10:56

answered Oct 31 '18 at 10:06


In order to use type traits, you need to know the enum value at compile time. I only know it at run time. – einpoklum Oct 31 '18 at 10:28

score 1 · Answer 4 · answered Oct 31 '18 at 18:58

Use boost::variant and gsl::span.

enum data_type { INT16 = 0, INT32, UINT64, FLOAT, TIMESTAMP };
struct buffer {
  data_type element_type;
  size_t    size; // in elements of element_type, not bytes
  void*     data;
};

template<class...Ts>
using var_span = boost::variant< gsl::span< Ts > ... >;

using buffer_span = var_span< std::int16_t, std::int32_t, std::uint64_t, float, ??? >;

buffer_span to_span( buffer buff ) {
  switch (buff.element_type) {
    case INT16: return gsl::span<std::int16_t>( (std::int16_t*)buff.data, buff.size );
    // etc
  }
}

Now you can write

auto span = to_span( buff );

and then visit the span for type-safe access to the buffer's data.

Writing visitors is less painful in C++14 thanks to generic [](auto&&) lambdas, but it is doable in C++11.

Writing a template<class...Fs> struct overloaded can also make visitors easier to write. There are myriad implementations out there.
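One possible C++11 implementation, sketched here with the `overloaded` struct and an `overload(...)` factory function matching the usage further down. C++11 lacks C++17's pack expansion in `using` declarations, so this version inherits recursively and brings each base's `operator()` into scope one level at a time.

```cpp
#include <type_traits>
#include <utility>

template<class... Fs> struct overloaded;

// Base case: a single callable.
template<class F>
struct overloaded<F> : F
{
  overloaded(F f) : F(std::move(f)) {}
  using F::operator();
};

// Recursive case: inherit from the first callable and from the rest,
// and expose both sets of operator() so overload resolution sees them all.
template<class F, class... Fs>
struct overloaded<F, Fs...> : F, overloaded<Fs...>
{
  overloaded(F f, Fs... fs) : F(std::move(f)), overloaded<Fs...>(std::move(fs)...) {}
  using F::operator();
  using overloaded<Fs...>::operator();
};

// Factory so callers never have to spell out lambda types.
template<class... Fs>
overloaded<typename std::decay<Fs>::type...> overload(Fs&&... fs)
{
  return overloaded<typename std::decay<Fs>::type...>(std::forward<Fs>(fs)...);
}
```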

If you cannot use Boost, you can turn to_span into a visit_span that takes a visitor instead of returning a variant.

If you cannot use GSL, writing your own span is trivial.
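A sketch of both fallbacks together, with neither Boost nor GSL: a hand-rolled non-owning `span` (a hypothetical minimal type, far less complete than `gsl::span`) and a `visit_span` that calls the visitor with the correctly typed span instead of returning a variant. `make_span` is a hypothetical helper; `TIMESTAMP` is omitted for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

enum data_type { INT16 = 0, INT32, UINT64, FLOAT };   // TIMESTAMP omitted
struct buffer {
  data_type   element_type;
  std::size_t size;   // in elements, not bytes
  void*       data;
};

// Minimal non-owning span: enough for indexing and range-for.
template<class T>
struct span
{
  T*          ptr;
  std::size_t count;
  T* begin() const { return ptr; }
  T* end() const { return ptr + count; }
  std::size_t size() const { return count; }
  T& operator[](std::size_t i) const { return ptr[i]; }
};

template<class T>
span<T> make_span(void* p, std::size_t n) { return span<T>{static_cast<T*>(p), n}; }

// Instead of returning a variant, hand the typed span straight to the visitor.
template<class V>
void visit_span(buffer buff, V&& vis)
{
  switch (buff.element_type) {
  case INT16:  vis(make_span<std::int16_t>(buff.data, buff.size)); return;
  case INT32:  vis(make_span<std::int32_t>(buff.data, buff.size)); return;
  case UINT64: vis(make_span<std::uint64_t>(buff.data, buff.size)); return;
  case FLOAT:  vis(make_span<float>(buff.data, buff.size)); return;
  }
  throw std::invalid_argument("unknown data_type");
}
```

Any callable with a templated `operator()(span<T>)`, or a set of lambdas combined with an `overload` helper, can serve as the visitor.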

visit_span( buff, overload(
  [](span<int16_t> span) { /* code */ },
  [](span<int32_t> span) { /* code */ },
  // ...
));

or

struct do_foo {
  template<class T>
  void operator()(span<T> span) { /* code */ }
};
visit_span( buff, do_foo{captures} );

