argument packing in std::tuple<> and then applying

Question

I'm writing a binding engine for Lua. It works by instantiating function templates that gather the arguments provided by Lua into a std::tuple and then apply the std::tuple to a pointer to member function. Here's an example:

template <member_meta_info const* mmi, class C, class R, class ...A>
inline typename std::enable_if<std::is_same<void, R>::value, int>::type
member_stub(lua_State* L)
{
  assert(sizeof...(A) + 1 == lua_gettop(L));

  static std::tuple<A...> args;

  set_args<0, 2>(args, L);

  lua_getfield(L, -1, "__instance");
  assert(lua_islightuserdata(L, -1));

  typedef R (C::*ptr_to_member_type)(A...);

  apply_tuple(static_cast<C*>(lua_touserdata(L, -1)),
    *static_cast<ptr_to_member_type*>(mmi->ptr_to_member), args);

  lua_pushnil(L);

  return 1;
}

mmi->ptr_to_member is simply a void* pointer. The set_args trick was stolen from: iterate over tuple:

template<std::size_t I = 0, std::size_t O = 1, typename... Tp>
inline typename std::enable_if<I == sizeof...(Tp)>::type
set_args(std::tuple<Tp...>&, lua_State*)
{
}

template<std::size_t I = 0, std::size_t O = 1, typename... Tp>
inline typename std::enable_if<I != sizeof...(Tp)>::type
set_args(std::tuple<Tp...>& t, lua_State* L)
{
  set_arg(L, I + O, std::get<I>(t));

  set_args<I + 1, O, Tp...>(t, L);
}

set_arg() is a set of overloaded functions for various primitive types (such as int, double, ...). Each overload assigns the converted Lua value to the reference returned from std::get<>, for example:

inline void set_arg(lua_State* L, std::size_t i, double& value)
{
  assert(lua_isnumber(L, i));
  value = lua_tonumber(L, i);
}
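To make the recursion easier to follow in isolation, here is a minimal sketch of the same pattern that compiles without Lua. The `fake_state` struct and the `read_int_double` helper are hypothetical stand-ins invented for this illustration; in the real binding, `lua_State*` and the `lua_to*` functions take their place.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for lua_State: stack slot i is values[i - 1].
struct fake_state { std::vector<double> values; };

// Hypothetical counterparts of the set_arg() overloads.
inline void set_arg(fake_state& s, std::size_t i, double& value)
{
  value = s.values.at(i - 1);
}

inline void set_arg(fake_state& s, std::size_t i, int& value)
{
  value = static_cast<int>(s.values.at(i - 1));
}

// The recursion ends once every tuple element has been visited.
template <std::size_t I = 0, std::size_t O = 1, typename... Tp>
inline typename std::enable_if<I == sizeof...(Tp)>::type
set_args(std::tuple<Tp...>&, fake_state&)
{
}

// Fill element I from stack slot I + O, then recurse on element I + 1.
template <std::size_t I = 0, std::size_t O = 1, typename... Tp>
inline typename std::enable_if<I != sizeof...(Tp)>::type
set_args(std::tuple<Tp...>& t, fake_state& s)
{
  set_arg(s, I + O, std::get<I>(t));
  set_args<I + 1, O, Tp...>(t, s);
}

// Slot 1 plays the role of the instance table, so arguments start at slot 2.
inline std::tuple<int, double> read_int_double(fake_state& s)
{
  std::tuple<int, double> args;
  set_args<0, 2>(args, s);
  return args;
}
```

With a state holding `{0.0, 7.0, 2.5}`, `read_int_double` yields a tuple containing `7` and `2.5`; the first slot is skipped because of the offset `O = 2`.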

The apply trick was adapted from: How do I expand a tuple into variadic template function's arguments?

#ifndef APPLYTUPLE_HPP
# define APPLYTUPLE_HPP

template<size_t N>
struct Apply {
template<typename F, typename T, typename... A>
    static inline auto apply(F&& f, T && t, A &&... a)
      -> decltype(Apply<N-1>::apply(::std::forward<F>(f), ::std::forward<T>(t),
        ::std::get<N-1>(::std::forward<T>(t)), ::std::forward<A>(a)...
      ))
    {
      return Apply<N-1>::apply(::std::forward<F>(f), ::std::forward<T>(t),
        ::std::get<N-1>(::std::forward<T>(t)), ::std::forward<A>(a)...
      );
    }

    template<typename C, typename F, typename T, typename... A>
    static inline auto apply(C && c, F && f, T && t, A &&... a)
      -> decltype(Apply<N-1>::apply(::std::forward<C>(c),
        ::std::forward<F>(f), ::std::forward<T>(t),
        ::std::get<N-1>(::std::forward<T>(t)), ::std::forward<A>(a)...
      ))
    {
      return Apply<N-1>::apply(::std::forward<C>(c), ::std::forward<F>(f),
        ::std::forward<T>(t), ::std::get<N-1>(::std::forward<T>(t)),
        ::std::forward<A>(a)...
      );
    }

    template<typename C, typename T, typename... A>
    static inline C* apply(T && t, A &&... a)
    {
      return Apply<N-1>::template apply<C>(::std::forward<T>(t),
        ::std::get<N-1>(::std::forward<T>(t)), ::std::forward<A>(a)...
      );
    }
};

template<>
struct Apply<0> {
  template<typename F, typename T, typename... A>
  static inline auto apply(F && f, T &&, A &&... a)
    ->decltype((*::std::forward<F>(f))(::std::forward<A>(a)...))
  {
    return (*::std::forward<F>(f))(::std::forward<A>(a)...);
  }

  template<typename C, typename F, typename T, typename... A>
  static inline auto apply(C && c, F && f, T &&, A &&... a)
    ->decltype((::std::forward<C>(c)->*::std::forward<F>(f))(::std::forward<A>(a)...))
  {
    return (::std::forward<C>(c)->*::std::forward<F>(f))(::std::forward<A>(a)...);
  }

  template<typename C, typename T, typename... A>
  static inline C* apply(T &&, A &&... a)
  {
    return new C(::std::forward<A>(a)...);
  }
};

template<typename F, typename T>
inline auto apply_tuple(F && f, T && t)
  ->decltype(Apply< ::std::tuple_size<
    typename ::std::decay<T>::type
  >::value>::apply(::std::forward<F>(f), ::std::forward<T>(t)))
{
  return Apply< ::std::tuple_size<
    typename ::std::decay<T>::type
  >::value>::apply(::std::forward<F>(f), ::std::forward<T>(t));
}

template<typename C, typename F, typename T>
inline auto apply_tuple(C && c, F && f, T && t)
  ->decltype(Apply< ::std::tuple_size<
    typename ::std::decay<T>::type
  >::value>::apply(::std::forward<C>(c), ::std::forward<F>(f), ::std::forward<T>(t)))
{
  return Apply< ::std::tuple_size<
    typename ::std::decay<T>::type
  >::value>::apply(::std::forward<C>(c), ::std::forward<F>(f), ::std::forward<T>(t));
}

template<typename C, typename T>
inline C* apply_tuple(T && t)
{
  return Apply< ::std::tuple_size<
    typename ::std::decay<T>::type
  >::value>::template apply<C>(::std::forward<T>(t));
}

#endif // APPLYTUPLE_HPP

It applies the tuple args to the pointed-to member function. Now for my question: I'm disturbed that for each function call I copy all the arguments Lua provides into a std::tuple and then apply it to the pointer to member. Certainly the copying entails some overhead. Is it possible to omit the copying somehow? Is there a container (standard or otherwise) that is better suited to holding the copied arguments than std::tuple (i.e. less fat, more trimmed)?
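For readers on a newer toolchain: C++17, which postdates this question, added `std::apply` to `<tuple>`, and it can stand in for the hand-written `Apply` helper. The sketch below assumes a C++17 compiler; `counter` and `call_member` are hypothetical names used only for illustration. It does not remove the copying asked about here, it only shortens the application step.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical bound class, used only for illustration.
struct counter
{
  int total = 0;
  void add(int a, double b) { total += a + static_cast<int>(b); }
};

// Prepend the object pointer to the argument tuple. std::apply then calls
// std::invoke(pm, obj, args...), which for a pointer to member function
// is equivalent to (obj->*pm)(args...).
template <class C, class R, class... A>
R call_member(C* obj, R (C::*pm)(A...), const std::tuple<A...>& args)
{
  return std::apply(pm, std::tuple_cat(std::make_tuple(obj), args));
}
```

A call such as `call_member(&c, &counter::add, std::make_tuple(3, 4.0))` adds 7 to `c.total`.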

"Certainly the copying entails some overhead." -- Did you profile this? Is it actually a problem? Also, "`mmi->ptr_to_member` is simply a `void*` pointer" -- you're entering undefined behaviour land, a member function pointer can not necessarily be represented by a normal data pointer. Same goes for a normal function pointer. What you need is a dummy class and `reinterpret_cast` the member functions like `reinterpret_cast(&SomeClass::some_member)` -- this is guaranteed to round-trip if you cast it back to the correct type. — Xeo, Feb 26 '13 at 10:24
I cast it to a pointer to member pointer and dereference `*static_cast<ptr_to_member_type*>(mmi->ptr_to_member)`. Is that undefined? There's double indirection. Otherwise I viewed the code under a debugger and don't like it too much. — user1095108, Feb 26 '13 at 10:28
What is undefined behaviour is casting a member pointer to `void*` in the first place. — Xeo, Feb 26 '13 at 10:29
But I don't, I do this: `*static_cast<ptr_to_member_type*>(mmi.ptr_to_member) = ptr_to_member;`. The `void*` pointer points to: `new char[sizeof(ptr_to_member)];` — user1095108, Feb 26 '13 at 10:31
I see, nevermind then. :) What still stands is - did you profile that this is actually a problem? Also, I'd hazard the guess that you can't do much against the copying, unless you get references back from the `lua_toXXX` functions. — Xeo, Feb 26 '13 at 10:38
@Xeo The problem is, that despite `-O3 -finline-functions`, I see a lot of functions that are not inlined. I also see the copying taking place. This is not a problem, since I am writing this as an exercise and scripting languages are slow as it is. I'm trying to learn something new. — user1095108, Feb 26 '13 at 10:58
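The round-trip cast through a dummy class that Xeo describes in the comments above could look roughly like the sketch below. All names here (`dummy`, `erased_member`, `erase_member`, `restore_member`, `widget`) are hypothetical; the only property relied on is that converting a pointer to member function to a different pointer to member function type and back to the original type yields the original value.

```cpp
#include <cassert>

// Hypothetical placeholder class; it only exists to name a common pointer type.
struct dummy {};
typedef void (dummy::*erased_member)();

// Erase the concrete member function pointer type.
template <class C, class R, class... A>
erased_member erase_member(R (C::*pm)(A...))
{
  return reinterpret_cast<erased_member>(pm);
}

// Restore it. PM must be exactly the type that was erased; calling
// through any other type is undefined behaviour.
template <class PM>
PM restore_member(erased_member e)
{
  return reinterpret_cast<PM>(e);
}

// Hypothetical bound class, used only for illustration.
struct widget
{
  int value = 0;
  void set(int v) { value = v; }
};
```

Storing an `erased_member` in the meta info instead of a `void*` would avoid the heap-allocated buffer, at the cost of having to know the exact original type when restoring it, which the stub templates already do.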

Daniel Frey · Accepted Answer · 2013-02-28T08:23:27.420

I think using a std::tuple should be fine, but you might want to try to drop the static and construct it in one step only (no recursion). It could improve the generated code, but I'll leave it for you to measure and/or analyze it.

#include <cassert>
#include <cstddef>
#include <tuple>

// we need a compile-time helper to generate indices
template< std::size_t... Ns >
struct indices
{
  typedef indices< Ns..., sizeof...( Ns ) > next;
};

template< std::size_t N >
struct make_indices
{
  typedef typename make_indices< N - 1 >::type::next type;
};

template<>
struct make_indices< 0 >
{
  typedef indices<> type;
};

// instead of set_arg, provide get_arg:
template< typename T >
T get_arg( lua_State* L, std::size_t N )
{
  static_assert( sizeof( T ) == 0, "T not specialized" );
}

// one specialization per type
template<>
inline double get_arg< double >( lua_State* L, std::size_t i )
{
  assert(lua_isnumber(L, i));
  return lua_tonumber(L, i);
}

// etc.

// now that we have the helpers, we use it to create an impl
// and forward the original call to it.
template< typename... Args, std::size_t... Ns >
void member_stub_impl( lua_State* L, const indices< Ns... >& )
{
  // and here's the non-recursive direct initialization
  // version of a std::tuple< Args... >, adjust the offset (1) as needed
  const std::tuple< Args... > args( get_arg< Args >( L, Ns + 1 )... );

  // now use it...

}

// the forwarder providing the indices to the impl
template< typename... Args >
void member_stub( lua_State* L )
{
  typedef typename make_indices< sizeof...( Args ) >::type Indices;
  return member_stub_impl< Args... >( L, Indices() );
}
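The same non-recursive construction can be tried out without Lua. The sketch below assumes a C++14 compiler and uses `std::index_sequence` from `<utility>`, the standard counterpart of the hand-written `indices` helper above; `fake_state`, `read_args` and `read_args_impl` are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for lua_State: stack slot i is values[i - 1].
struct fake_state { std::vector<double> values; };

// Only declared; every supported type gets an explicit specialization.
template <typename T>
T get_arg(fake_state& s, std::size_t i);

template <>
inline double get_arg<double>(fake_state& s, std::size_t i)
{
  return s.values.at(i - 1);
}

template <>
inline int get_arg<int>(fake_state& s, std::size_t i)
{
  return static_cast<int>(s.values.at(i - 1));
}

// Args and Ns expand in lockstep: element k is read from slot k + 1.
template <typename... Args, std::size_t... Ns>
std::tuple<Args...> read_args_impl(fake_state& s, std::index_sequence<Ns...>)
{
  return std::tuple<Args...>(get_arg<Args>(s, Ns + 1)...);
}

// Forwarder that supplies the index pack 0, 1, ..., sizeof...(Args) - 1.
template <typename... Args>
std::tuple<Args...> read_args(fake_state& s)
{
  return read_args_impl<Args...>(s, std::index_sequence_for<Args...>());
}
```

The order in which the `get_arg` calls are evaluated is unspecified here, which is harmless because each call reads a fixed stack slot rather than popping values.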

EDIT: Now that we've seen apply_tuple, I think you can actually get rid of both apply_tuple and the std::tuple itself. With my helpers above in place, it boils down to:

template <member_meta_info const* mmi, class C, class R, class ...A, std::size_t ...Ns>
inline typename std::enable_if<std::is_same<void, R>::value, int>::type
member_stub_impl(lua_State* L, const indices<Ns...>& )
{
  assert(sizeof...(A) + 1 == lua_gettop(L));

  lua_getfield(L, -1, "__instance");
  assert(lua_islightuserdata(L, -1));

  typedef R (C::*ptr_to_member_type)(A...);
  C* obj = static_cast<C*>(lua_touserdata(L, -1));
  auto func = *static_cast<ptr_to_member_type*>(mmi->ptr_to_member);

  // look ma, no std::tuple! ;)
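  // the parentheses are required: the call operator binds tighter than ->*
  // (the question's set_args used an offset of 2, adjust Ns + 1 as needed)
  (obj->*func)( get_arg< A >( L, Ns + 1 )... );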

  lua_pushnil(L);

  return 1;
}

I couldn't test it, so there might be some typos in there, but I hope they are easy to fix. In case you need help, let me know.
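Since the code above was posted untested, here is a small compilable sketch of the tuple-free call with Lua replaced by a stand-in. It assumes C++14 for `std::index_sequence`; `fake_state`, `accumulator`, `call_impl` and `call_from_state` are hypothetical names, and the member pointer is passed as an ordinary argument rather than fetched from `mmi`.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for lua_State: stack slot i is values[i - 1].
struct fake_state { std::vector<double> values; };

template <typename T>
T get_arg(fake_state& s, std::size_t i);

template <>
inline double get_arg<double>(fake_state& s, std::size_t i)
{
  return s.values.at(i - 1);
}

template <>
inline int get_arg<int>(fake_state& s, std::size_t i)
{
  return static_cast<int>(s.values.at(i - 1));
}

// Hypothetical bound class, used only for illustration.
struct accumulator
{
  double sum = 0;
  void add(int a, double b) { sum += a + b; }
};

// The arguments are produced directly in the call expression, so no
// intermediate std::tuple is built. Note the parentheses around obj->*func.
template <class C, class R, class... A, std::size_t... Ns>
R call_impl(fake_state& s, C* obj, R (C::*func)(A...),
  std::index_sequence<Ns...>)
{
  return (obj->*func)(get_arg<A>(s, Ns + 1)...);
}

template <class C, class R, class... A>
R call_from_state(fake_state& s, C* obj, R (C::*func)(A...))
{
  return call_impl(s, obj, func, std::index_sequence_for<A...>());
}
```

With a state holding `{3.0, 4.5}`, `call_from_state(s, &acc, &accumulator::add)` leaves `acc.sum` at `7.5`.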

This really helped quite a bit. The recursion trick is badly optimized apparently. — user1095108, Feb 26 '13 at 20:10
I was just thinking about it again and I have a feeling: I think `apply_tuple` could probably be integrated into `member_stub_impl`. Could you edit your question and add the implementation of `apply_tuple`, please? — Daniel Frey, Feb 28 '13 at 07:19
Some nice mental gymnastics going on in there. Works better now and the executable is smaller. — user1095108, Feb 28 '13 at 12:32
Good to see it works for you. In general, I have the impression that too many people still think that, when working with variadic templates, recursion is a natural choice and they don't look for alternatives. From my experience, however, you can often avoid it which leads to much better code that is shorter (both source- and binary-wise), more efficient, readable and thus more maintainable. — Daniel Frey, Feb 28 '13 at 12:40
