
I'm writing a custom multilayer perceptron (MLP) implementation in C++. All but the last layer share one activation function, foo, and the final layer has a separate activation, bar. I'm trying to write my code so that it can handle models of this type with a varying number of layers, as in this Godbolt link, reproduced below. Unfortunately, as written, I've had to hardcode the parameter pack for the activation functions, so the code in the link only compiles for N = 5.

Is there a way to create a custom parameter pack from the two activation functions that "left-extends" the first argument, so that I can compile the code above (after suitably updating the call to computeIndexedLayers in computeMlp)? Specifically, I'm thinking of some utility which can yield parameter packs like:

template <size_t N, typename ActivationMid, typename ActivationLast>
struct makeActivationSequence {}; // What goes here?

makeActivationSequence<0>(foo, bar) -> []
makeActivationSequence<1>(foo, bar) -> [bar]
makeActivationSequence<2>(foo, bar) -> [foo, bar]
makeActivationSequence<3>(foo, bar) -> [foo, foo, bar]
makeActivationSequence<4>(foo, bar) -> [foo, foo, foo, bar]
...

Looking at the details of std::index_sequence I believe something similar might work here, but it's unclear to me how I'd modify that approach to work with the two different types.

Please note also that I'm specifically limited to C++14 here due to some toolchain issues, so solutions that take advantage of e.g. if constexpr (as in the linked std::index_sequence details) won't work.

Code from the above Godbolt link, reproduced below for completeness:

#include <cstddef>
#include <cstdio>
#include <initializer_list>
#include <utility>

template <size_t LayerIndex, typename Activation>
void computeIndexedLayer(const Activation& activation) {
    printf("Doing work for layer %zu, activated output %zu\n", LayerIndex, activation(LayerIndex));
}

template <std::size_t... index, typename... Activation>
void computeIndexedLayers(
    std::index_sequence<index...>, // has to come before Activation..., otherwise it'll get eaten
    Activation&&... activation) {
    (void)std::initializer_list<int>{
        (computeIndexedLayer<index + 1>(std::forward<Activation>(activation)), 0)...};
}

template <size_t N, typename ActivationMid, typename ActivationLast>
void computeMlp(ActivationMid&& mid, ActivationLast&& last) {
  computeIndexedLayers(std::make_index_sequence<N>(),
    std::forward<ActivationMid>(mid),
    std::forward<ActivationMid>(mid),
    std::forward<ActivationMid>(mid),
    std::forward<ActivationMid>(mid),
    std::forward<ActivationLast>(last)
    );
}

int main() {
    computeMlp<5>([](const auto& x){ return x + 1;}, [](const auto& x){ return x * 1000;});

    // Doesn't compile with any other choice of N due to mismatched pack lengths
    // computeMlp<4>([](const auto& x){ return x + 1;}, [](const auto& x){ return x * 1000;});
}
Vasu

1 Answer


A function can't return a parameter pack, so makeActivationSequence as you describe it is impossible. However, you can pass mid and last directly to computeIndexedLayers and use pack expansion there, pairing mid with a midIndex template parameter pack and last with a single lastIndex template parameter. Both are deduced from two corresponding std::index_sequence arguments. (There is exactly one lastIndex here, so it is not a pack, but it is easy to generalise if needed.) Like this:

#include <cstddef>
#include <cstdio>
#include <initializer_list>
#include <utility>

template <size_t LayerIndex, typename Activation>
void computeIndexedLayer(Activation&& activation) {
    printf("Doing work for layer %zu, activated output %zu\n", LayerIndex, activation(LayerIndex));
}

template <std::size_t... midIndex, std::size_t lastIndex,
    typename ActivationMid, typename ActivationLast>
void computeIndexedLayers(
    std::index_sequence<midIndex...> midIdxs,
    std::index_sequence<lastIndex> lastIdxs,
    ActivationMid&& mid, ActivationLast&& last) {
    (void)std::initializer_list<int>{
        (computeIndexedLayer<midIndex + 1>(mid), 0)...,
        (computeIndexedLayer<lastIndex>(std::forward<ActivationLast>(last)), 0)};
}

template <size_t N, typename ActivationMid, typename ActivationLast>
void computeMlp(ActivationMid&& mid, ActivationLast&& last) {
    computeIndexedLayers(std::make_index_sequence<N - 1>(), std::index_sequence<N>{},
        std::forward<ActivationMid>(mid), std::forward<ActivationLast>(last));
}

int main() {
    computeMlp<6>([](const auto& x){ return x + 1;}, [](const auto& x){ return x * 1000;});
}

Godbolt link

Also note that computeMlp forwards both mid and last, but computeIndexedLayers forwards only last. This avoids potentially moving from mid more than once, which could cause trouble if ActivationMid holds state and is not trivially movable.
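To make the repeated-move concern concrete, here is a minimal sketch. Everything in it is hypothetical and not part of the code above: StatefulActivation is a made-up activation that owns a buffer, and consume takes its argument by value so that an rvalue really is moved from (computeIndexedLayer above takes a forwarding reference and does not move, so the hazard there is only potential).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stateful activation: owns a buffer, so moving from it empties it.
struct StatefulActivation {
    std::vector<int> weights;
    std::size_t operator()(std::size_t x) const { return x + weights.size(); }
};

// Takes its argument by value, so an rvalue argument is moved into `a`.
std::size_t consume(StatefulActivation a, std::size_t x) { return a(x); }

// Forwards the same argument twice: the second call receives a moved-from object.
template <typename A>
std::pair<std::size_t, std::size_t> forwardTwice(A&& a) {
    std::size_t first = consume(std::forward<A>(a), 0);
    std::size_t second = consume(std::forward<A>(a), 0);
    return {first, second};
}

// Forwards only on the final use, mirroring how `last` is treated above.
template <typename A>
std::pair<std::size_t, std::size_t> forwardOnce(A&& a) {
    std::size_t first = consume(a, 0);                    // lvalue: copied
    std::size_t second = consume(std::forward<A>(a), 0);  // final use: may move
    return {first, second};
}

// Usage sketch: the asserts document the behaviour described above.
inline void checkForwarding() {
    auto twice = forwardTwice(StatefulActivation{{1, 2, 3}});
    assert(twice.first == 3 && twice.second == 0);  // second call saw an empty buffer
    auto once = forwardOnce(StatefulActivation{{1, 2, 3}});
    assert(once.first == 3 && once.second == 3);    // both calls saw the full buffer
}
```

Passing mid as an lvalue to every layer, as computeIndexedLayers does, sidesteps the problem entirely because no call is ever allowed to steal from it.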

C++17

Since C++17 supports fold expressions, the rather ugly std::initializer_list hack in computeIndexedLayers can be replaced:

template <std::size_t... midIndex, std::size_t lastIndex,
    typename ActivationMid, typename ActivationLast>
void computeIndexedLayers(
    std::index_sequence<midIndex...> midIdxs,
    std::index_sequence<lastIndex> lastIdxs,
    ActivationMid&& mid, ActivationLast&& last) {
    (computeIndexedLayer<midIndex + 1>(mid), ...);
    computeIndexedLayer<lastIndex>(std::forward<ActivationLast>(last));
}

C++20

Templated lambdas in C++20 let us get rid of computeIndexedLayers altogether: the template parameters and parameter packs are deduced for a lambda that is defined and immediately invoked within computeMlp:

template <size_t N, typename ActivationMid, typename ActivationLast>
void computeMlp(ActivationMid&& mid, ActivationLast&& last) {
    [&]<std::size_t... midIndex, std::size_t lastIndex>(
        std::index_sequence<midIndex...> midIdxs,
        std::index_sequence<lastIndex> lastIdxs){
            (computeIndexedLayer<midIndex + 1>(mid), ...);
            computeIndexedLayer<lastIndex>(std::forward<ActivationLast>(last));
        }(std::make_index_sequence<N - 1>(), std::index_sequence<N>{});
}
YurkoFlisk
  • How does this work? I thought variadic arguments/parameter packs could only be the last argument. – Zebrafish Jun 11 '22 at 18:42
  • @Zebrafish Template arguments are deduced from the invocation of `computeIndexedLayers` using the template argument deduction rules, so what matters here is not their order within the template declaration (we don't specify them explicitly at all, let alone in that order), but how they are used to form the function parameters and whether those parameter patterns allow deduction from the given arguments. In particular, see [on cppreference](https://en.cppreference.com/w/cpp/language/template_argument_deduction) point 8 under Non-deduced contexts. – YurkoFlisk Jun 11 '22 at 19:22
  • Thanks, this works! For completeness, I'll note that the code as-is doesn't work when N is 0, i.e. the call to `ActivationLast` always happens. In this example, that's fine, but in my full code I needed that to disappear as well, so I used an `index_range` from [here](https://stackoverflow.com/questions/40617854/implement-c-template-for-generating-an-index-sequence-with-a-given-range) and converted `lastIndex` to a pack as well, which allows it to be potentially empty. That, plus a bunch of clamping, allows correct scaling down to 0 layers. – Vasu Jun 11 '22 at 22:24
  • @Vasu Glad I helped :) And good catch about `N == 0`. BTW, from C++17 onwards it can also be handled really nicely as a special case by `if constexpr` in `computeMlp`, and in C++14 by SFINAE, using a `std::enable_if_t` return type for 2 overloads. – YurkoFlisk Jun 11 '22 at 23:13
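The C++14 SFINAE route mentioned in the last comment could look roughly like the sketch below. It is one possible reading of that suggestion, not code from the thread: the `out` vector parameter and the `runExample` helper are hypothetical additions that replace the `printf` so the result can be inspected, and the lambdas take `std::size_t` rather than `const auto&` for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <type_traits>
#include <utility>
#include <vector>

// Records the activation's result for this layer (stand-in for the real work).
template <std::size_t LayerIndex, typename Activation>
void computeIndexedLayer(Activation&& activation, std::vector<std::size_t>& out) {
    out.push_back(activation(LayerIndex));
}

template <std::size_t... midIndex, std::size_t lastIndex,
          typename ActivationMid, typename ActivationLast>
void computeIndexedLayers(std::index_sequence<midIndex...>,
                          std::index_sequence<lastIndex>,
                          ActivationMid&& mid, ActivationLast&& last,
                          std::vector<std::size_t>& out) {
    (void)std::initializer_list<int>{
        (computeIndexedLayer<midIndex + 1>(mid, out), 0)...,
        (computeIndexedLayer<lastIndex>(std::forward<ActivationLast>(last), out), 0)};
}

// Selected only when N == 0: no layers, so neither activation is called.
template <std::size_t N, typename ActivationMid, typename ActivationLast>
std::enable_if_t<N == 0> computeMlp(ActivationMid&&, ActivationLast&&,
                                    std::vector<std::size_t>&) {}

// Selected only when N >= 1: N - 1 mid layers followed by one last layer.
template <std::size_t N, typename ActivationMid, typename ActivationLast>
std::enable_if_t<(N > 0)> computeMlp(ActivationMid&& mid, ActivationLast&& last,
                                     std::vector<std::size_t>& out) {
    computeIndexedLayers(std::make_index_sequence<N - 1>(), std::index_sequence<N>{},
                         std::forward<ActivationMid>(mid),
                         std::forward<ActivationLast>(last), out);
}

// Hypothetical usage helper: returns the recorded outputs for an N-layer model.
template <std::size_t N>
std::vector<std::size_t> runExample() {
    std::vector<std::size_t> out;
    computeMlp<N>([](std::size_t x) { return x + 1; },
                  [](std::size_t x) { return x * 1000; }, out);
    assert(out.size() == N);  // exactly one recorded output per layer
    return out;
}
```

Because the `N > 0` overload is discarded during substitution when `N` is 0, its body (including `std::make_index_sequence<N - 1>`, which would wrap around) is never instantiated for that case.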