Has anybody written a C++ STL-compliant algorithm that combines std::transform and std::accumulate into a single-pass algorithm supporting the unary, binary and perhaps even (n-ary!) variants, say std::transformed_accumulate? I want this because I have found this pattern highly reusable, for example in linear algebra, e.g. in (l1-)norm calculations. The l1-norm calculates the sum of the absolute values of the elements.

- If I am not wrong, you are asking for mapreduce. – Manoj R Nov 19 '12 at 11:51
5 Answers
Uhm... My bet is that you can do that by embedding your transformation into the binary functor: transform the element, then accumulate after the transformation.
struct times2accumulator {
    int operator()( int oldvalue, int newvalue ) const {
        return oldvalue + 2*newvalue;
    }
};
int r = std::accumulate( v.begin(), v.end(), 0, times2accumulator() );
That functor would be equivalent to:
struct times2 {
    int operator()( int x ) {
        return 2*x;
    }
};
std::vector<int> tmp; tmp.reserve( v.size() );
std::transform( v.begin(), v.end(), std::back_inserter(tmp), times2() );
int r = std::accumulate( tmp.begin(), tmp.end(), 0 );
Of course this could be made generic; just pass the transformation functor to a generic base functor:
template <typename Transform>
struct transform_accumulator_t {
    Transform t;
    transform_accumulator_t( Transform t ) : t(t) {}
    int operator()( int oldvalue, int newvalue ) const {
        return oldvalue + t(newvalue);
    }
};
// syntactic sugar:
template <typename T>
transform_accumulator_t<T> transform_accumulator( T t ) {
    return transform_accumulator_t<T>(t);
}
int r = std::accumulate(v.begin(), v.end(), 0, transform_accumulator(times2()));
And you could also generalize on the type in the container... or even create a more generic transform_accumulator that takes both an accumulator and a transformation functor and applies them in order. Actual implementation left as an exercise for the reader.
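For illustration, one possible sketch of that fully generic version (the name transform_accumulator2 and the commented usage below are made up here, not part of the answer above):

template <typename Accumulate, typename Transform>
struct transform_accumulator2_t {
    Accumulate acc;
    Transform t;
    transform_accumulator2_t( Accumulate acc, Transform t ) : acc(acc), t(t) {}
    template <typename T, typename U>
    T operator()( T oldvalue, U newvalue ) const {
        return acc( oldvalue, t(newvalue) );   // transform first, then accumulate
    }
};
// helper that deduces the template arguments:
template <typename Accumulate, typename Transform>
transform_accumulator2_t<Accumulate, Transform>
transform_accumulator2( Accumulate acc, Transform t ) {
    return transform_accumulator2_t<Accumulate, Transform>( acc, t );
}
// hypothetical usage, e.g. the l1-norm of a std::vector<double> v:
//   double l1 = std::accumulate( v.begin(), v.end(), 0.0,
//       transform_accumulator2( std::plus<double>(),
//                               [](double x) { return std::abs(x); } ) );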

- Clever... Can we perhaps simplify this using C++11 lambda expressions? – Nordlöw May 14 '12 at 23:28
Although it may not exactly fit the original intent, std::inner_product is basically your binary version. You pass it an initial value, two ranges, and two functors, and it applies them as:
T acc = initial_value;
while (begin1 != end1) {
    acc = binary_op1(acc, binary_op2(*begin1, *begin2));
    ++begin1;
    ++begin2;
}
return acc;
So, for your L1 you'd do something on this general order:
norm = std::inner_product(input1.begin(), input1.end(),
                          input2.begin(), input2.end(),
                          std::plus<int>(), std::abs);
Only that doesn't quite work -- right now, it's trying to pass std::abs where you really need a binary function that combines the two inputs, but I'm not sure how the two inputs are really supposed to be combined.
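For reference, a compilable sketch of the binary case using inner_product's actual parameter order (first1, last1, first2, init, op1, op2), here computing an l1 distance between two ranges; the combining lambda is just one illustrative choice, not from the answer:

#include <cmath>
#include <functional>
#include <numeric>
#include <vector>

int main() {
    std::vector<double> a{1, -2, 3};
    std::vector<double> b{4, 5, -6};
    // op2 combines corresponding elements, op1 folds the results into the accumulator
    double l1_dist = std::inner_product(
        a.begin(), a.end(), b.begin(), 0.0,
        std::plus<double>(),
        [](double x, double y) { return std::abs(x - y); });
    // l1_dist == |1-4| + |-2-5| + |3-(-6)| == 19
    return 0;
}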
std::partial_sum is fairly close to your unary version, except that along with accumulating a result, it (attempts to) write out each intermediate result, not just the final result. To just get the final result, you'd have to write (and pass an instance of) a kind of do-nothing iterator that just holds a single value:
template<class T, class Dist=size_t, class Ptr = T*, class Ref = T&>
class unique_it : public std::iterator<std::random_access_iterator_tag, T, Dist, Ptr, Ref> {
    T &value;
public:
    unique_it(T &v) : value(v) {}
    T &operator*() { return value; }
    unique_it &operator++() { return *this; }
    unique_it &operator+(size_t) { return *this; }
    unique_it &operator++(int) { return *this; }
};
template <class T>
unique_it<T> make_res(T &v) { return unique_it<T>(v); }
With this, your L1 normalization would look something like this:
#include <cmath>
#include <iostream>
#include <numeric>

// uses unique_it / make_res defined above
int main(){
    double result = 0.0;
    double inputs[] = {1, -2, 3, -4, 5, -6};
    std::partial_sum(
        inputs, inputs+6,
        make_res(result),
        [](double acc, double v) {return acc + std::abs(v);});
    std::cout << result << "\t";
    return 0;
}

- Thanks even more for relevant references to other reusable STL algorithms. – Nordlöw May 15 '12 at 11:10
- If lambdas are available should I prefer using, say, `[](T x, T y) { return x+y; }` over `std::plus`? – Nordlöw May 17 '12 at 21:56
- @Nordlöw: I mostly go with whatever's simplest. In this case, that seems to be `plus`. The exception would be if you're using a *lot* of lambdas anyway, in which case mixing in a few pre-built functors could lead to confusion. – Jerry Coffin May 17 '12 at 21:59
As of C++17 there is also std::transform_reduce, which has the added benefit of being parallelizable.
https://en.cppreference.com/w/cpp/algorithm/transform_reduce
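For instance, a minimal sketch of the l1-norm from the question using std::transform_reduce (the vector v and the use of std::abs are just illustrative):

#include <cmath>
#include <functional>
#include <numeric>
#include <vector>

int main() {
    std::vector<double> v{1, -2, 3, -4, 5, -6};
    // transform each element with the unary op, then reduce with the binary op
    double l1 = std::transform_reduce(
        v.begin(), v.end(),
        0.0,                                     // initial value
        std::plus<>(),                           // reduction (binary) operation
        [](double x) { return std::abs(x); });   // transformation (unary) operation
    // l1 == 21
    return 0;
}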

If you want to use some parallelism, I made a quick version using OpenMP:
template <class T,
          class InputIterator,
          class MapFunction,
          class ReductionFunction>
T MapReduce_n(InputIterator in,
              unsigned int size,
              T baseval,
              MapFunction mapper,
              ReductionFunction reducer)
{
    T val = baseval;

    #pragma omp parallel
    {
        T map_val = baseval;

        #pragma omp for nowait
        for (auto i = 0U; i < size; ++i)
        {
            map_val = reducer(map_val, mapper(*(in + i)));
        }

        #pragma omp critical
        val = reducer(val, map_val);
    }

    return val;
}
It is fast, but there is certainly room for optimisation, especially around for (auto i = 0U; i < size; ++i), I think. (But I couldn't figure out how to make an iterator-only version with OpenMP; any help would be appreciated!)
On a quick test with a 1,000,000-element array, with the computation iterated 1000 times to get a mean value, I made some comparisons.
Version 1:
for (auto i = 0U; i < size; ++i)
    val += std::pow(in[i][0], 2) + std::pow(in[i][1], 2);
Score when compiled with:
- g++: 30 seconds
- g++ -O3: 2.6 seconds
Version 2:
This version is the most optimized for this computation, I think. (It gives the best result.)
#pragma omp parallel reduction( + : val )
{
    double map_val = 0.0;

    #pragma omp for
    for (int i=0; i < size; ++i)
    {
        map_val += std::pow(in[i][0], 2) + std::pow(in[i][1], 2);
    }

    val += map_val;
}
- g++ -O3: 0.2 seconds (it's the best one)
Version 3:
This version uses the MapReduce_n function template I showed earlier:
double val = MapReduce_n(in, size, 0.0,
    [] (fftw_complex val)
    {
        return std::pow(val[0], 2.0) + std::pow(val[1], 2.0);
    },
    std::plus<double>());
- g++ -O3: 0.4 seconds, so there is a slight overhead for not using the OpenMP reduction directly. However, the OpenMP reduction doesn't allow custom operators, so at some point you (sadly) have to trade speed for genericity.

- Nice! One thing though... is `std::pow(x, 2.0)` really faster than `x*x`?! – Nordlöw Aug 19 '13 at 22:55
- According to this, one should be inlined to the other: http://stackoverflow.com/questions/6321170/is-there-any-advantage-to-using-powx-2-instead-of-xx-with-x-double. However I prefer to write `pow` because it looks more like the formula, and because there are a lot of other `pow` calls in the software I took this source code from, so the "looks" are consistent. – Jean-Michaël Celerier Aug 20 '13 at 03:30
I am surprised no one said how to do this with Boost.Range:
accumulate(v | transformed((int(*)(int))&std::abs), 0);
where v is a Single Pass Range (i.e., any STL container). The abs overload has to be specified, otherwise this would be as elegant as Haskell.
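For completeness, a minimal compilable sketch of this approach (assuming Boost is available; the vector v and its values are illustrative):

#include <boost/range/adaptor/transformed.hpp>
#include <boost/range/numeric.hpp>
#include <cstdlib>
#include <vector>

int main() {
    using boost::adaptors::transformed;
    std::vector<int> v{1, -2, 3, -4};
    // the cast selects the int overload of std::abs, as noted above
    int l1 = boost::accumulate(v | transformed((int(*)(int))&std::abs), 0);
    // l1 == 10
    return 0;
}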