
I would like to use reduce and accumulate functions in Pandas in a way similar to how they apply to native Python lists. In the functools and itertools implementations, reduce and accumulate (sometimes called fold and cumulative fold in other languages) take a function of two arguments, f(accumulated_value, popped_value). Pandas has no similar implementation.
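For reference, this is the two-argument fold behavior I mean, shown on a plain Python list (a minimal sketch of the stdlib functions):

```python
from functools import reduce
from itertools import accumulate

values = [1, 2, 3, 4]

# reduce folds the whole list into one value: f(f(f(1, 2), 3), 4)
total = reduce(lambda acc, x: acc + x, values)

# accumulate yields every intermediate fold result
running = list(accumulate(values, lambda acc, x: acc + x))
# total is 10, running is [1, 3, 6, 10]
```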

So, I have a list of binary variables and want to calculate the length of each period during which we are in the 1 state:

In [1]: from itertools import accumulate
        import pandas as pd
        drawdown_periods = [0,1,1,1,0,0,0,1,1,1,1,0,1,1,0]

applying accumulate to this with the lambda function

lambda x,y: (x+y)*y

gives

In [2]: list(accumulate(drawdown_periods, lambda x,y: (x+y)*y))
Out[2]: [0, 1, 2, 3, 0, 0, 0, 1, 2, 3, 4, 0, 1, 2, 0]

counting the length of each drawdown_period.

Is there a smart but quirky way to supply a lambda function with two arguments? I may be missing a trick here.

I know that there is a lovely recipe with groupby (see StackOverflow How to calculate consecutive Equal Values in Pandas/How to emulate itertools.groupby with a series/dataframe). I'll repeat it since it's so lovely:

In [3]: df = pd.DataFrame(data=drawdown_periods, columns=['dd'])
        df['dd'].groupby((df['dd'] != df['dd'].shift()).cumsum()).cumsum()
Out[3]:
    0     0
    1     1
    2     2
    3     3
    4     0
    5     0
    6     0
    7     1
    8     2
    9     3
    10    4
    11    0
    12    1
    13    2
    14    0
    Name: dd, dtype: int64   
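Spelling the same recipe out step by step, with the intermediate group labels made visible:

```python
import pandas as pd

drawdown_periods = [0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0]
df = pd.DataFrame(data=drawdown_periods, columns=['dd'])

# Every change of value starts a new group id: comparing each element
# with its predecessor and cumsum-ing the booleans labels each run.
group_ids = (df['dd'] != df['dd'].shift()).cumsum()

# cumsum within each run then counts up inside runs of 1s
# (runs of 0s just accumulate zeros).
result = df['dd'].groupby(group_ids).cumsum()
# result: [0, 1, 2, 3, 0, 0, 0, 1, 2, 3, 4, 0, 1, 2, 0]
```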

This is not the solution I want. I need a way of passing a two-parameter lambda function, to a pandas-native reduce/accumulate functions, since this will also work for many other functional programming recipes.

NBF
  • df.apply(sum) will of course sum the column (i.e., a reduce); df.apply(lambda x: 2*x) will just apply the lambda function to each element (i.e., it's a map). There is no means of passing df.apply(lambda x, y: (x+y)*y, reduce=True) – NBF May 30 '18 at 11:53
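To make the comment's distinction concrete, a small sketch (my own example, not from the thread) of .apply acting as a reduce versus a map:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

# A function that consumes a whole column reduces it to one value per column...
col_sum = df.apply(sum)

# ...while an elementwise lambda acts as a map over the column.
doubled = df.apply(lambda x: 2 * x)
# col_sum['a'] is 6, doubled['a'] is [2, 4, 6]
```

Neither form lets you thread an accumulator through the column, which is the gap the question is about.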

3 Answers


You could get this to work, with an efficiency penalty, using numpy. In practice, you may be better off writing ad hoc vectorized solutions.

Using np.frompyfunc:

import numpy as np
import pandas as pd

s = pd.Series([0,1,1,1,0,0,0,1,1,1,1,0,1,1,0])
f = np.frompyfunc(lambda x, y: (x + y) * y, 2, 1)
f.accumulate(s.astype(object))

0     0
1     1
2     2
3     3
4     0
5     0
6     0
7     1
8     2
9     3
10    4
11    0
12    1
13    2
14    0
dtype: object
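The ufunc built by np.frompyfunc exposes reduce as well as accumulate, so the same trick covers plain folds too. A sketch, working on the raw object array to stay clear of pandas dispatch:

```python
import numpy as np
import pandas as pd

# Rebuild the ufunc from the answer; frompyfunc gives it ufunc methods.
f = np.frompyfunc(lambda x, y: (x + y) * y, 2, 1)
s = pd.Series([0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0])
arr = s.to_numpy(dtype=object)

acc = f.accumulate(arr)   # running fold, as in the answer
final = f.reduce(arr)     # just the last folded value (here 0)
```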
hilberts_drinking_problem

What you are looking for is a pandas method that would extract all values from a Series, convert them to Python objects, call a Python function on them, and thread through an accumulator that is also a Python object.

This kind of behavior does not scale well when you have a lot of data, as there is a lot of time/memory overhead in wrapping the raw data in Python objects. Pandas methods work directly on the underlying (numpy) raw data, so they can process lots of data without wrapping it in Python objects. The groupby + cumsum example you give is a clever way of avoiding .apply and Python functions, which would be slower.

Nevertheless, you are of course free to do your own functional thing in Python if you don't care about the performance. As it's all Python anyway and there's no way of speeding it up on the pandas side, you can just write your own:

df["cev"] = list(accumulate(df.dd, lambda x,y:(x+y)*y))
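If you want this as a reusable piece, the same idea wraps into a small helper that keeps the original index. The name accumulate_series is my own, not a pandas API, and it carries the same Python-level performance cost:

```python
from itertools import accumulate

import pandas as pd

def accumulate_series(s, func):
    """Cumulative fold of a Series with a two-argument function.

    Plain-Python helper (hypothetical, not part of pandas): preserves
    the index, but offers no pandas-level speedup.
    """
    return pd.Series(list(accumulate(s, func)), index=s.index)

s = pd.Series([0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0])
out = accumulate_series(s, lambda x, y: (x + y) * y)
```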
w-m

Use pandas.DataFrame.aggregate and functools.reduce:

import pandas as pd
import operator
from functools import reduce

def reduce_or(series):
    return reduce(operator.or_, series)


df = pd.DataFrame([1,0,0,0], index='a b a b'.split()).astype(bool)
df

original dataframe

df.groupby(df.index).aggregate(reduce_or)

reduced dataframe
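For concreteness, a runnable sketch of what this answer's grouped reduction produces (note it is a plain fold per group, not the cumulative fold the question asks for):

```python
import operator
from functools import reduce

import pandas as pd

def reduce_or(series):
    # Fold a boolean Series with logical OR.
    return reduce(operator.or_, series)

df = pd.DataFrame([1, 0, 0, 0], index='a b a b'.split()).astype(bool)

# OR-reduce the rows sharing each index label:
# 'a' folds True | False -> True, 'b' folds False | False -> False.
out = df.groupby(df.index).aggregate(reduce_or)
```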

dmvianna
    This doesn't appear to answer the question. The question asks for a way to cumulatively fold, where this simply folds the data. – gobernador Dec 12 '19 at 23:36