In pandas, how to find the row/index where the cumulative sum is greater than a threshold?

Question

I would like find the row (index) where the cumulative sum of the values in some column exceeds a threshold.

I can, and do, find this location using a simple loop, like below:

def sum_to(df, col, threshold):
    s = 0
    for r in df.iterrows():
        if s + r[1][col] > threshold:
            return r[0]
        else:
            s += r[1][col]

    return len(df)

However, I would like to know if there is a better/nicer way to achieve this in Pandas.

Yes. There is a better way. Can we see your data in text form edited into your question? Hoping that isn't too much to ask, it really makes life easy when answering questions. — cs95, Dec 13 '17 at 16:13

score 12 · Accepted Answer · answered Dec 13 '17 at 16:17

12

The simplest way is probably

df[col].cumsum().searchsorted(threshold)

but this assumes that you have no negative numbers in your column.

answered Dec 13 '17 at 16:17

Isaac

3,586
1
18
20

No, I don't have negative values. This will work great. Thanks! – mibm Dec 13 '17 at 17:05

score 1 · Answer 2 · answered Dec 13 '17 at 16:17

1

So you want something like this:

df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})
df[df['A'].cumsum() > 5]
#  A
#2 3
#3 4
#4 5

answered Dec 13 '17 at 16:17

zipa

27,316
6
40
58

In pandas, how to find the row/index where the cumulative sum is greater than a threshold?

2 Answers2

Linked