5

I would like find the row (index) where the cumulative sum of the values in some column exceeds a threshold.

I can, and do, find this location using a simple loop, like below:

def sum_to(df, col, threshold):
    s = 0
    for r in df.iterrows():
        if s + r[1][col] > threshold:
            return r[0]
        else:
            s += r[1][col]

    return len(df)

However, I would like to know if there is a better/nicer way to achieve this in Pandas.

mibm
  • 1,328
  • 2
  • 13
  • 23
  • 2
    Yes. There is a better way. Can we see your data in text form edited into your question? Hoping that isn't too much to ask, it really makes life easy when answering questions. – cs95 Dec 13 '17 at 16:13

2 Answers2

12

The simplest way is probably

df[col].cumsum().searchsorted(threshold)

but this assumes that you have no negative numbers in your column.

Isaac
  • 3,586
  • 1
  • 18
  • 20
1

So you want something like this:

df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})
df[df['A'].cumsum() > 5]
#  A
#2 3
#3 4
#4 5
zipa
  • 27,316
  • 6
  • 40
  • 58