
I have a very simple loop that just takes too long to iterate over my big dataframe.

value_needed = df.at[n, 'column_A']

for y in range(len(df)):
    # re-filters the whole DataFrame on every iteration
    index = df[df['column_B'].ge(value_needed)].index[y]
    if index > n:
        break

With this, I'm trying to find the first index after `n` whose value in `column_B` is greater than `value_needed`. The problem is that this loop is just too inefficient to run when `len(df) > 200000`.

Any ideas on how to solve this issue?

2 Answers


In general you should try to avoid loops with pandas; here is a vectorized way to get what you want:

df.loc[(df['column_B'].ge(value_needed)) & (df.index > n)].index[0]
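For example, on a small made-up DataFrame (the data, `value_needed`, and `n` below are purely illustrative), the one-liner picks out the first index label after `n` that satisfies the condition:

```python
import pandas as pd

# hypothetical example data
df = pd.DataFrame({'column_B': [1, 5, 2, 7, 3, 9]})
value_needed = 4
n = 1

# first index label after n whose column_B value is >= value_needed
result = df.loc[(df['column_B'].ge(value_needed)) & (df.index > n)].index[0]
print(result)  # 3  (the row holding 7, since index 1 is excluded by df.index > n)
```

Because the boolean mask is computed once over the whole column, this stays fast even for hundreds of thousands of rows.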
fmarm
  • unfortunately, this code still doesn't do what I need. I can't just take the first value that suits my criteria (`index[0]`). I need to increase the value inside the brackets to find the next value after `n`. That's why I was using a loop – Nycolas Mancini Apr 14 '20 at 23:40
  • I have added a `df.index > n` condition inside `loc`, this should work now – fmarm Apr 14 '20 at 23:48
  • That was it, chief! Thanks a lot for the simple solution. – Nycolas Mancini Apr 15 '20 at 12:38

I wish you had sample data. Try this on your data and let me know what you get:

import numpy as np
index = np.where(df['column_B'] > value_needed)[0].flat[0]

Then

#continue with other logic
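One caveat worth noting (my addition, with made-up data): `np.where` returns *positional* indices, not index labels, so if the DataFrame's index is not the default `RangeIndex` you may need to translate the position back into a label with `df.index[...]`:

```python
import numpy as np
import pandas as pd

# hypothetical example data with a non-default index
df = pd.DataFrame({'column_B': [1, 5, 2, 7]}, index=[10, 11, 12, 13])
value_needed = 4

pos = np.where(df['column_B'] > value_needed)[0].flat[0]  # first matching position
label = df.index[pos]                                     # corresponding index label
print(pos, label)  # 1 11
```

This also means the result is the first match in the whole column; it does not by itself restrict the search to positions after `n` as the question requires.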
pi_pascal