I have a pandas DataFrame like this:

name    dist
a       10
b       11
c       5
d       3

I want to iterate through the dataframe and for each row, I want to check a condition using column dist on all other rows, and if it falls below a threshold, I need to delete that row. This threshold itself is computed as a function of the dist values of other rows. How can I efficiently use iterrows() so as to drop the rows without iterating through all the rows in a nested loop?

Here is what I am currently doing:

ind_to_drop=[]
for idx1, row1 in df.iterrows():
    for idx2, row2 in df.iterrows():
        if idx1!=idx2:
            val = myfunc(row1.dist, row2.dist) #This is the function to compute that value
            if val>0:
                ind_to_drop.append(idx2) #here we want to drop the row with index idx2

Instead of appending the indices to ind_to_drop, can we remove the rows dynamically so that the number of iterations is reduced?

S_S

1 Answer

Use:

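import pandas as pd

keep = []  # index labels of the rows to keep
for i, row in df.iterrows():
    # Mean of dist over every row except the current one (i is an index label, not a position)
    other_rows_func = df.loc[df.index != i, 'dist'].mean()
    if row['dist'] > other_rows_func:
        keep.append(i)
# DataFrame.append was removed in pandas 2.0, so select the kept rows once at the end
ndf = df.loc[keep]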

Here, I keep a row only if its dist is greater than the mean of dist over all other rows, so rows at or below that mean are dropped. You can substitute your own function. With the sample data, this keeps rows a and b (dist 10 and 11).
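
As a possible alternative, the leave-one-out mean used in the loop above can be computed for every row at once, since the mean of the other rows equals (total - own dist) / (n - 1). The sketch below assumes the sample data from the question, a numeric dist column, and at least two rows; the name others_mean is only illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"name": ["a", "b", "c", "d"], "dist": [10, 11, 5, 3]})

n = len(df)
# Mean of dist over all rows except the current one, computed for every row at once
others_mean = (df["dist"].sum() - df["dist"]) / (n - 1)

# Keep only rows whose dist is greater than the mean of the other rows
ndf = df[df["dist"] > others_mean]
print(ndf)
```

This avoids iterrows entirely, but it only works because the mean can be rewritten in terms of a column total. Other conditions may need a different rewrite.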

keramat
  • I ain't gonna downvote, but how did you know what condition the OP asked for? – TheFaultInOurStars Mar 17 '22 at 18:28
  • Vectorisation would be much more efficient in this solution than `iterrows` – Riley Mar 17 '22 at 23:36
  • @AmirhosseinKiani I updated the question. For this case, you can consider the function to return a value, and we need to drop the row when that value exceeds a threshold, e.g. 0 as shown in the example. – S_S Mar 18 '22 at 03:48
  • Can I do this in the same dataframe instead of creating `ndf`? – S_S Mar 18 '22 at 03:50
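
Following up on the last two comments, here is a rough sketch of how the pairwise check from the question might be vectorised with NumPy broadcasting and applied to the same DataFrame. The body of myfunc is a hypothetical placeholder (the question does not show it), and the sketch assumes myfunc works element-wise on arrays. It reproduces the result of the nested loop (every pair is compared) rather than removing rows dynamically, and it builds an n x n array, so memory grows quadratically with the number of rows.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"name": ["a", "b", "c", "d"], "dist": [10, 11, 5, 3]})

def myfunc(d1, d2):
    # Hypothetical placeholder: positive when d1 is more than twice d2
    return d1 - 2 * d2

d = df["dist"].to_numpy()
# val[i, j] = myfunc(dist of row i, dist of row j) for every pair of rows
val = myfunc(d[:, None], d[None, :])
# Ignore self-comparisons (idx1 == idx2): a value of 0 fails the val > 0 test
np.fill_diagonal(val, 0)

# Row j is dropped if any other row i yields val > 0
to_drop = (val > 0).any(axis=0)

# Drop the flagged rows from the same DataFrame instead of building a new one
df.drop(df.index[to_drop], inplace=True)
print(df)
```

With this placeholder myfunc, rows c and d are dropped because another row has more than twice their dist.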