Pandas remove row from dataframe dynamically

Question

I have a pandas dataframe as

name    dist
a       10
b       11
c       5
d       3

I want to iterate through the dataframe and for each row, I want to check a condition using column dist on all other rows, and if it falls below a threshold, I need to delete that row. This threshold itself is computed as a function of the dist values of other rows. How can I efficiently use iterrows() so as to drop the rows without iterating through all the rows in a nested loop?

Here is how I am currently doing it:

ind_to_drop = []
for idx1, row1 in df.iterrows():
    for idx2, row2 in df.iterrows():
        if idx1 != idx2:
            val = myfunc(row1.dist, row2.dist)  # this is the function that computes the value
            if val > 0:
                ind_to_drop.append(idx2)  # here we want to drop the row with index idx2

Instead of appending the indices to ind_to_drop, can we remove the row dynamically so that the number of iterations is reduced?

What is the condition? Please be more specific and mention the details that are safe and secure to share. — TheFaultInOurStars, Mar 17 '22 at 18:08
Please check if this helps https://stackoverflow.com/questions/13851535/how-to-delete-rows-from-a-pandas-dataframe-based-on-a-conditional-expression — Manjunath K Mayya, Mar 17 '22 at 18:11
@ManjunathKMayya If I drop the row a subsequent iteration will give missing index. — S_S, Mar 18 '22 at 03:40
For performance you can use https://pandas.pydata.org/pandas-docs/stable/user_guide/enhancingperf.html. — keramat, Mar 18 '22 at 06:17
So you want to drop a row r, if there exists some other row r2, such that `myfunc(r2.dist, r.dist) > 0`? — Riley, Mar 18 '22 at 12:30
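For reference, the nested loop in the question can be sketched without iterrows() by evaluating every pair at once with NumPy broadcasting. This is only a sketch under assumptions: the real myfunc is not given in the question, so the one below is a hypothetical stand-in, and the approach works only if myfunc accepts arrays. It reproduces the loop as posted (every row is compared against every other row), not a variant where already-dropped rows stop taking part.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": list("abcd"), "dist": [10, 11, 5, 3]})

def myfunc(d1, d2):
    # Hypothetical stand-in for the asker's function:
    # positive when d1 exceeds d2 by more than 5
    return d1 - d2 - 5

d = df["dist"].to_numpy()
# vals[i, j] == myfunc(d[i], d[j]) for every ordered pair of rows
vals = myfunc(d[:, None], d[None, :])
# Exclude the diagonal, matching the idx1 != idx2 check in the loop
off_diagonal = ~np.eye(len(d), dtype=bool)
# Row j is dropped if any other row i gives myfunc(d[i], d[j]) > 0
to_drop = ((vals > 0) & off_diagonal).any(axis=0)
result = df[~to_drop]
```

With the sample data and this stand-in function, rows c and d are dropped and rows a and b remain. The pairwise matrix needs memory proportional to the square of the row count, so this suits moderately sized frames.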

score 0 · Answer 1 · answered Mar 17 '22 at 18:23

Use:

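keep = []
for i, row in df.iterrows():
    # Mean of dist over all rows except the current one (dropped by index label)
    other_rows_func = df['dist'].drop(i).mean()
    if row['dist'] > other_rows_func:
        keep.append(i)
# DataFrame.append was removed in pandas 2.0, so select the kept rows in one step
ndf = df.loc[keep]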

Here, a row is kept only if its dist is greater than the mean dist of all other rows, so rows at or below that mean are dropped. You can substitute your own function.
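For this particular condition the loop is not strictly needed. A minimal sketch, assuming the comparison really is "dist versus the mean of the other rows": the mean of the other rows can be obtained by subtracting each row's own dist from the column total, which gives the whole filter in one vectorised step. The sample frame is the one from the question.

```python
import pandas as pd

df = pd.DataFrame({"name": list("abcd"), "dist": [10, 11, 5, 3]})

n = len(df)
# Mean of the other n - 1 rows: remove each row's own dist from the total
others_mean = (df["dist"].sum() - df["dist"]) / (n - 1)
# Keep rows whose dist is strictly greater than that leave-one-out mean
ndf = df[df["dist"] > others_mean]
```

On the sample data this keeps rows a and b. A different function of the other rows would need its own vectorised form, so treat this as specific to the mean.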

keramat


I ain't gonna downvote, but how did you know what condition the OP asked for? – TheFaultInOurStars Mar 17 '22 at 18:28
vectorisation would be much more efficient in this solution than `iterrows` – Riley Mar 17 '22 at 23:36
@AmirhosseinKiani I updated the question. For this case, you can assume the function returns a value, and we need to drop the row when that value exceeds a threshold, e.g. 0 as shown in the example. – S_S Mar 18 '22 at 03:48
Can I do this in the same dataframe instead of creating `ndf`? – S_S Mar 18 '22 at 03:50
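On doing this in the same dataframe: one possible sketch is to work out which index labels fail the condition first, then remove them in a single call after the comparison is finished, so nothing is deleted while rows are still being examined. The condition below (dist versus the mean of the other rows) is the one from this answer and is an assumption about what the real function looks like.

```python
import pandas as pd

df = pd.DataFrame({"name": list("abcd"), "dist": [10, 11, 5, 3]})

# Leave-one-out mean of dist for every row
others_mean = (df["dist"].sum() - df["dist"]) / (len(df) - 1)
# Index labels of the rows that do not exceed the mean of the others
to_drop = df.index[df["dist"] <= others_mean]
# Remove them from df itself instead of building a second frame
df.drop(index=to_drop, inplace=True)
```

Reassigning with `df = df[df["dist"] > others_mean]` gives the same rows and is often preferred over `inplace=True`; either way the original index labels of the surviving rows are kept.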
