Delete part of a pd.DataFrame with Python

Question

I'm iterating over the rows of my DataFrame with DataFrame.iterrows(), and if a row meets certain criteria I store it in another DataFrame. Is there a way to delete the rows that appear in both of them, like set.difference(another_set)?
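A minimal sketch of the set.difference idea, assuming the second DataFrame is filled with rows copied from the first one via iterrows(). Each copied row keeps its original index label, so the difference can be taken on the index with DataFrame.drop or Index.difference. The data and the criterion (`nr right` > 5) are made up for illustration; the last lines show that a boolean mask does the same split without iterrows().

```python
import pandas as pd

# Hypothetical data and criterion, for illustration only
df = pd.DataFrame({"nr right": [1, 2, 3, 7, 8, 20]})

# As in the question: copy rows that meet a criterion into another DataFrame
selected = pd.DataFrame([row for _, row in df.iterrows() if row["nr right"] > 5])

# Each copied row keeps its original index label (row.name), so
# "df minus selected" is a set difference on the index labels.
remaining = df.drop(selected.index)
# The same thing spelled out with Index.difference:
remaining_too = df.loc[df.index.difference(selected.index)]

# Without iterrows(): a boolean mask splits the frame in one step
mask = df["nr right"] > 5
print(df[mask])    # the "selected" rows
print(df[~mask])   # everything else
print(remaining)
```

Note that this relies on the index labels being unique; with duplicate labels, drop() would remove every row sharing a label.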

I was asked to provide code. Since I don't know the answer to my question, I worked around the problem: instead of keeping two DataFrames and taking their difference, I created another DataFrame to which I save only the good data.

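import pandas as pd

def test_right_chain(self, temp):
    # Collect "chains" of consecutive rows whose 'nr right' values differ by 1
    # and keep only the chains with more than 2 rows.
    chains = []                     # finished chains that are long enough
    chain = [temp.iloc[0]]          # rows of the chain being built
    key = temp["nr right"].iloc[0]
    for index, row in temp.iloc[1:].iterrows():
        key_ = row['nr right']
        if abs(key_ - key) != 1:
            # Chain broken: keep it only if it has more than 2 rows.
            if len(chain) > 2:
                chains.append(pd.DataFrame(chain))
            chain = []
        chain.append(row)
        key = key_
    return pd.concat(chains) if chains else pd.DataFrame()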
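For comparison, a vectorized sketch of what the loop is trying to do, assuming the goal is to keep runs of at least 3 consecutive rows whose `nr right` values differ by exactly 1 (mirroring the `> 2` check). The function name `right_chains` and the `min_len` parameter are hypothetical. A new chain id starts wherever the absolute step from the previous row is not 1, and each row is then kept or dropped according to the size of its chain.

```python
import pandas as pd

def right_chains(temp, min_len=3):
    """Hypothetical helper: return the rows of `temp` that belong to runs
    ("chains") of at least `min_len` consecutive rows whose 'nr right'
    values differ by exactly 1 from the previous row."""
    step = temp["nr right"].diff().abs()
    # A new chain starts wherever the step is not 1 (the first row's NaN counts too)
    chain_id = step.ne(1).cumsum()
    # Length of the chain each row belongs to
    chain_len = chain_id.map(chain_id.value_counts())
    return temp[chain_len >= min_len]

temp = pd.DataFrame({"nr right": [1, 2, 3, 10, 11, 20, 21, 22, 23]})
print(right_chains(temp))
```

Unlike the loop, which never saves the chain still being built when the iteration ends, this version also returns a qualifying chain at the end of the frame (rows 5 to 8 in the example).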

You should post at least a few lines of input and the expected output so that we can reproduce your problem and help you. — Fabio Lamanna, Mar 29 '16 at 14:40
It would be much easier to help you if you provided a sample input data set with 5-7 rows in text form and the expected output — MaxU - stand with Ukraine, Mar 29 '16 at 15:56

score 0 · Accepted Answer · edited May 23 '17 at 11:45

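You can take the intersection of both DataFrames with pd.merge(df1, df2, how='inner') and then retrieve the index labels of the matching rows in df1. A plain inner merge on columns gives the result a new 0..n-1 index, so df1's original labels have to be carried along. (Passing right_index=True together with on= used to leave the left DataFrame's index on the result, but newer pandas versions raise a MergeError for that combination, so here the labels are carried along as a column with reset_index().) I used the answer from this question: Compare Python Pandas DataFrames for matching rows.

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.rand(10, 4), columns=list('ABCD'))

# df2 = rows 4..8 of df1 (label-based, inclusive) plus two rows that are not in df1
df2 = df1.loc[4:8].copy()
df2.reset_index(drop=True, inplace=True)
df2.loc[-1] = [2, 3, 4, 5]
df2.loc[-2] = [14, 15, 16, 17]
df2.reset_index(drop=True, inplace=True)

# reset_index() turns df1's labels into an 'index' column; set_index('index') restores them
df3 = pd.merge(df1.reset_index(), df2, on=['A', 'B', 'C', 'D'], how='inner').set_index('index')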

Next, get the index labels of the rows that appear in both DataFrames:

indexes = df3.index.values

Then drop those rows from df1 (drop() takes labels, so they can be passed directly):

df1 = df1.drop(indexes)
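An alternative sketch that computes the difference in a single merge: a left merge with `indicator=True` (an existing pd.merge parameter) adds a `_merge` column that is 'both' for df1 rows found in df2 and 'left_only' otherwise, and keeping the 'left_only' rows is the DataFrame analogue of set.difference. The data is built the same way as above, but with a seeded generator so the result is reproducible. Calling drop_duplicates() on df2 is an added assumption: without it, duplicate rows in df2 would duplicate df1 rows in the left join. As with the inner merge, rows only match when their float values are exactly equal.

```python
import numpy as np
import pandas as pd

cols = list("ABCD")
df1 = pd.DataFrame(np.random.default_rng(0).random((10, 4)), columns=cols)

# df2: rows 4..8 of df1 plus two rows that do not occur in df1
extra = pd.DataFrame([[2, 3, 4, 5], [14, 15, 16, 17]], columns=cols)
df2 = pd.concat([df1.loc[4:8], extra], ignore_index=True)

# Left merge on all columns; indicator=True adds a '_merge' column whose value
# is 'both' for df1 rows found in df2 and 'left_only' for the rest.
# reset_index() carries df1's index labels along in an 'index' column.
marked = df1.reset_index().merge(
    df2.drop_duplicates(), on=cols, how="left", indicator=True
)
keep = marked.loc[marked["_merge"] == "left_only", "index"]

# Rows of df1 that do not appear in df2 (set difference df1 - df2)
result = df1.loc[keep]
print(result)
```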
