I've read other questions about dropping rows from a dataframe based on a column value, but it isn't working for me (I'm probably doing something wrong).

I want to remove the rows of df whose unique_id value also appears in dfb, using the following code, but all rows of df are being removed:

df.drop(df[df["unique_id"].isin(dfb["unique_id"])].index, inplace=True)

When I print the length of each dataframe, I get the following results:

print(len(dfb)) # 74
print(len(df)) # 124

The value of unique_id is a hash computed from all columns of the dataframe, so two rows should never share the same value, because I don't have any duplicated rows.

What is wrong with my code?

Here is another check, looking up one id in each dataframe:

print(df[df['uniqueid'] == '8c3200304820d46f0708e329a345189b']) 
#[1 rows x 19 columns]
print(dfb[dfb['uniqueid'] == '8c3200304820d46f0708e329a345189b'])
#Empty DataFrame
  • Hard to tell without having some example dataframe. But you can try boolean indexing with the `~` operator: `dfb = df[~df["unique_id"].isin(dfb["unique_id"])]` – Erfan May 05 '19 at 12:20
  • Wow it works! What is the function of symbol `~`? –  May 05 '19 at 12:23
  • You can read it as _NOT_, so in your example, all the id's which are NOT in the dfb – Erfan May 05 '19 at 12:24
  • Possible duplicate of [pandas get rows which are NOT in other dataframe](https://stackoverflow.com/questions/28901683/pandas-get-rows-which-are-not-in-other-dataframe) – cwalvoort May 05 '19 at 12:27
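
To make the `~` suggestion from the comments concrete, here is a minimal sketch. The sample data is hypothetical (the ids and the `value` column are made up for illustration), and the result is assigned to a new name, `remaining`, rather than back to `dfb`:

```python
import pandas as pd

# Hypothetical sample data; these ids are invented for illustration only.
df = pd.DataFrame({"unique_id": ["a1", "b2", "c3", "d4"], "value": [1, 2, 3, 4]})
dfb = pd.DataFrame({"unique_id": ["b2", "d4", "zz"]})

# isin() returns a boolean Series: True where df's id also appears in dfb.
mask = df["unique_id"].isin(dfb["unique_id"])

# ~ inverts the mask element-wise, so only rows whose id is NOT in dfb are kept.
remaining = df[~mask]

print(mask.sum())                       # how many rows of df matched an id in dfb
print(remaining["unique_id"].tolist())  # ids that survive the filter
```

Checking `mask.sum()` before filtering is a quick way to see how many rows actually match. One possible reason the `drop(... .index)` version removes more rows than expected is a non-unique index (for example after `pd.concat` without `ignore_index=True`), because `drop` removes every row that shares a dropped label; the boolean mask does not depend on index labels. Whether that applies here can't be confirmed without the actual data.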