
All -

I have been running in circles with this code. I have a data frame with data for 2018, 2019, 2020, and 2021. Sometimes there are duplicate rows, but since the index differs, `df.drop_duplicates()` did not seem to work for me. After troubleshooting for a few hours, I decided to just drop all rows that may have a duplicate when I clean my data set. However, when I run the code below and pull my new clean pandas df, the rows that I deleted in the for loop are not deleted from the df.

The `POS` variable I am finding unique values for is a position identifier.

```python
positions = np.unique(df[['POS']].values).flatten().tolist()  # find all unique positions

for position in positions:
    index2 = df.index[df['POS'] == position].tolist()  # recall index of unique positions

    # if a position appears more than 4 times, delete all its records and their duplicates
    if len(index2) > 4:
        for i in index2:
            df.drop(i)
```

Any help or direction is much appreciated! :)

  • Drop dupes should work; you're probably not using it correctly. The index does not matter. Try `df.drop_duplicates(subset=[group of columns that contain the dupes], keep='first')`. Also, don't use loops in pandas; it's an anti-pattern. – Umar.H Nov 19 '21 at 04:03
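As the comment notes, `drop_duplicates` compares row values, not index labels, so differing indexes do not prevent deduplication. A minimal sketch with a made-up frame (the column names here are illustrative, not from the question's data):

```python
import pandas as pd

# Two rows hold identical values but sit at different index labels.
df = pd.DataFrame(
    {"POS": ["QB", "QB", "RB"], "YEAR": [2018, 2018, 2019]},
    index=[0, 7, 3],
)

# drop_duplicates compares values in the subset columns, so the second
# "QB"/2018 row is removed even though its index label differs.
deduped = df.drop_duplicates(subset=["POS", "YEAR"], keep="first")
print(len(deduped))  # 2
```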

1 Answer


You could use the `inplace` parameter in the `drop` method if you want your changes to be reflected in the same dataframe. Source

```python
df.drop(i, inplace=True)
```
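For context, `drop` returns a new dataframe by default, which is why the loop in the question appears to do nothing. A small sketch with a toy frame (the names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"POS": ["QB", "RB"]}, index=[0, 1])

df.drop(0)                # returns a new frame; df itself is untouched
print(len(df))            # 2

df.drop(0, inplace=True)  # mutates df
print(len(df))            # 1
```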
  • [don't use inplace](https://stackoverflow.com/questions/45570984/in-pandas-is-inplace-true-considered-harmful-or-not) – Umar.H Nov 19 '21 at 04:02
  • the answer needs more clarification and details. – Neeraj Nov 19 '21 at 07:12
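Per the comment above about avoiding `inplace`, an alternative is to rebind the name to the returned frame instead of mutating. A short sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"POS": ["QB", "RB", "QB"]}, index=[4, 5, 6])

# Assign the returned frame back instead of passing inplace=True.
df = df.drop([4, 6])
print(list(df.index))  # [5]
```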