
All -

I have been running in circles with this code. I have a data frame with data for 2018, 2019, 2020, and 2021. Sometimes there are duplicate rows, but since the index differs, `df.drop_duplicates()` did not seem to work for me. After troubleshooting for a few hours, I decided to just drop all rows that may have a duplicate when I clean my data set. However, when I run the code below and pull my new clean pandas df, the rows that I deleted in the for loop are not deleted from the df.

The `POS` variable I am finding unique values for is a position identifier.

```python
positions = np.unique(df[['POS']].values).flatten().tolist()  # find all unique positions

for position in positions:
    index2 = df.index[df['POS'] == position].tolist()  # recall index of unique positions

    # if a position appears more than 4 times, delete all its records and their duplicates
    if len(index2) > 4:
        for i in index2:
            df.drop(i)
```

Any help or direction is much appreciated! :)

  • Drop dupes should work; you're probably not using it correctly. The index does not matter. Try `df.drop_duplicates(subset=[group of columns that contain the dupes], keep='first')`. Also, don't use loops in pandas; it's an anti-pattern. – Umar.H Nov 19 '21 at 04:03
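As the comment notes, `drop_duplicates` compares row values, not index labels, so differing indexes do not prevent deduplication. A minimal sketch with a made-up frame (the column names here are illustrative, not from the question's data):

```python
import pandas as pd

# Two rows hold identical values but sit at different index labels.
df = pd.DataFrame(
    {"POS": ["QB", "QB", "RB"], "YEAR": [2018, 2018, 2019]},
    index=[0, 7, 3],
)

# drop_duplicates compares values in the subset columns, so the second
# "QB"/2018 row is removed even though its index label differs.
deduped = df.drop_duplicates(subset=["POS", "YEAR"], keep="first")
print(len(deduped))  # 2
```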

1 Answer


You could use the `inplace` parameter in the `drop` method if you want your changes to be reflected in the same dataframe. Source

```python
df.drop(i, inplace=True)
```
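For context, `drop` returns a new dataframe by default, which is why the loop in the question appears to do nothing. A small sketch with a toy frame (the names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"POS": ["QB", "RB"]}, index=[0, 1])

df.drop(0)                # returns a new frame; df itself is untouched
print(len(df))            # 2

df.drop(0, inplace=True)  # mutates df
print(len(df))            # 1
```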
  • [don't use inplace](https://stackoverflow.com/questions/45570984/in-pandas-is-inplace-true-considered-harmful-or-not) – Umar.H Nov 19 '21 at 04:02
  • the answer needs more clarification and details. – Neeraj Nov 19 '21 at 07:12
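Per the comment above about avoiding `inplace`, an alternative is to rebind the name to the returned frame instead of mutating. A short sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"POS": ["QB", "RB", "QB"]}, index=[4, 5, 6])

# Assign the returned frame back instead of passing inplace=True.
df = df.drop([4, 6])
print(list(df.index))  # [5]
```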