Hi all,
I have been running in circles with this code. I have a DataFrame with data for 2018, 2019, 2020, and 2021. Sometimes there are duplicate rows, but since the index is different, df.drop_duplicates() does not seem to work. After troubleshooting for a few hours, I decided to just drop every row that may have a duplicate when I clean my data set. However, when I run the code below and look at the supposedly clean DataFrame, the rows I dropped in the for loop are still there.
The 'POS' column I am finding unique values for is a position identifier.
positions = np.unique(df[['POS']].values).flatten().tolist()  # find all unique positions
for position in positions:
    index2 = df.index[df['POS'] == position].tolist()  # recall index of unique positions
    # if then deletes all records and their duplicate
    if int(len(index2)) > 4:
        for i in index2:
            df.drop(i)
Any help or direction is much appreciated! :)
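A possible explanation, sketched under assumptions: DataFrame.drop returns a new DataFrame by default and leaves the original untouched, so a bare df.drop(i) inside the loop discards its result. The sample data and the names rows_after_bare_drop, counts, and clean below are made up for illustration; only the 'POS' column and the 4-row threshold come from the post.

```python
import pandas as pd

# Hypothetical sample data; the real frame is not shown in the post.
df = pd.DataFrame(
    {
        "POS": ["A1", "A1", "A1", "A1", "A1", "B2", "B2", "B2", "B2"],
        "YEAR": [2018, 2019, 2020, 2021, 2021, 2018, 2019, 2020, 2021],
    }
)

# drop() returns a new DataFrame by default, so this call leaves df unchanged.
df.drop(0)
rows_after_bare_drop = len(df)

# Loop-free alternative: count the rows for each position, then keep only
# the positions that have at most 4 rows (one per year).
counts = df["POS"].map(df["POS"].value_counts())
clean = df[counts <= 4]
```

If the loop is kept instead, assigning the result back (df = df.drop(index2)) should make the deletions stick; drop accepts a list of index labels, so the inner loop over i is not needed.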