First, I'm a newbie, so if there's a simpler way to do this, I'm all ears.
I have some relatively simple code to find duplicates and then remove them, but I'm not sure what I'm doing wrong. Basically, I create a boolean Series from .duplicated(). Then I run a for loop over that Series to drop the matching rows from the data frame. I know I have dups (193 of them), but nothing is getting removed: I start with 1893 rows and still have 1893 at the end. Here's what I have so far.
# drop the rows, starting by creating a boolean Series marking where the dups are
ms_clnd_bool = ms_clnd_study.duplicated()
print(ms_clnd_bool) #look at what I have
x = 0
for row in ms_clnd_bool: #for loop through the duplicates series
    if ms_clnd_bool[x] == True:
        ms_clnd_study.drop(ms_clnd_study.index[x])
    x += 1
ms_clnd_study
Thanks for the help!
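
A likely cause, going only by the code posted: DataFrame.drop returns a new DataFrame by default and leaves the original untouched, and the loop never assigns that result to anything, so the row count stays at 1893. Below is a minimal sketch of two loop-free alternatives. The small frame and its column names (mouse_id, timepoint) are made up for illustration and stand in for the real ms_clnd_study.

```python
import pandas as pd

# Hypothetical stand-in for ms_clnd_study; the column names are invented.
ms_clnd_study = pd.DataFrame({
    "mouse_id": ["a1", "a1", "b2", "c3", "c3"],
    "timepoint": [0, 0, 5, 10, 10],
})

# duplicated() flags each row that repeats an earlier row (keep="first" by default).
ms_clnd_bool = ms_clnd_study.duplicated()

# Option 1: boolean mask. Keep only the rows that are NOT flagged.
deduped_mask = ms_clnd_study[~ms_clnd_bool]

# Option 2: drop_duplicates() does both steps at once. It also returns a
# new DataFrame, so the result has to be assigned to a name.
deduped = ms_clnd_study.drop_duplicates()

print(len(ms_clnd_study), len(deduped))  # 5 3
```

Both options should give the same rows. If the original variable name needs to hold the cleaned data, assign the result back to it, for example ms_clnd_study = ms_clnd_study.drop_duplicates().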