I have an Excel file with a minimum of 600,000 lines (the size varies). I want to find all duplicates in a particular column with pandas.
This is what I have tried so far:
import pandas as pd

use_cols = ['ID', 'AMOUNT']
df = pd.DataFrame()
for chunk in pd.read_csv("INPUT.csv", usecols=use_cols, chunksize=10000):
    df = pd.concat([df, chunk])
duplicates = df[df.duplicated(["ID"])]
print(duplicates)
However, the results I get are not duplicates and I'm not sure what I might be doing wrong. Is there a more efficient way to go about this?
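One thing worth noting: `duplicated()` defaults to `keep='first'`, so it flags only the second and later occurrences of each ID, which may be why the output doesn't look like the full set of duplicates. Below is a sketch that marks every row of a duplicated ID and concatenates the chunks once instead of re-copying the growing frame on each iteration; the inline sample data stands in for INPUT.csv and is purely hypothetical:

```python
import io
import pandas as pd

# Hypothetical sample standing in for INPUT.csv.
csv_data = io.StringIO(
    "ID,AMOUNT\n"
    "1,100\n"
    "2,200\n"
    "1,150\n"
    "3,300\n"
    "2,250\n"
)

# Collect the chunks and concatenate once at the end --
# calling pd.concat inside the loop copies the accumulated
# frame on every iteration, which is quadratic in total size.
chunks = pd.read_csv(csv_data, usecols=["ID", "AMOUNT"], chunksize=2)
df = pd.concat(chunks, ignore_index=True)

# keep=False marks *all* rows sharing a duplicated ID,
# not just the second and later occurrences.
duplicates = df[df.duplicated(["ID"], keep=False)]
print(duplicates)
```

With the sample above, this prints the four rows for IDs 1 and 2, since both IDs appear twice; the row with ID 3 is excluded.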