I need to find all the duplicates in one column of a csv file, and then export these to a different csv file. I've tried the answers from this question: How do I get a list of all the duplicate items using pandas in python?, but am not getting the correct result. Example of my csv file:
filename,ID,status
71.wav,107e,accepted
85.wav,9a99,accepted
85.wav,d27a,accepted
86.wav,ea4f,accepted
86.wav,9f9b,accepted
75.wav,b734,accepted
75.wav,3dfb,accepted
I would like an output of:
85.wav,9a99,accepted
86.wav,ea4f,accepted
75.wav,b734,accepted
I tried:
import pandas as pd

df = pd.read_csv("my_file.csv")  # placeholder for my actual input file
ids = df["filename"]
dups = df[ids.isin(ids[ids.duplicated()])].sort_values("filename")
print(dups)
The output of this gave unique values as well as duplicate values.
My expected output is a csv file containing only the first occurrence of each duplicated filename, as shown above (I edited the question to clarify).
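To spell out the logic I'm after: keep only the rows whose filename appears more than once, then keep just the first of those rows per filename, and write the result out to a new csv. Something along these lines is what I mean, though I'm not sure it's the right pandas approach (the file names here are just placeholders):

import pandas as pd

df = pd.read_csv("my_file.csv")  # placeholder for my actual input file

# mark every row whose filename occurs more than once,
# then keep only the first row for each of those filenames
mask = df["filename"].duplicated(keep=False)
first_dups = df[mask].drop_duplicates(subset="filename", keep="first")

first_dups.to_csv("duplicates.csv", index=False)  # placeholder output name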