I have a CSV file with multiple entries. Example csv:
user, phone, email
joe, 123, joe@x.com
mary, 456, mary@x.com
ed, 123, ed@x.com
I'm trying to remove duplicates by a specific column in the CSV, but with the code below I'm getting a "list index out of range" error. I thought that by comparing row[1] with newrows[1] I would find all the duplicates and write only the unique entries to file2.csv. This doesn't work, though, and I can't understand why.
f1 = csv.reader(open('file1.csv', 'rb'))
newrows = []
for row in f1:
    if row[1] not in newrows[1]:
        newrows.append(row)
writer = csv.writer(open("file2.csv", "wb"))
writer.writerows(newrows)
My end result is to have a list that maintains the order of the rows in the file (a set won't work... right?), which should look like this:
user, phone, email
joe, 123, joe@x.com
mary, 456, mary@x.com
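For reference, the output I expect can be produced by a sketch along these lines (assuming Python 3; instead of indexing into newrows, it keeps the phone values it has already written in a separate set and tests membership there, appending to a list so the original row order is preserved):

```python
import csv

# Sample data from the question, written to file1.csv so the script is self-contained.
rows = [['user', 'phone', 'email'],
        ['joe', '123', 'joe@x.com'],
        ['mary', '456', 'mary@x.com'],
        ['ed', '123', 'ed@x.com']]
with open('file1.csv', 'w', newline='') as f:
    csv.writer(f).writerows(rows)

seen = set()    # phone values already kept
newrows = []    # unique rows, in the order they appear in file1.csv
with open('file1.csv', newline='') as f1:
    for row in csv.reader(f1):
        if row[1] not in seen:      # membership test against values seen so far
            seen.add(row[1])
            newrows.append(row)

with open('file2.csv', 'w', newline='') as f2:
    csv.writer(f2).writerows(newrows)
```

The set gives fast membership checks, while the list keeps the sequence of first occurrences, so ed's row (duplicate phone 123) is dropped and joe's is kept.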