
I have a myfile.csv with rows like

first, second, third
1, 2, 3
a, b, c
1, 2, 3

and so on.

I don't understand how to remove duplicate rows in myfile.csv.

One condition: we can't save a new file; we need to update myfile.csv itself, so that after the script runs it looks like

first, second, third
a, b, c
1, 2, 3

So the deduplicated data should not go to a new file; myfile.csv has to be updated in place.
Thank you very much.

Serhii

2 Answers


You can loop over the rows and keep only the first occurrence of each one:

import csv

with open('myfile.csv', newline='') as f:
    data = list(csv.reader(f))

# keep only the first occurrence of each row
new_data = [row for i, row in enumerate(data) if row not in data[:i]]

# overwrite the original file with the deduplicated rows
with open('myfile.csv', 'w', newline='') as f:
    csv.writer(f).writerows(new_data)
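
For a big file, the `row not in data[:i]` check rescans all earlier rows for every row. A rough alternative sketch (not part of the answer above) is to remember the rows already seen in a set, so each row is checked only once:

import csv

seen = set()
unique_rows = []
with open('myfile.csv', newline='') as f:
    for row in csv.reader(f):
        key = tuple(row)  # lists are not hashable, tuples are
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)

with open('myfile.csv', 'w', newline='') as f:
    csv.writer(f).writerows(unique_rows)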
Ajax1234

Short and simple with the pandas module:

import pandas as pd

df = pd.read_csv('myfile.csv')
df.drop_duplicates(inplace=True)
df.to_csv('myfile.csv', index=False)

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html
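
As a footnote, drop_duplicates() keeps the first occurrence of each row by default. A small variation, assuming the column names from the sample header above ('first', 'second', 'third'), compares only some columns and keeps the last occurrence instead:

import pandas as pd

# skipinitialspace strips the blanks after the commas in the sample data
df = pd.read_csv('myfile.csv', skipinitialspace=True)
# deduplicate on the 'first' and 'second' columns only, keeping the last match
df = df.drop_duplicates(subset=['first', 'second'], keep='last')
df.to_csv('myfile.csv', index=False)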

RomanPerekhrest