
I have a myfile.csv with rows like

first, second, third
1, 2, 3
a, b, c
1, 2, 3

and so on.

I don't understand how to remove duplicate rows in myfile.csv.

One condition: we can't save a new file; we need to update myfile.csv itself, so that after the script runs it looks like

first, second, third
a, b, c
1, 2, 3

So the deduplicated data should not go to a new file; myfile.csv has to be updated in place.
Thank you very much.

Serhii

2 Answers


You can loop over the rows and keep only the first occurrence of each one:

import csv

with open('myfile.csv', newline='') as f:
    data = list(csv.reader(f))

# keep only the first occurrence of each row
new_data = [row for i, row in enumerate(data) if row not in data[:i]]

# overwrite the original file with the deduplicated rows
with open('myfile.csv', 'w', newline='') as f:
    csv.writer(f).writerows(new_data)
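
For a big file, the `row not in data[:i]` check rescans all earlier rows for every row. A rough alternative sketch (not part of the answer above) is to remember the rows already seen in a set, so each row is checked only once:

import csv

seen = set()
unique_rows = []
with open('myfile.csv', newline='') as f:
    for row in csv.reader(f):
        key = tuple(row)  # lists are not hashable, tuples are
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)

with open('myfile.csv', 'w', newline='') as f:
    csv.writer(f).writerows(unique_rows)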
Ajax1234

Short and simple with the pandas module:

import pandas as pd

df = pd.read_csv('myfile.csv')
df.drop_duplicates(inplace=True)
df.to_csv('myfile.csv', index=False)

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html
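
As a footnote, drop_duplicates() keeps the first occurrence of each row by default. A small variation, assuming the column names from the sample header above ('first', 'second', 'third'), compares only some columns and keeps the last occurrence instead:

import pandas as pd

# skipinitialspace strips the blanks after the commas in the sample data
df = pd.read_csv('myfile.csv', skipinitialspace=True)
# deduplicate on the 'first' and 'second' columns only, keeping the last match
df = df.drop_duplicates(subset=['first', 'second'], keep='last')
df.to_csv('myfile.csv', index=False)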

RomanPerekhrest