I have a CSV file with values like
68,68
70,70
80,90
Here I would like it to remove the duplicates, i.e. give the output
68
70
80,90
Or
68,
70,
80,90
But I tried searching everywhere and was not able to find out how to do this.
Depending on the size of your input, a naive approach could be fine:
$ cat test
68,68
70,70
80,90
$ cat readvals.py
#!/usr/bin/env python
import csv

vals = []  # a list for the entire file
with open('test') as infile:
    lines = csv.reader(infile, delimiter=',')
    for i, line in enumerate(lines):
        vals.append([])  # append a sub-list for this row
        for val in line:
            if val not in vals[i]:
                vals[i].append(val)  # add only unseen values for the row
print(vals)
$ python readvals.py
[['68'], ['70'], ['80', '90']]
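If you also need to write the de-duplicated rows back out, the same idea extends naturally with csv.writer. A minimal sketch (the output filename 'deduped.csv' is my own choice, and the sample file is created inline so the example is self-contained):

```python
import csv

# Sample input matching the question, written here so the example runs as-is.
with open('test', 'w') as f:
    f.write('68,68\n70,70\n80,90\n')

def dedupe_row(row):
    """Keep only the first occurrence of each value in a row."""
    seen = []
    for val in row:
        if val not in seen:
            seen.append(val)
    return seen

# Read each row, drop repeated values within it, and write the result.
with open('test') as infile, open('deduped.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for row in csv.reader(infile):
        writer.writerow(dedupe_row(row))

print(open('deduped.csv').read())
```

This produces your second desired output shape: rows keep only their unique values, so short rows simply have fewer fields.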
For removing duplicate rows (i.e. rows that are exact copies of other rows, not repeated values within a row), I use this code:
import pandas as pd
df = pd.read_csv('myfile.csv')
df.drop_duplicates(inplace=True)
df.to_csv('myfile.csv', index=False)
Since your exact requirement isn't clear, I'd suggest having a look at these:
pandas documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
Example ref: https://www.journaldev.com/33488/pandas-drop-duplicate-rows-drop_duplicates-function
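Note that drop_duplicates removes whole rows that duplicate other rows; if the goal is to drop repeated values within each row (as the sample output suggests), one hedged sketch is to apply a small dedup function row-wise and pad with None so every row keeps the same length (the inline DataFrame here stands in for your CSV, which appears to have no header row):

```python
import pandas as pd

# Sample data matching the question; in practice you'd use
# pd.read_csv('myfile.csv', header=None) since the file has no header.
df = pd.DataFrame([[68, 68], [70, 70], [80, 90]])

def dedupe_row(row):
    """Keep the first occurrence of each value; pad with None."""
    seen = []
    for val in row:
        if val not in seen:
            seen.append(val)
    return pd.Series(seen + [None] * (len(row) - len(seen)))

df = df.apply(dedupe_row, axis=1)
print(df)
```

Rows with a removed duplicate end up with NaN in the trailing column, which pandas writes as an empty field, matching the `68,` style of output.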