I have a CSV file that looks like this:

red,75,right
red,344,right
green,3,center
yellow,3222,right
blue,9,center
black,123,left
white,68,right
green,47,left
purple,48,left
purple,988,right
pink,2677,left
white,34,right

I am using Python and am trying to remove rows whose first cell duplicates that of an earlier row. I know I can achieve this with something like pandas, but I am trying to do it with Python's standard csv module.

The expected result is:

red,75,right
green,3,center
yellow,3222,right
blue,9,center
black,123,left
white,68,right
purple,988,right
pink,2677,left

Does anyone have an example?

fightstarr20

2 Answers

You can use a dictionary keyed on the color, with the row as the value. If a color is already in the dictionary, skip the row; otherwise add it and write the row to a new CSV file. This keeps the first row seen for each color.

import csv

file_in = 'input_file.csv'
file_out = 'output_file.csv'
# Python 3: open CSV files in text mode with newline=''
with open(file_in, newline='') as fin, open(file_out, 'w', newline='') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    d = {}  # maps each color to the first row seen with that color
    for row in reader:
        color = row[0]
        if color not in d:
            d[color] = row
            writer.writerow(row)
result = list(d.values())

result
# Output (dicts keep insertion order in Python 3.7+):
# [['red', '75', 'right'],
# ['green', '3', 'center'],
# ['yellow', '3222', 'right'],
# ['blue', '9', 'center'],
# ['black', '123', 'left'],
# ['white', '68', 'right'],
# ['purple', '48', 'left'],
# ['pink', '2677', 'left']]

And the output of the csv file:

!cat output_file.csv
# Output:
# red,75,right
# green,3,center
# yellow,3222,right
# blue,9,center
# black,123,left
# white,68,right
# purple,48,left
# pink,2677,left
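
For reuse, the same first-occurrence filter could be sketched as a small generator that accepts any iterable of rows. This is a minimal sketch assuming Python 3; `unique_by_first_cell` is a hypothetical helper name, and the in-memory sample is a few rows taken from the question's data rather than a real file.

```python
import csv
import io

def unique_by_first_cell(rows):
    """Yield each row whose first cell has not been seen before."""
    seen = set()
    for row in rows:
        if not row:  # csv.reader yields [] for blank lines; skip them
            continue
        key = row[0]
        if key not in seen:
            seen.add(key)
            yield row

# In-memory stand-in for the input file (a few rows from the question).
sample = io.StringIO("red,75,right\nred,344,right\ngreen,3,center\ngreen,47,left\n")
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerows(unique_by_first_cell(csv.reader(sample)))
deduped = out.getvalue()
print(deduped, end="")
```

Because the generator only stores the keys it has seen, not the rows, it should also cope with files too large to hold in memory.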
Alexander

You can try this, which rewrites the file in place:

import fileinput

def main():
    seen = set()  # set for fast O(1) amortized lookup

    # inplace=True redirects standard output into the file being read
    for line in fileinput.FileInput('1.csv', inplace=True):
        cell_1 = line.split(',')[0]
        if cell_1 not in seen:
            seen.add(cell_1)
            print(line, end='')  # line already ends with a newline

if __name__ == '__main__':
    main()
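
One caveat with `line.split(',')`: it also splits inside quoted fields, so a first cell such as "dark, red" would be cut at the comma. A minimal sketch of extracting the first cell with the csv module instead; `first_cell` is a hypothetical helper that is not part of the code above, and it assumes each record fits on a single line.

```python
import csv

def first_cell(line):
    """Return the first CSV field of a single line, honoring quotes."""
    # csv.reader accepts any iterable of strings, so wrap the line in a list.
    row = next(csv.reader([line]), [])
    return row[0] if row else ''

print(first_cell('red,75,right\n'))
print(first_cell('"dark, red",75,right\n'))
```

For the unquoted data in the question both approaches give the same key, so this only matters if the file may contain quoted commas.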
Ahmed Abdelkafi