I can't figure out how to remove duplicate rows based on column 2. I looked at the documentation for the csv module, but I couldn't see anything there that would let me implement this.
My current output for list-history.csv:
Number,Keywords
5,banana
8,apple
Number,Keywords
5,banana
Number,Keywords
5,banana
8,apple
Desired output:
Number,Keywords
5,banana
8,apple
I also need new entries to be appended to the desired output.
I tried another way, but this is the closest I found, and it doesn't take column 2 into account. I don't really know what to do from this point:
import csv

with open("list-history.csv", "r") as f:
    lines = f.readlines()

with open("list-history.csv", "a", encoding="utf8") as f:
    reader = csv.reader(f)
    header = next(reader)
    for line in reader:
        if line.strip("\n") == "Number,Keywords":
            f.write(line)
But this code doesn't remove the other duplicates across column 2. I just want to keep the header once and have no duplicates after it. My constraint is that data keeps coming in from file1 to file2, the latter being the file the code above operates on.
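For illustration, here is a minimal sketch of the behavior I'm after, keying the dedupe on column 2 and keeping only the first header (file1.csv and file2.csv are hypothetical stand-ins for my actual files; the sample data mirrors the output shown above):

```python
import csv

# Hypothetical input mirroring the duplicated output above
with open("file1.csv", "w", newline="") as f:
    f.write("Number,Keywords\n5,banana\n8,apple\n"
            "Number,Keywords\n5,banana\n"
            "Number,Keywords\n5,banana\n8,apple\n")

seen = set()  # column-2 values already written
with open("file1.csv", newline="") as src, open("file2.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    writer.writerow(next(reader))              # keep the first header only
    for row in reader:
        if not row or row == ["Number", "Keywords"]:
            continue                           # skip blanks and repeated headers
        if row[1] in seen:
            continue                           # duplicate value in column 2
        seen.add(row[1])
        writer.writerow(row)
```

After running this, file2.csv holds the header once, then one row per distinct column-2 value.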
=== SOLVED ISSUE =======
import fileinput

seen = set()  # set for fast O(1) amortized lookup
for line in fileinput.FileInput('1.csv', inplace=1):
    if line in seen:
        continue  # skip duplicate
    seen.add(line)
    print(line, end='')
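The snippet above compares whole lines rather than column 2, and it rewrites the file in place rather than appending. A hedged variant for the appending requirement, which only adds rows whose column-2 value is not already present (file2.csv and the sample rows are assumptions for the sketch):

```python
import csv

# Assumed starting state: file2.csv already deduplicated
with open("file2.csv", "w", newline="") as f:
    f.write("Number,Keywords\n5,banana\n8,apple\n")

def append_rows(path, new_rows):
    """Append only rows whose column-2 value is not already in the file."""
    with open(path, newline="") as f:
        seen = {row[1] for row in csv.reader(f) if len(row) > 1}
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for row in new_rows:
            if row[1] not in seen:
                seen.add(row[1])
                writer.writerow(row)

# "cherry" is new and gets appended; "banana" is a duplicate and is skipped
append_rows("file2.csv", [["9", "cherry"], ["5", "banana"]])
```

Re-reading the file before each append keeps it simple; for frequent appends, the seen set could be kept in memory between calls instead.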