Remove duplicate lines from a string in python

Question

I have a string in Python and would like to remove duplicate lines (i.e. when the text between \n characters is the same, remove the second, third, fourth, ... occurrence) while preserving the order of the remaining lines. For example:

line1 \n line2 \n line3 \n line2 \n line2 \n line 4

would return:

line1 \n line2 \n line3 \n line 4

Other examples I have seen on Stack Overflow deal with this at the stage of reading the text file into Python (e.g. using readline(), checking whether the line is already in a set of lines read so far, and adding it to the string only if it is unique). In my case this doesn't work, as the string has already been heavily manipulated since loading into Python, and it seems very clumsy to, e.g., write the whole string to a text file and then read it back line by line looking for duplicated lines.

What have you tried? I don't think it is that hard. For starters, just split this string and, in another loop, start adding the unique values. — karthikr, Feb 14 '15 at 17:44

score 12 · Accepted Answer · answered Feb 14 '15 at 17:55

For Python 2.7+, this can be done with a one-liner:

from collections import OrderedDict

test_string = "line1 \n line2 \n line3 \n line2 \n line2 \n line 4"

"\n".join(OrderedDict.fromkeys(test_string.split("\n")))

This gives me: 'line1 \n line2 \n line3 \n line 4'
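On Python 3.7 and later, a plain dict also preserves insertion order, so the same idea should work without the import. A minimal sketch of that variant; the function name dedupe_lines is illustrative and not part of the original answer:

```python
def dedupe_lines(text):
    """Drop repeated lines from text, keeping the first occurrence of each."""
    # dict.fromkeys keeps keys in first-seen order (guaranteed from Python 3.7),
    # so joining the keys rebuilds the string without later duplicates.
    return "\n".join(dict.fromkeys(text.split("\n")))


test_string = "line1 \n line2 \n line3 \n line2 \n line2 \n line 4"
print(repr(dedupe_lines(test_string)))
```

Lines are compared exactly, so "line2" and " line2 " count as different lines.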

score 2 · Answer 2 · answered Feb 14 '15 at 17:45

>>> lines = "line1 \n line2 \n line3 \n line2 \n line2 \n line 4"
>>> seen = set()
>>> answer = []
>>> for line in lines.splitlines():
...     if line not in seen:
...         seen.add(line)
...         answer.append(line)
... 
>>> print('\n'.join(answer))
line1 
 line2 
 line3 
 line 4
>>> '\n'.join(answer)
'line1 \n line2 \n line3 \n line 4'
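If lines that differ only in surrounding whitespace should also count as duplicates, the same set-based loop can compare a stripped copy of each line. This is a sketch under that assumption; the question does not say whether whitespace matters, and the function name dedupe_lines_stripped is hypothetical:

```python
def dedupe_lines_stripped(text):
    """Remove later duplicate lines, ignoring leading/trailing whitespace when comparing."""
    seen = set()
    kept = []
    for line in text.splitlines():
        key = line.strip()  # assumption: whitespace differences should not matter
        if key not in seen:
            seen.add(key)
            kept.append(line)  # keep the first occurrence exactly as written
    return "\n".join(kept)


sample = "line1\n line2 \nline3\nline2\n line2\nline 4"
print(repr(dedupe_lines_stripped(sample)))
```

Note that splitlines() also splits on \r\n and drops a trailing newline, so the result may need a final "\n" appended if the original string ended with one.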
