Delete duplicated words separated by comma

Question

I am new to Python and I have a text file with the following content:

ss ss1, ss ss2, ss ss3, ss ss2, ss ss2, ss ss3

I want to delete the duplicate entries. My expected output is the following:

ss ss1, ss ss2, ss ss3

I am using the below code

f = open('a.txt', 'r')
file_contents = f.read()
words = file_contents.split()
SS=",".join(sorted(set(words), key=words.index))
print SS

My current output is

ss,ss1,,ss2,,ss3,,ss2

McGrady · Accepted Answer · 2017-05-11T07:10:07.623

If you don't need to preserve the order of the items, you can try this:

>>> f="ss ss1, ss ss2, ss ss3, ss ss2, ss ss2, ss ss3"
>>> list(set( f.split(', ')))
['ss ss2', 'ss ss3', 'ss ss1']

A set discards the original order of the items. Another way, which keeps the order:

>>> f="ss ss1, ss ss2, ss ss3, ss ss2, ss ss2, ss ss3"
>>> result=[]
>>> for i in f.split(', '):
...     if i not in result:
...         result.append(i)
...
>>> result
['ss ss1', 'ss ss2', 'ss ss3']

By the way, if the list is very large, using a set to check whether an item has already been seen is more efficient:

>>> result=[]
>>> s=set()
>>> for i in f.split(', '):
...     if i not in s:
...         result.append(i)
...         s.add(i)
...
>>> result
['ss ss1', 'ss ss2', 'ss ss3']
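
The set-based loop above can be wrapped in a small helper and applied to the text read from the file in the question. This is only a minimal sketch, assuming Python 3 and that entries are separated by a comma with optional surrounding whitespace; the name `dedupe_entries` is hypothetical and not part of the original answer.

```python
def dedupe_entries(text, sep=","):
    """Return the entries of `text` with duplicates removed,
    keeping the first occurrence of each entry."""
    seen = set()    # entries already emitted (average O(1) lookup)
    result = []     # unique entries in first-seen order
    for entry in text.split(sep):
        entry = entry.strip()   # drop spaces and any trailing newline
        if entry and entry not in seen:
            seen.add(entry)
            result.append(entry)
    return ", ".join(result)


print(dedupe_entries("ss ss1, ss ss2, ss ss3, ss ss2, ss ss2, ss ss3"))
# prints: ss ss1, ss ss2, ss ss3
```

For the file in the question, the same helper could be called as `dedupe_entries(open('a.txt').read())`. Stripping each entry means it does not matter whether the file uses `, ` or a bare `,` as the separator.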

edited May 11 '17 at 07:10

answered May 11 '17 at 06:39

McGrady

This is a good solution. Also, if the number of elements is very large, you could add an additional set to check whether an element already exists. That way the check is `O(1)` on average instead of the `O(n)` of `if i not in result` – VMRuiz May 11 '17 at 07:00
@McGrady thanks for your help. If I want to retain the newlines, how should I change your code? – Kevin May 16 '17 at 04:51
@Kevin Maybe you want [this](http://stackoverflow.com/questions/3277503/how-do-i-read-a-file-line-by-line-into-a-list) – McGrady May 16 '17 at 05:00
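
For the follow-up about retaining newlines, one possible reading is that duplicates should be removed within each line separately, with one output line per input line. The following is a hedged sketch under that assumption (Python 3; `dedupe_lines` is a hypothetical name, and it uses `dict.fromkeys` rather than the explicit loop from the answer):

```python
def dedupe_lines(text):
    """Remove duplicate comma-separated entries within each line,
    producing one output line per input line."""
    out_lines = []
    for line in text.splitlines():
        entries = [e.strip() for e in line.split(",")]
        # dict.fromkeys keeps first-seen order (guaranteed since Python 3.7)
        unique = dict.fromkeys(e for e in entries if e)
        out_lines.append(", ".join(unique))
    return "\n".join(out_lines)


sample = "ss ss1, ss ss2, ss ss1\nss ss3, ss ss3, ss ss2"
print(dedupe_lines(sample))
# prints:
# ss ss1, ss ss2
# ss ss3, ss ss2
```

If duplicates should instead be removed across the whole file, a single `seen` set shared by all lines would be needed rather than a fresh one per line.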
