I am new to Python and I have a text file with the following content:

ss ss1, ss ss2, ss ss3, ss ss2, ss ss2, ss ss3

I want to delete the duplicate entries. My expected output is:

ss ss1, ss ss2, ss ss3

I am using the code below:

f = open('a.txt', 'r')
file_contents = f.read()
words = file_contents.split()
SS=",".join(sorted(set(words), key=words.index))
print SS

My current output is:

ss,ss1,,ss2,,ss3,,ss2

Kevin

1 Answer

If you don't need the order of the list, you can try this:

>>> f="ss ss1, ss ss2, ss ss3, ss ss2, ss ss2, ss ss3"
>>> list(set( f.split(', ')))
['ss ss2', 'ss ss3', 'ss ss1']

A set does not keep the original order of the items. Another way, which keeps the order:

>>> f="ss ss1, ss ss2, ss ss3, ss ss2, ss ss2, ss ss3"
>>> result=[]
>>> for i in f.split(', '):
...     if i not in result:
...         result.append(i)
...
>>> result
['ss ss1', 'ss ss2', 'ss ss3']
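
As a shorter alternative to the loop above, here is a minimal sketch that assumes Python 3.7 or later, where a plain `dict` keeps insertion order (the question uses Python 2, so this is an assumption about your environment). The variable names `items` and `unique_items` are only illustrative.

```python
# Sample data copied from the question, split on ", " as in the loop above.
items = "ss ss1, ss ss2, ss ss3, ss ss2, ss ss2, ss ss3".split(", ")

# dict keys are unique, and on Python 3.7+ they keep insertion order,
# so this drops duplicates while preserving the first occurrence of each item.
unique_items = list(dict.fromkeys(items))

print(unique_items)
# ['ss ss1', 'ss ss2', 'ss ss3']
```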

By the way, if the list is very large, using a set to check whether an item has already been seen is more efficient:

>>> result=[]
>>> s=set()
>>> for i in f.split(', '):
...     if i not in s:
...         result.append(i)
...         s.add(i)
...
>>> result
['ss ss1', 'ss ss2', 'ss ss3']
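
To get back the comma-separated string from the question, the set-based loop above can be wrapped in a function. This is only a sketch: `dedupe_items` is a hypothetical helper name, and it assumes the items are separated by exactly `", "` as in the sample text.

```python
def dedupe_items(text, sep=", "):
    """Remove duplicate sep-separated items from text, keeping first-seen order."""
    seen = set()      # fast membership checks
    result = []       # items in their original order
    # strip() drops a trailing newline, which a file read usually includes
    for item in text.strip().split(sep):
        if item not in seen:
            seen.add(item)
            result.append(item)
    return sep.join(result)


print(dedupe_items("ss ss1, ss ss2, ss ss3, ss ss2, ss ss2, ss ss3"))
# ss ss1, ss ss2, ss ss3
```

For the file in the question, you could pass in the file contents, for example `dedupe_items(open('a.txt').read())`, ideally inside a `with` block so the file is closed.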
McGrady
  • This is a good solution. Also, if the number of elements is very high, you could add an additional set to check for the existence of elements. That way the membership check is `O(1)` on average instead of `O(n)` – VMRuiz May 11 '17 at 07:00
  • @McGrady thanks for your help. If I want to retain the newlines, how should I change your code? – Kevin May 16 '17 at 04:51
  • @Kevin Maybe you want [this](http://stackoverflow.com/questions/3277503/how-do-i-read-a-file-line-by-line-into-a-list) – McGrady May 16 '17 at 05:00
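
For the follow-up about retaining newlines, one possible reading is that duplicates should be removed within each line separately while the line breaks stay in place. The sketch below works under that assumption. `dedupe_lines` is a hypothetical helper name and the two-line `sample` is made up for illustration, not taken from the question.

```python
def dedupe_lines(text, sep=", "):
    """Remove duplicate sep-separated items within each line, keeping line breaks."""
    out_lines = []
    for line in text.splitlines():
        seen = set()   # reset per line, so duplicates are only checked within a line
        kept = []
        for item in line.split(sep):
            if item not in seen:
                seen.add(item)
                kept.append(item)
        out_lines.append(sep.join(kept))
    return "\n".join(out_lines)


# Made-up two-line input for illustration
sample = "ss ss1, ss ss2, ss ss1\nss ss3, ss ss3, ss ss2"
print(dedupe_lines(sample))
# ss ss1, ss ss2
# ss ss3, ss ss2
```

If duplicates should instead be removed across the whole file, the `seen` set would have to be created once, before the outer loop.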