Iterate over the parts (words), adding each part to a set of seen parts and to a list of parts if it is not already in that set. Finally, reconstruct the string:
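seen = set()   # words we have already encountered
parts = []     # first occurrences, in their original order
for part in string.split(','):
    # compare the stripped word so leading spaces do not matter
    if part.strip() not in seen:
        seen.add(part.strip())
        parts.append(part)  # keep the original spacing for the output
no_dups = ','.join(parts)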
(Note that I had to add some calls to .strip() because some of the words have leading spaces; without stripping them, a word such as ' Hett' would not match 'Hett'.)
This gives:
'Hett, Agva, Delf,'
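If you are happy to normalise the spacing, a shorter variant is possible. This is only a sketch, not the loop above: the function name remove_dups and the sep parameter are made up for illustration, and unlike the loop it drops empty pieces (such as the one after a trailing comma) and rejoins the words with a single space after each comma. It relies on dicts preserving insertion order, which is guaranteed from Python 3.7.

```python
def remove_dups(string, sep=','):
    # Strip surrounding whitespace from every piece
    words = [part.strip() for part in string.split(sep)]
    # dict.fromkeys keeps the first occurrence of each key, in insertion order;
    # empty pieces (e.g. after a trailing comma) are skipped
    unique = dict.fromkeys(word for word in words if word)
    return (sep + ' ').join(unique)

print(remove_dups('Hett, Agva, Delf, Agva, Hett,'))  # Hett, Agva, Delf
```

The explicit loop is still the better choice if you need to keep the original spacing or the trailing comma.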
Why use a set?
Checking whether an element is in a set is O(1) in the average case, since sets are implemented as hash tables, which makes lookup constant time. Lookup in a list, on the other hand, is O(n), as Python must iterate over the list until the element is found. A set is therefore much more efficient for this task: for each new word you can check almost instantly whether you have seen it before, whereas with a list of seen elements you would have to scan that list every time, which takes much longer when the list is large.
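To see the difference for yourself, here is a rough timing sketch. The two helper functions and the test data are invented for this comparison; the exact numbers depend on your machine, so none are quoted here.

```python
import timeit

def dedupe_with_set(words):
    seen = set()
    result = []
    for word in words:
        if word not in seen:      # hash lookup, O(1) on average
            seen.add(word)
            result.append(word)
    return result

def dedupe_with_list(words):
    result = []
    for word in words:
        if word not in result:    # linear scan of everything kept so far, O(n)
            result.append(word)
    return result

# Hypothetical test data: 2,000 distinct words, each appearing twice
words = [f"word{i}" for i in range(2000)] * 2

# Both approaches produce the same result...
assert dedupe_with_set(words) == dedupe_with_list(words)

# ...but the list version does far more work per lookup
print("set: ", timeit.timeit(lambda: dedupe_with_set(words), number=3))
print("list:", timeit.timeit(lambda: dedupe_with_list(words), number=3))
```

The gap widens as the number of distinct words grows, since every lookup in the list version scans more elements.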
Oh, and to just check whether there are duplicates, compare the length of the split list with the length of the set of that list (converting to a set removes the duplicates but loses the order).
I.e.
def has_dups(string):
    # strip each part so that 'Hett' and ' Hett' count as the same word
    parts = [part.strip() for part in string.split(',')]
    return len(parts) != len(set(parts))
which works as expected:
>>> has_dups('Hett, Agva,')
False
>>> has_dups('Hett, Agva, Delf, Agva, Hett,')
True
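For long strings you may not want to build the full set when a duplicate shows up early. A possible variant, again just a sketch (the name has_dups_early_exit and the sep parameter are made up here), returns as soon as it meets a repeated word:

```python
def has_dups_early_exit(string, sep=','):
    seen = set()
    for part in string.split(sep):
        word = part.strip()   # ignore leading/trailing spaces when comparing
        if word in seen:
            return True       # stop at the first repeated word
        seen.add(word)
    return False

print(has_dups_early_exit('Hett, Agva,'))                    # False
print(has_dups_early_exit('Hett, Agva, Delf, Agva, Hett,'))  # True
```

Note that the string is still split in full up front; only the set building stops early.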