1

Sometimes I have a string like this:

string = "Hett, Agva,"

And sometimes it will contain duplicates:

string = "Hett, Agva, Delf, Agva, Hett,"

How can I check whether my string has duplicates and, if it does, remove them?

UPDATE.

So in the second string I need to remove the repeated Agva and Hett, because each of them appears twice in the string.

Chaban33
  • is `','` also a duplicate? How do you define duplicates? – Ma0 Aug 29 '18 at 09:44
  • If there are two Agva, I need one to be removed – Chaban33 Aug 29 '18 at 09:46
  • Do you need to maintain order after removing duplicates? – jpp Aug 29 '18 at 09:48
  • So the `'Hett'` that appears twice does not bother you? You have to work on your definition a bit. If it is just `'Agva'`, you might as well rewrite the string. – Ma0 Aug 29 '18 at 09:49
  • The OP wants **all** duplicates to be removed then be it `Hett` or `Agva` or `blah` – Sheldore Aug 29 '18 at 09:52
  • Possible duplicate of [How can I remove duplicate words in a string with Python?](https://stackoverflow.com/questions/7794208/how-can-i-remove-duplicate-words-in-a-string-with-python) – Ankur Sinha Aug 29 '18 at 10:09

5 Answers

2

Iterate over the parts (words), adding each part to a set of seen parts and to a list of parts if it is not already in that set. Finally, reconstruct the string:

seen = set()
parts = []
for part in string.split(','):
    if part.strip() not in seen:
        seen.add(part.strip())
        parts.append(part)

no_dups = ','.join(parts)

(Note that I had to add calls to .strip() because some of the words start with a space, which .strip() removes before the comparison.)

which gives:

'Hett, Agva, Delf,'
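
For reuse, the loop above could be wrapped in a function. This is only a sketch: the name `remove_dups` and the `sep` parameter are hypothetical and not part of the original answer.

```python
def remove_dups(string, sep=','):
    """Return `string` with repeated parts removed, keeping first occurrences."""
    seen = set()   # stripped parts already encountered
    parts = []     # original parts (spacing preserved) in first-seen order
    for part in string.split(sep):
        key = part.strip()
        if key not in seen:
            seen.add(key)
            parts.append(part)
    return sep.join(parts)


print(remove_dups("Hett, Agva, Delf, Agva, Hett,"))  # Hett, Agva, Delf,
```

The empty part after the trailing comma is kept once, which is why the result still ends with a comma.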

Why use a set?

Checking whether an element is in a set is O(1) in the average case, since elements are stored by hash, which makes lookup constant time. Lookup in a list, on the other hand, is O(n), as Python must iterate over the list until the element is found. A set is therefore much more efficient for this task: for each new word you can check almost instantly whether you have seen it before, whereas scanning a list of seen elements would take much longer for a large list.
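
To get a feel for the difference, a rough timing sketch with `timeit` might look like the following. The data is made up for illustration, and the actual numbers will vary by machine, so no specific timings are claimed here.

```python
import timeit

# Hypothetical data: 10,000 distinct words, generated only for this comparison
words = [f"word{i}" for i in range(10_000)]
seen_list = list(words)
seen_set = set(words)

probe = "word9999"  # last element, so the list has to be scanned to the end

# Time 200 membership checks against each container
list_time = timeit.timeit(lambda: probe in seen_list, number=200)
set_time = timeit.timeit(lambda: probe in seen_set, number=200)

print(f"list lookup: {list_time:.6f}s, set lookup: {set_time:.6f}s")
```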


Oh, and to just check whether there are duplicates, test whether the length of the split list equals the length of the set of that list (which removes the duplicates but loses the order).

I.e.

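def has_dups(string):
    # Strip each part so 'Hett' and ' Hett' compare equal; skip empty parts
    parts = [p.strip() for p in string.split(',') if p.strip()]
    return len(parts) != len(set(parts))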

which works as expected:

>>> has_dups('Hett, Agva,')
False
>>> has_dups('Hett, Agva, Delf, Agva, Hett,')
True
Joe Iddon
1

If the order of the words is important, you can make a list of the words in the string and then iterate over that list to build a new list of unique words.

string = "Hett, Agva, Delf, Agva, Hett,"
words_list = string.split()

unique_words = []
for w in words_list:
    # Keep only the first occurrence of each word
    if w not in unique_words:
        unique_words.append(w)
new_string = ' '.join(unique_words)
print(new_string)

Output:

'Hett, Agva, Delf,'
haccks
1

You can use toolz.unique, or equivalently the unique_everseen recipe in the itertools docs, or equivalently @JoeIddon's explicit solution.

Here's the solution using the third-party toolz library:

x = "Hett, Agva, Delf, Agva, Hett,"

from toolz import unique

res = ', '.join(filter(None, unique(x.replace(' ', '').split(','))))

print(res)

'Hett, Agva, Delf'

I've removed whitespace and used filter to drop the empty string produced by the trailing comma, which may not be required.
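
If installing toolz is not an option, the same idea can be sketched with a simplified version of the `unique_everseen` recipe. The version below omits the `key` argument of the recipe in the itertools docs, so treat it as an approximation rather than the recipe itself.

```python
def unique_everseen(iterable):
    """Yield elements in order, skipping any that were already seen."""
    seen = set()
    for element in iterable:
        if element not in seen:
            seen.add(element)
            yield element


x = "Hett, Agva, Delf, Agva, Hett,"

# Same pipeline as with toolz.unique: remove spaces, split, dedupe, drop empties
res = ', '.join(filter(None, unique_everseen(x.replace(' ', '').split(','))))
print(res)  # Hett, Agva, Delf
```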

jpp
1

If you will only ever receive a string in this format, you can do the following:

import numpy as np

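# Strip whitespace and drop the empty part left by the trailing comma
string_words = [w.strip() for w in string.split(',') if w.strip()]
# np.unique returns the unique items in sorted order
uniq_words = np.unique(string_words)

# Rebuild the string in the original "word, word," format
string = ", ".join(uniq_words) + ","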

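This code splits the string into a list of words, finds the unique items, and merges them back into a string in the original format. Note that np.unique returns its result sorted, so the original word order is not preserved.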

Imtinan Azhar
0

Quick and easy approach (note that a set does not preserve the original order):

', '.join(
    set(
        filter(None, [i.strip() for i in string.split(',')])
    )
)
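
If the order of the words matters, a similar one-liner can use `dict.fromkeys` instead of `set`, since dictionaries preserve insertion order in Python 3.7+. This is a swapped-in variant and only a sketch; the variable names `stripped` and `ordered` are illustrative.

```python
string = "Hett, Agva, Delf, Agva, Hett,"

# Strip each part, drop empty parts, then dedupe while keeping first-seen order
stripped = (part.strip() for part in string.split(','))
ordered = ', '.join(dict.fromkeys(filter(None, stripped)))

print(ordered)  # Hett, Agva, Delf
```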

Hope it helps. Please feel free to ask if anything is not clear :)

Nimeshka Srimal