12

How can I remove duplicate characters from a string using Python? For example, let's say I have a string:

foo = "SSYYNNOOPPSSIISS"

How can I make the string:

foo = "SYNOPSIS"

I'm new to Python. What I have tried is below, and it works, but I suspect there is a smarter, better way to do this that only experience would reveal.

def RemoveDupliChar(Word):
        NewWord = " "
        index = 0
        for char in Word:
                if char != NewWord[index]:
                        NewWord += char
                        index += 1
        print(NewWord.strip()) 

NOTE: Order is important and this question is not similar to this one.

Rahul Patil

7 Answers

20

Using itertools.groupby:

>>> foo = "SSYYNNOOPPSSIISS"
>>> import itertools
>>> ''.join(ch for ch, _ in itertools.groupby(foo))
'SYNOPSIS'
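
To see why this works, it can help to look at what itertools.groupby yields: one (key, group) pair per run of equal adjacent items. The sketch below is only an illustration; the variable names `runs` and `result` are mine, not part of the answer.

```python
import itertools

foo = "SSYYNNOOPPSSIISS"

# Each pair is (key, iterator over one run of equal adjacent characters).
runs = [(ch, len(list(group))) for ch, group in itertools.groupby(foo)]
print(runs[:3])  # [('S', 2), ('Y', 2), ('N', 2)]

# Keeping only the keys collapses every run to a single character.
result = ''.join(ch for ch, _ in itertools.groupby(foo))
print(result)  # SYNOPSIS
```

Because groupby only groups adjacent items, the later runs of 'S' stay separate, which is what preserves the order here.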
falsetru
  • 1
    Is it possible to change grp to _? – Roman Pekar Sep 14 '13 at 06:36
  • @RahulPatil, `_` (`grp` before the answer was edited) is an iterable that yields the individual items (characters here) that are grouped together. – falsetru Sep 14 '13 at 06:40
  • I spent some time creating that function. I wasn't aware of `itertools.groupby`; how did you find it? – Rahul Patil Sep 14 '13 at 06:40
  • 2
    @RahulPatil It is commonly used in loops as a placeholder name. You never use it, but it is there because you need to put something. `itertools.groupby` is part of the itertools module in the standard library. There is a link in falsetru's answer – TerryA Sep 14 '13 at 06:41
  • @RahulPatil, I browse the [Python Module Index](http://docs.python.org/2/py-modindex.html) to find useful modules in the standard library. – falsetru Sep 14 '13 at 06:42
  • Thanks to all; I gained a lot here from posting a small piece of code. – Rahul Patil Sep 14 '13 at 06:43
  • This works as shown if you want the result to be 'SYNOPSIS'. But what if you want the result to be 'SYNOPI', where no character appears more than once? And what if you want, say, 'jill' from 'jillll', since 'jill' is the correct spelling? – rabin utam May 30 '14 at 03:00
  • @rabinutam, Using `collections.OrderedDict`: `from collections import OrderedDict; print(''.join(OrderedDict.fromkeys("SSYYNNOOPPSSIISS")))` – falsetru May 30 '14 at 13:19
  • @falsetru hi, how to use it in pandas column? – sygneto Jun 24 '19 at 11:29
  • @sygneto, please post a separate question. – falsetru Jun 24 '19 at 12:20
  • @falsetru https://stackoverflow.com/questions/56736595/remove-duplicated-letters-from-pandas-column-exist-only-to-each-other-python – sygneto Jun 24 '19 at 12:27
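
The `OrderedDict` one-liner in the comments above answers a different question: dropping every repeated character, not just adjacent ones. As a rough sketch, assuming Python 3.7 or later (where a plain `dict` preserves insertion order), the same idea works without the import. The function name `unique_chars` is hypothetical.

```python
def unique_chars(s):
    # dict.fromkeys keeps the first occurrence of each key, in order
    # (insertion order is guaranteed for dict from Python 3.7 on).
    return ''.join(dict.fromkeys(s))

print(unique_chars("SSYYNNOOPPSSIISS"))  # SYNOPI
print(unique_chars("jillll"))            # jil
```

Note that neither approach produces 'jill' from 'jillll': collapsing adjacent runs and removing all repeats both give 'jil', so recovering a correct spelling would need something else, such as a word-list lookup.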
4

This is a solution without importing itertools:

foo = "SSYYNNOOPPSSIISS"
''.join([foo[i] for i in range(len(foo)-1) if foo[i+1]!= foo[i]]+[foo[-1]])

Out[1]: 'SYNOPSIS'

But it is slower than the other methods!
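
One way to check the speed claim is to time both approaches on your own machine. The following is a rough sketch using `timeit` from the standard library; the function names are hypothetical, and the timings will vary with Python version and input.

```python
import itertools
import timeit

foo = "SSYYNNOOPPSSIISS"

def with_groupby(s):
    return ''.join(ch for ch, _ in itertools.groupby(s))

def with_indexing(s):
    # Same expression as in this answer; assumes s is non-empty,
    # because s[-1] raises IndexError on an empty string.
    return ''.join([s[i] for i in range(len(s) - 1) if s[i + 1] != s[i]] + [s[-1]])

for func in (with_groupby, with_indexing):
    seconds = timeit.timeit(lambda: func(foo), number=10_000)
    print(f"{func.__name__}: {seconds:.4f} s for 10,000 calls")
```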

G M
3

How about this:

oldstring = 'SSSYYYNNNOOOOOPPPSSSIIISSS'
newstring = oldstring[0]
for char in oldstring[1:]:
    if char != newstring[-1]:
        newstring += char    
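
Note that `oldstring[0]` raises `IndexError` on an empty string. Here is a minimal sketch of the same loop wrapped in a function with a guard, collecting characters in a list and joining once at the end. The name `collapse_runs` is hypothetical.

```python
def collapse_runs(s):
    if not s:
        return s  # nothing to collapse in an empty string
    kept = [s[0]]
    for char in s[1:]:
        # Append only when the character differs from the last one kept
        if char != kept[-1]:
            kept.append(char)
    return ''.join(kept)

print(collapse_runs('SSSYYYNNNOOOOOPPPSSSIIISSS'))  # SYNOPSIS
print(repr(collapse_runs('')))                      # ''
```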
Elliott
1
def remove_duplicates(astring):
    if isinstance(astring, str):
        # Keep only the first occurrence of each character, in order,
        # and also return how many characters were dropped.
        newstring = astring[0]
        for char in astring[1:]:
            if char not in newstring:
                newstring += char
        return newstring, len(astring) - len(newstring)
    else:
        raise TypeError("only deals with alpha strings")

I've discovered that the solutions with itertools and with a list comprehension, and even the solution that compares each character to the last element of the list, don't work.

Espoir Murhabazi
0
def removeDuplicate(s):  
    if (len(s)) < 2:
        return s

    result = []
    for i in s:
        if i not in result:
            result.append(i)

    return ''.join(result)  
rrao
jtu
0

How about

foo = "SSYYNNOOPPSSIISS"


def rm_dup(input_str):
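    # Seed with the argument's first character, not the global foo
    newstring = input_str[0]
    for i in range(len(input_str)):  # range replaces Python 2's xrange
        # Append only when the character differs from the last one kept
        if newstring[-1] != input_str[i]:
            newstring += input_str[i]
    return newstring

print(rm_dup(foo))  # print is a function in Python 3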
rrao
-1

You can try this:

string1 = "example1122334455"
string2 = "hello there"

def duplicate(string):
    temp = ''

    for i in string:
        if i not in temp: 
            temp += i

    return temp

print(duplicate(string1))
print(duplicate(string2))
Dino