70

How can I remove duplicate characters from a string using Python? For example, let's say I have a string:

foo = 'mppmt'

How can I make the string:

foo = 'mpt'

NOTE: Order is not important

JSW189

16 Answers

149

If order does not matter, you can use

"".join(set(foo))

set() will create a set of unique letters in the string, and "".join() will join the letters back to a string in arbitrary order.

If order does matter, you can use a dict instead of a set, which since Python 3.7 preserves the insertion order of the keys. (In the CPython implementation, this is already supported in Python 3.6 as an implementation detail.)

foo = "mppmt"
result = "".join(dict.fromkeys(foo))

resulting in the string "mpt". In earlier versions of Python, you can use collections.OrderedDict, which has been available starting from Python 2.7.
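As a quick comparison, here is a minimal sketch of the approaches above side by side. The helper names (`unique_chars_unordered`, `unique_chars_ordered`, `unique_chars_ordered_legacy`) are illustrative only and do not come from the answer.

```python
from collections import OrderedDict


def unique_chars_unordered(text):
    # set() drops duplicates; the order of the result is arbitrary.
    return "".join(set(text))


def unique_chars_ordered(text):
    # dict keys keep insertion order (guaranteed from Python 3.7).
    return "".join(dict.fromkeys(text))


def unique_chars_ordered_legacy(text):
    # Same idea for older interpreters (OrderedDict exists since Python 2.7).
    return "".join(OrderedDict.fromkeys(text))


foo = "mppmt"
print(sorted(unique_chars_unordered(foo)))  # ['m', 'p', 't']
print(unique_chars_ordered(foo))            # mpt
print(unique_chars_ordered_legacy(foo))     # mpt
```

The unordered result is sorted before printing only to make the output reproducible; the set-based string itself may come out in any order.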

Sven Marnach
46

If order does matter, how about:

>>> foo = 'mppmt'
>>> ''.join(sorted(set(foo), key=foo.index))
'mpt'
DSM
  • 2
    True enough. But it's almost 8 times faster than OrderedDict.fromkeys on a five-character string. ;-) – DSM Mar 23 '12 at 15:04
  • 5
    @DSM: Usually, the speed only matters if the string is long. I have to correct the O(n^2) analysis, though. In Python 2.x, the set can at most have 256 elements, regardless of the length of the string. Taking this into account, it's O(n). It won't get really bad even for very long strings (though it is easy to construct cases where it is 8 times slower than the `OrderedDict` approach). – Sven Marnach Mar 23 '12 at 15:11
  • @Sven Marnach: Hmm, I hadn't even thought about character set restrictions. – DSM Mar 23 '12 at 15:43
  • "Order matters" means order must be preserved, not sorted. So 'abzyxaabbx' should return 'abzyx' – Ken Haley Sep 26 '22 at 15:00
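To make the ordering behaviour of the `sorted(set(foo), key=foo.index)` approach concrete, here is a small sketch. The function name `unique_by_first_index` is made up for illustration. Because `str.index` returns the position of a character's first occurrence, sorting by it restores first-appearance order rather than alphabetical order.

```python
def unique_by_first_index(text):
    # Sort the unique characters by the position of their first occurrence.
    return "".join(sorted(set(text), key=text.index))


print(unique_by_first_index("mppmt"))       # mpt
print(unique_by_first_index("abzyxaabbx"))  # abzyx
```

Each `text.index` call scans the string from the start, so the cost grows with the number of distinct characters times the string length. That is the trade-off discussed in the comments above.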
13

If order does not matter:

>>> foo='mppmt'
>>> ''.join(set(foo))
'pmt'

To keep the order:

>>> foo='mppmt'
>>> ''.join([j for i,j in enumerate(foo) if j not in foo[:i]])
'mpt'
kev
6

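Build a list of the characters to keep, together with a set (which cannot hold duplicates) for fast membership checks. Method 1:

def fix(string):
    seen = set()   # characters already encountered
    chars = []     # unique characters in order of first appearance
    for ch in string:
        if ch not in seen:
            seen.add(ch)
            chars.append(ch)

    return ''.join(chars)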

string = "Protiijaayiiii"
print(fix(string))

Method 2:

s = "Protijayi"

aa = [ch for i, ch in enumerate(s) if ch not in s[:i]]
print(''.join(aa))

Method 3:

dd = ''.join(dict.fromkeys(s))  # s is the string defined in Method 2
print(dd)
Soudipta Dutta
3

As was mentioned, "".join(set(foo)) and collections.OrderedDict will do. I added foo = foo.lower() in case the string contains both upper- and lowercase characters and you need to remove all duplicates regardless of case.

from collections import OrderedDict
foo = "EugeneEhGhsnaWW"
foo = foo.lower()
print("".join(OrderedDict.fromkeys(foo)))

prints eugnhsaw
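Lowercasing the whole string also changes the characters that are kept. If the original casing of the first occurrence should survive, one possible variation is sketched below. The function name `unique_ignore_case` and the use of `str.casefold` as the comparison key are assumptions, not part of the answer.

```python
def unique_ignore_case(text):
    seen = set()   # case-folded keys already encountered
    kept = []      # characters kept, with their original casing
    for ch in text:
        key = ch.casefold()
        if key not in seen:
            seen.add(key)
            kept.append(ch)
    return "".join(kept)


print(unique_ignore_case("EugeneEhGhsnaWW"))  # EugnhsaW
```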

3
# Check the code and apply it in your program:

# Input: 'ppppmm'
s = 'ppppmm'
s = ''.join(set(s))
print(s)
# Output: 'pm' or 'mp' (set order is arbitrary)
hp_elite
  • 2
    Not sure if you noticed it, but your solution does not work for the case OP is asking about. – Nik O'Lai Jun 03 '20 at 17:22
  • @NikO'Lai, Thanks for pointing this out. Have changed the code. Earlier code was- pattern=reg.compile(r"(.)\1{1,}",reg.DOTALL) string=pattern.sub(r"\1",s) print(string) – hp_elite Sep 03 '20 at 06:58
2
def dupe(str1):
    s=set(str1)

    return "".join(s)
str1='geeksforgeeks'
a=dupe(str1)
print(a)

works well if order is not important.

ravi tanwar
2
d = {}
s = "YOUR_DESIRED_STRING"
res = []
for c in s:
    if c not in d:
        res.append(c)
        d[c] = 1
print("".join(res))

The variable c iterates over the string s. For each character, the loop checks whether c is already a key in the dictionary d (which is initially empty). If it is not, c is appended to the list res and d[c] is set to 1 to mark the character as seen. When the loop finishes, res holds every unique character in order of first appearance, and the joined string is printed.

Tarish
2

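Using regular expressions (this collapses runs of consecutive repeated characters only):

import re
pattern = r'(.)\1+'  # (.) captures any character; \1+ matches one or more repeats of it
repl = r'\1'         # replace the whole run with a single copy
text = 'shhhhh!!!'
print(re.sub(pattern, repl, text))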

output:

sh!
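The pattern above only collapses adjacent repeats, so it gives a different result from the set- or dict-based approaches when a character reappears later in the string. A short sketch of the difference follows. The helper name `collapse_runs` is illustrative, and `re.DOTALL` is added here as an assumption so that repeated newlines are collapsed as well.

```python
import re


def collapse_runs(text):
    # Replace each run of a repeated character with a single copy.
    return re.sub(r"(.)\1+", r"\1", text, flags=re.DOTALL)


print(collapse_runs("shhhhh!!!"))        # sh!
print(collapse_runs("mppmt"))            # mpmt (the second 'm' is not adjacent)
print("".join(dict.fromkeys("mppmt")))   # mpt  (all duplicates removed)
```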
IndPythCoder
2

If order is important,

seen = set()
result = []
for c in foo:
    if c not in seen:
        result.append(c)
        seen.add(c)
result = ''.join(result)

Or to do it without sets:

result = []
for c in foo:
    if c not in result:
        result.append(c)
result = ''.join(result)
Kevin Coffey
  • 1
    @Marcin: I don't understand that at all. Won't c always be in set(foo)? – DSM Mar 23 '12 at 14:59
  • @Marcin That will always return an empty string. Every c in foo is in set(foo) – Kevin Coffey Mar 23 '12 at 15:00
  • 1
    @DSM / Kevin. Good thing I didn't post that as an answer. `seen = set(); ''.join(seen.add(c) or c for c in foo if c not in seen)`. It's an implicit-is-better-than-explicit Friday. – Marcin Mar 23 '12 at 15:03
  • 1
    Building a string like this `result += c` is unpythonic as it creates new strings each time. – Steven Rumbalski Mar 23 '12 at 15:15
  • No no no. Do not do `result+=c` with strings. String are not mutable and you will need to create and destroy the string with each character added. – the wolf Mar 23 '12 at 16:23
1

Since a string is a sequence of characters, passing it to dict.fromkeys removes all duplicates while retaining the order of first appearance (guaranteed from Python 3.7).

"".join(dict.fromkeys(foo))

hrnjan
1

Functional programming style while keeping order:

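import functools

def get_unique_char(acc, ch):
    # Append ch only if it is not already in the accumulated string.
    if ch not in acc:
        return acc + ch
    return acc

if __name__ == '__main__':
    foo = 'mppmt'

    # The '' initializer makes this work for an empty string as well.
    result = functools.reduce(get_unique_char, foo, '')
    print(result)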
Olivier_s_j
0
def remove_duplicates(value):
    var = ""
    for i in value:
        # Append each character only the first time it is seen.
        if i not in var:
            var = var + i
    return var

print(remove_duplicates("11223445566666ababzzz@@@123#*#*"))
  • 1
    You can always edit your answer instead of commenting your own post. Also, consider adding any explanation to your code. – Pochmurnik Aug 23 '19 at 20:22
0
from collections import OrderedDict
def remove_duplicates(value):
        m=list(OrderedDict.fromkeys(value))
        s=''
        for i in m:
            s+=i
        return s
print(remove_duplicates("11223445566666ababzzz@@@123#*#*"))

  • please study this answer: https://stackoverflow.com/a/48255240/1056268. you will learn how to use join() – Nik O'Lai Jun 03 '20 at 17:26
0
mylist = ["ABA", "CAA", "ADA"]
results = []
for item in mylist:
    buffer = []
    for char in item:
        if char not in buffer:
            buffer.append(char)
    results.append("".join(buffer))

print(results)

Output:

['AB', 'CA', 'AD']
Golden Lion
0

You can replace matches of

rgx = r'(.)(?=.*\1)'

with empty strings.

import re

print(re.sub(rgx, '', 'abbcabdeeeafgfh'))
  #=> "cbdeagfh"

The regular expression matches any character (.), saves it to capture group 1 ((.)) and requires (by the use of the positive lookahead (?=.*\1)) that the same character (\1) appears later in the string.

In the example, the first and second 'a's are matched, and therefore replaced with empty strings, because in each case there is another 'a' later in the string. The third 'a' is not matched because no 'a' follows it. The net effect is that only the last occurrence of each character is kept.
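Because the lookahead removes every character that reappears later, this approach keeps the last occurrence of each character rather than the first. The sketch below shows that behaviour and one possible way to keep first occurrences instead, by reversing the string before and after the substitution. The names `LATER_DUPLICATE`, `keep_last` and `keep_first` are illustrative, and `re.DOTALL` is an added assumption so that the pattern also works across newlines.

```python
import re

# Matches any character that occurs again somewhere later in the string.
LATER_DUPLICATE = re.compile(r"(.)(?=.*\1)", re.DOTALL)


def keep_last(text):
    # Drop every character that reappears later: last occurrences survive.
    return LATER_DUPLICATE.sub("", text)


def keep_first(text):
    # Apply the same substitution to the reversed string, then reverse back.
    return keep_last(text[::-1])[::-1]


print(keep_last("abbcabdeeeafgfh"))   # cbdeagfh
print(keep_last("mppmt"))             # pmt
print(keep_first("mppmt"))            # mpt
```

The lookahead rescans the rest of the string for each character, so on long inputs this is likely to be slower than the dict-based approach.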

Cary Swoveland