Removing duplicate characters from a string

Question

How can I remove duplicate characters from a string using Python? For example, let's say I have a string:

foo = 'mppmt'

How can I make the string:

foo = 'mpt'

NOTE: Order is not important

Ahem... http://stackoverflow.com/questions/636977/best-way-to-remove-duplicate-characters-words-in-a-string — nullpotent, Mar 23 '12 at 14:51
@AljoshaBre - use the 'close' button and select `close as dupe` and supply that link. Thank you — Martin Beckett, Mar 23 '12 at 14:54
@AljoshaBre None of those answers are guaranteed to maintain order. — Marcin, Mar 23 '12 at 14:54

Sven Marnach · Accepted Answer · 2020-03-31T09:03:00.687

149

If order does not matter, you can use

"".join(set(foo))

set() will create a set of unique letters in the string, and "".join() will join the letters back to a string in arbitrary order.

If order does matter, you can use a dict instead of a set, which since Python 3.7 preserves the insertion order of the keys. (In the CPython implementation, this is already supported in Python 3.6 as an implementation detail.)

foo = "mppmt"
result = "".join(dict.fromkeys(foo))

resulting in the string "mpt". In earlier versions of Python, you can use collections.OrderedDict, which has been available starting from Python 2.7.
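
As a minimal sketch of the two approaches above: the helper names `unique_chars_unordered` and `unique_chars_ordered` are made up for illustration, and `OrderedDict` is used so the order-preserving version does not depend on the interpreter version (on Python 3.7+ a plain `dict` behaves the same way).

```python
from collections import OrderedDict


def unique_chars_unordered(text):
    # set() drops duplicates; the resulting order is arbitrary.
    return "".join(set(text))


def unique_chars_ordered(text):
    # OrderedDict keys keep insertion order, so the first occurrence
    # of each character determines its position in the result.
    return "".join(OrderedDict.fromkeys(text))


foo = "mppmt"
print(unique_chars_ordered(foo))            # mpt
print(sorted(unique_chars_unordered(foo)))  # ['m', 'p', 't']
```

The unordered result is sorted before printing only so that the output is reproducible.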

edited Mar 31 '20 at 09:03

answered Mar 23 '12 at 14:51

Sven Marnach


2

+1: `fromkeys()` is not used very often, but you put it to excellent use here. – Eric O. Lebigot Mar 23 '12 at 18:03
print "".join(OrderedDict.fromkeys(foo)) ^ SyntaxError: invalid syntax – flik Feb 21 '19 at 06:25
@flik Yeah, as noted, the code above is for Python version 2.7. – Sven Marnach Feb 21 '19 at 08:44

DSM · Answer 2 · 2012-03-23T18:13:57.143

46

If order does matter, how about:

>>> foo = 'mppmt'
>>> ''.join(sorted(set(foo), key=foo.index))
'mpt'
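
To see why this keeps first-occurrence order rather than alphabetical order: `foo.index(c)` returns the position where `c` first appears, so sorting the unique characters by that key puts them back in the order they were first seen. A small check on a string whose characters are not in alphabetical order follows; each `index` call scans the string, so this is probably best kept to short inputs.

```python
foo = "abzyxaabbx"

# Position of the first occurrence of each distinct character.
first_seen = {c: foo.index(c) for c in set(foo)}

# Sorting by that position restores first-seen order.
result = "".join(sorted(set(foo), key=foo.index))
print(result)  # abzyx
```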

edited Mar 23 '12 at 18:13

answered Mar 23 '12 at 14:56

DSM


2

True enough. But it's almost 8 times faster than OrderedDict.fromkeys on a five-character string. ;-) – DSM Mar 23 '12 at 15:04
5

@DSM: Usually, the speed only matters if the string is long. I have to correct the O(n^2) analysis, though. In Python 2.x, the set can at most have 256 elements, regardless of the length of the string. Taking this into account, it's O(n). It won't get really bad even for very long strings (though it is easy to construct cases where it is 8 times slower than the `OrderedDict` approach). – Sven Marnach Mar 23 '12 at 15:11
@Sven Marnach: Hmm, I hadn't even thought about character set restrictions. – DSM Mar 23 '12 at 15:43
"Order matters" means order must be preserved, not sorted. So 'abzyxaabbx' should return 'abzyx' – Ken Haley Sep 26 '22 at 15:00

score 13 · Answer 3 · answered Mar 23 '12 at 14:52

13

If order does not matter:

>>> foo='mppmt'
>>> ''.join(set(foo))
'pmt'

To keep the order:

>>> foo='mppmt'
>>> ''.join([j for i,j in enumerate(foo) if j not in foo[:i]])
'mpt'

answered Mar 23 '12 at 14:52

kev


Soudipta Dutta · Answer 4 · 2022-08-07T12:58:10.760

Create a list for the result and a set, which does not allow duplicates, to track the characters already seen. Method 1:

def fix(string):
    seen = set()   # characters encountered so far
    result = []    # unique characters in first-seen order
    for ch in string:
        if ch not in seen:
            seen.add(ch)
            result.append(ch)

    return ''.join(result)

string = "Protiijaayiiii"
print(fix(string))

Method 2 :

s = "Protijayi"

aa = [ ch  for i, ch in enumerate(s) if ch not in s[:i]]
print(''.join(aa))

Method 3 :

# s is the string defined in Method 2
dd = ''.join(dict.fromkeys(s))
print(dd)

score 3 · Answer 5 · answered Jan 14 '18 at 23:33

As was mentioned, "".join(set(foo)) and collections.OrderedDict will do. I added foo = foo.lower() in case the string has both uppercase and lowercase characters and you need to remove all duplicates regardless of case.

from collections import OrderedDict
foo = "EugeneEhGhsnaWW"
foo = foo.lower()
print("".join(OrderedDict.fromkeys(foo)))

prints eugnhsaw
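
If lowercasing the whole string is not wanted, one variation is to compare characters case-insensitively but keep each character as it first appeared. This is a sketch of that idea rather than part of the answer above; the function name `unique_chars_ignore_case` is hypothetical.

```python
def unique_chars_ignore_case(text):
    seen = set()
    kept = []
    for ch in text:
        key = ch.lower()  # compare without regard to case
        if key not in seen:
            seen.add(key)
            kept.append(ch)  # keep the original casing
    return "".join(kept)


print(unique_chars_ignore_case("EugeneEhGhsnaWW"))  # EugnhsaW
```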

hp_elite · Answer 6 · 2020-09-03T06:57:14.667

3

# Check the code and apply it in your program:

# Input: 'ppppmm'

s = 'ppppmm'
s = ''.join(set(s))
print(s)
# Output: pm (the order may vary, since sets are unordered)

edited Sep 03 '20 at 06:57

answered Jun 09 '18 at 21:32

hp_elite


2

not sure if you noticed it, but your solution does not work for the case OP is asking. – Nik O'Lai Jun 03 '20 at 17:22
@NikO'Lai, Thanks for pointing this out. Have changed the code. Earlier code was- pattern=reg.compile(r"(.)\1{1,}",reg.DOTALL) string=pattern.sub(r"\1",s) print(string) – hp_elite Sep 03 '20 at 06:58

score 2 · Answer 7 · answered Jul 03 '18 at 08:35

2

def dupe(str1):
    s = set(str1)
    return "".join(s)

str1 = 'geeksforgeeks'
a = dupe(str1)
print(a)

Works well if order is not important.

answered Jul 03 '18 at 08:35

ravi tanwar


Tarish · Answer 8 · 2021-06-10T07:34:00.187

2

d = {}    # characters seen so far (a dict used as a set)
s = "YOUR_DESIRED_STRING"
res = []  # unique characters in first-seen order
for c in s:
    if c not in d:
        res.append(c)
        d[c] = 1
print("".join(res))

The variable c traverses the string s in the for loop. Each c is checked against the dict d, which starts empty and is used here like a set of characters already seen. If c is not in d, c is appended to the list res and d[c] is set to 1. Once the loop has finished traversing the string, res holds every unique character in first-seen order, and the joined result is printed.

edited Jun 10 '21 at 07:34

answered Dec 29 '18 at 17:28

Tarish


2

Consider including a description of your code to help others understand it. – Henry Woody Dec 29 '18 at 21:55

score 2 · Answer 9 · answered Aug 24 '21 at 09:08

2

Using regular expressions:

import re
pattern = r'(.)\1+'  # (.) captures any character; \1+ matches one or more repeats of it
repl = r'\1'         # replace the whole run with a single copy
text = 'shhhhh!!!'
print(re.sub(pattern, repl, text))

output:

sh!
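
Note that this pattern collapses only runs of adjacent repeats; a character that reappears later in the string is kept. A short check against the string from the question, with the order-preserving `dict.fromkeys` result (assuming Python 3.7+) for comparison:

```python
import re

foo = "mppmt"

# Collapses the adjacent "pp" but keeps the second, non-adjacent "m".
collapsed = re.sub(r"(.)\1+", r"\1", foo)
print(collapsed)  # mpmt

# Removes every repeated character, keeping first occurrences.
unique = "".join(dict.fromkeys(foo))
print(unique)  # mpt
```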

answered Aug 24 '21 at 09:08

IndPythCoder


Kevin Coffey · Answer 10 · 2012-03-23T17:43:30.083

2

If order is important,

seen = set()
result = []
for c in foo:
    if c not in seen:
        result.append(c)
        seen.add(c)
result = ''.join(result)

Or to do it without sets:

result = []
for c in foo:
    if c not in result:
        result.append(c)
result = ''.join(result)

edited Mar 23 '12 at 17:43

answered Mar 23 '12 at 14:54

Kevin Coffey


1

@Marcin: I don't understand that at all. Won't c always be in set(foo)? – DSM Mar 23 '12 at 14:59
@Marcin That will always return an empty string. Every c in foo is in set(foo) – Kevin Coffey Mar 23 '12 at 15:00
1

@DSM / Kevin. Good thing I didn't post that as an answer. `seen = set(); ''.join(seen.add(c) or c for c in foo if c not in seen)`. It's an implicit-is-better-than-explicit Friday. – Marcin Mar 23 '12 at 15:03
1

Building a string like this `result += c` is unpythonic as it creates new strings each time. – Steven Rumbalski Mar 23 '12 at 15:15
No no no. Do not do `result+=c` with strings. Strings are not mutable and you will need to create and destroy the string with each character added. – the wolf Mar 23 '12 at 16:23

score 1 · Answer 11 · answered Aug 30 '19 at 10:39

1

Since a string is a sequence of characters, passing it to dict.fromkeys() removes all duplicates and retains the order of first occurrence.

"".join(list(dict.fromkeys(foo)))

answered Aug 30 '19 at 10:39

hrnjan


this is only true in python3.6+ – anthony sottile Jan 07 '20 at 02:56

score 1 · Answer 12 · answered Jun 26 '21 at 06:36

Functional programming style while keeping order:

import functools

def get_unique_char(a, b):
    if b not in a:
        return a + b
    else:
        return a

if __name__ == '__main__':
    foo = 'mppmt'

    # reduce() already returns the accumulated string; the '' initializer
    # also makes an empty input safe
    result = functools.reduce(get_unique_char, foo, '')
    print(result)

score 0 · Answer 13 · answered Aug 23 '19 at 20:02

0

def remove_duplicates(value):
    var = ""
    for i in value:
        # append only characters not already collected
        if i not in var:
            var = var + i
    return var

print(remove_duplicates("11223445566666ababzzz@@@123#*#*"))

answered Aug 23 '19 at 20:02

Abhisek Meshram


1

You can always edit your answer instead of commenting your own post. Also, consider adding any explanation to your code. – Pochmurnik Aug 23 '19 at 20:22

score 0 · Answer 14 · answered Oct 31 '19 at 12:50

0

from collections import OrderedDict

def remove_duplicates(value):
    m = list(OrderedDict.fromkeys(value))
    s = ''
    for i in m:
        s += i
    return s

print(remove_duplicates("11223445566666ababzzz@@@123#*#*"))

answered Oct 31 '19 at 12:50

swamy_teja7

1

please study this answer: https://stackoverflow.com/a/48255240/1056268. you will learn how to use join() – Nik O'Lai Jun 03 '20 at 17:26

score 0 · Answer 15 · answered Jan 31 '21 at 02:39

mylist = ["ABA", "CAA", "ADA"]
results = []
for item in mylist:
    buffer = []  # unique characters of this item, in first-seen order
    for char in item:
        if char not in buffer:
            buffer.append(char)
    results.append("".join(buffer))

print(results)

Output:

['AB', 'CA', 'AD']
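
The same per-string result can be sketched more compactly with `dict.fromkeys`, assuming Python 3.7+ so that key order is guaranteed:

```python
mylist = ["ABA", "CAA", "ADA"]

# One order-preserving de-duplication per item.
results = ["".join(dict.fromkeys(item)) for item in mylist]
print(results)  # ['AB', 'CA', 'AD']
```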

score 0 · Answer 16 · answered Apr 01 '23 at 23:42

You can replace matches of

rgx = r'(.)(?=.*\1)'

with empty strings.

import re

print(re.sub(rgx, '', 'abbcabdeeeafgfh'))
  #=> "cbdeagfh"

The regular expression matches any character (.), saves it to capture group 1 ((.)) and requires (by the use of the positive lookahead (?=.*\1)) that the same character (\1) appears later in the string.

In the example, the first and second 'a's are matched, and therefore replaced with empty strings, because in each case there is another 'a' later in the string. The third 'a' is not matched because no 'a' follows it.
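
One consequence of the lookahead is that the last occurrence of each character survives, not the first, so the result can differ from the order-preserving approaches. A small comparison on the string from the question follows (the `dict.fromkeys` line assumes Python 3.7+). Also note that `.` does not match a newline unless `re.DOTALL` is passed.

```python
import re

rgx = r"(.)(?=.*\1)"
foo = "mppmt"

# Drops any character that appears again later: last occurrences remain.
keep_last = re.sub(rgx, "", foo)
print(keep_last)  # pmt

# Keeps first occurrences instead.
keep_first = "".join(dict.fromkeys(foo))
print(keep_first)  # mpt
```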
