How can I remove duplicate characters from a string using Python? For example, let's say I have a string:
foo = 'mppmt'
How can I make the string:
foo = 'mpt'
NOTE: Order is not important
How can I remove duplicate characters from a string using Python? For example, let's say I have a string:
foo = 'mppmt'
How can I make the string:
foo = 'mpt'
NOTE: Order is not important
If order does not matter, you can use
"".join(set(foo))
set()
will create a set of unique letters in the string, and "".join()
will join the letters back to a string in arbitrary order.
If order does matter, you can use a dict
instead of a set, which since Python 3.7 preserves the insertion order of the keys. (In the CPython implementation, this is already supported in Python 3.6 as an implementation detail.)
foo = "mppmt"
result = "".join(dict.fromkeys(foo))
resulting in the string "mpt"
. In earlier versions of Python, you can use collections.OrderedDict
, which has been available starting from Python 2.7.
If order does matter, how about:
>>> foo = 'mppmt'
>>> ''.join(sorted(set(foo), key=foo.index))
'mpt'
If order is not the matter:
>>> foo='mppmt'
>>> ''.join(set(foo))
'pmt'
To keep the order:
>>> foo='mppmt'
>>> ''.join([j for i,j in enumerate(foo) if j not in foo[:i]])
'mpt'
Create a list in Python and also a set which doesn't allow any duplicates. Solution1 :
def fix(string):
s = set()
list = []
for ch in string:
if ch not in s:
s.add(ch)
list.append(ch)
return ''.join(list)
string = "Protiijaayiiii"
print(fix(string))
Method 2 :
s = "Protijayi"
aa = [ ch for i, ch in enumerate(s) if ch not in s[:i]]
print(''.join(aa))
Method 3 :
dd = ''.join(dict.fromkeys(a))
print(dd)
As was mentioned "".join(set(foo)) and collections.OrderedDict will do. A added foo = foo.lower() in case the string has upper and lower case characters and you need to remove ALL duplicates no matter if they're upper or lower characters.
from collections import OrderedDict
foo = "EugeneEhGhsnaWW"
foo = foo.lower()
print "".join(OrderedDict.fromkeys(foo))
prints eugnhsaw
#Check code and apply in your Program: #Input= 'pppmm'
s = 'ppppmm'
s = ''.join(set(s))
print(s)
#Output: pm
def dupe(str1):
s=set(str1)
return "".join(s)
str1='geeksforgeeks'
a=dupe(str1)
print(a)
works well if order is not important.
d = {}
s="YOUR_DESIRED_STRING"
res=[]
for c in s:
if c not in d:
res.append(c)
d[c]=1
print ("".join(res))
variable 'c' traverses through String 's' in the for loop and is checked if c is in a set d (which initially has no element) and if c is not in d, c is appended to the character array 'res' then the index c of set d is changed to 1. after the loop is exited i.e c finishes traversing through the string to store unique elements in set d, the resultant res which has all unique characters is printed.
Using regular expressions:
import re
pattern = r'(.)\1+' # (.) any character repeated (\+) more than
repl = r'\1' # replace it once
text = 'shhhhh!!!
re.sub(pattern,repl,text)
output:
sh!
If order is important,
seen = set()
result = []
for c in foo:
if c not in seen:
result.append(c)
seen.add(c)
result = ''.join(result)
Or to do it without sets:
result = []
for c in foo:
if c not in result:
result.append(c)
result = ''.join(result)
As string is a list of characters, converting it to dictionary will remove all duplicates and will retain the order.
"".join(list(dict.fromkeys(foo)))
Functional programming style while keeping order:
import functools
def get_unique_char(a, b):
if b not in a:
return a + b
else:
return a
if __name__ == '__main__':
foo = 'mppmt'
gen = functools.reduce(get_unique_char, foo)
print(''.join(list(gen)))
def remove_duplicates(value):
var=""
for i in value:
if i in value:
if i in var:
pass
else:
var=var+i
return var
print(remove_duplicates("11223445566666ababzzz@@@123#*#*"))
from collections import OrderedDict
def remove_duplicates(value):
m=list(OrderedDict.fromkeys(value))
s=''
for i in m:
s+=i
return s
print(remove_duplicates("11223445566666ababzzz@@@123#*#*"))
mylist=["ABA", "CAA", "ADA"]
results=[]
for item in mylist:
buffer=[]
for char in item:
if char not in buffer:
buffer.append(char)
results.append("".join(buffer))
print(results)
output
ABA
CAA
ADA
['AB', 'CA', 'AD']
You can replace matches of
rgx = r'(.)(?=.*\1)'
with empty strings.
import re
print(re.sub(rgx, '', 'abbcabdeeeafgfh'))
#=> "cbdeagfh"
The regular expression matches any character (.
), saves it to capture group 1 ((.)
) and requires (by the use of the positive lookahead (?=.*\1)
) that the same character (\1
) appears later in the string.
In the example, the first and second 'a'
's are matched, and therefore converted to empty strings, because in each case there is another 'a'
later in the string. The third 'a'
in the string is not matched because there are no 'a'
's later in the string.