How to perform in-place removal of duplicates from a string in Python?

Question

I am trying to implement an in-place algorithm to remove duplicates from a string in Python.

str1 = "geeksforgeeks"
for i in range(len(str1)):
    for j in range(i+1,len(str1)-1):
        if str1[i] == str1[j]:  # Error line
            str1 = str1[0:j]+""+str1[j+1:]

print str1

In the above code, I am trying to replace each duplicate character with an empty string. But I get `IndexError: string index out of range` at `if str1[i] == str1[j]`. Am I missing something, or is this not the right way to do it?

My expected output is: geksfor

`i` takes on the value of all valid indices of `str1`. Then `j` is `i+1`. When `i` is the highest valid index, `j` is thus out of range. — TigerhawkT3, Mar 03 '19 at 05:08
@TigerhawkT3 when i is 10, j is 10 and length of string is 10, j — Animeartist, Mar 03 '19 at 05:12
You're altering the length of the string while going through it, therefore the range changes. — Jab, Mar 03 '19 at 05:12
You cannot perform in-place modification of a string in Python. Strings are immutable. — Adam Smith, Mar 03 '19 at 05:13
@Jab I am trying to do inplace, hence I can't use another array. — Animeartist, Mar 03 '19 at 05:14
@Animeartistfromhell7 then the contest is nonsensical. You can't modify strings in-place in Python. Full stop. — Adam Smith, Mar 03 '19 at 05:15
@Animeartistfromhell7 I have no experience in C, so I couldn't tell you, but you *cannot* do this in Python. It breaks the language specification (strings are immutable) — Adam Smith, Mar 03 '19 at 05:18
@Animeartistfromhell7 You can use a [bytearray](https://docs.python.org/3/library/stdtypes.html#bytearray), which is mutable. — Keith, Mar 03 '19 at 05:29
"when i is 10, j is 10" Why would that be the case when the inner loop for which `j` is the loop variable uses a `range` object that starts with `i+1`? — TigerhawkT3, Mar 03 '19 at 06:09
@Keith only if he only uses ascii strings. Python strings are unicode since Python3. — Adam Smith, Mar 03 '19 at 06:24
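
To make the `bytearray` suggestion from the comments concrete, here is a minimal sketch. It assumes the input is ASCII-only, as the last comment cautions, and the function name `dedupe_in_place` is made up for illustration. It overwrites the buffer with a read/write index pair and then truncates it, so no second buffer is built; the `seen` set is auxiliary storage, but it can never hold more than 256 entries.

```python
def dedupe_in_place(buf: bytearray) -> None:
    """Keep the first occurrence of each byte in buf, mutating buf itself."""
    seen = set()   # bytes already kept (at most 256 distinct values)
    write = 0      # next position to write a kept byte
    for read in range(len(buf)):
        byte = buf[read]
        if byte not in seen:
            seen.add(byte)
            buf[write] = byte   # write never overtakes read, so this is safe
            write += 1
    del buf[write:]             # drop the leftover tail


data = bytearray(b"geeksforgeeks")   # assumption: ASCII-only input
dedupe_in_place(data)
print(data.decode("ascii"))  # geksfor
```

The loop bounds stay valid here because the buffer is only shortened once, after the loop has finished.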

Jab · Answer 1 · 2019-03-03T05:37:39.220

1

You can do all of this with just a set and a comprehension. No need to complicate things.

str1 = "geeksforgeeks"

seen = set()
seen_add = seen.add
print(''.join(s for s in str1 if not (s in seen or seen_add(s))))
#geksfor
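
The `or` in that one-liner relies on `set.add` returning `None`: for a character not yet seen, `s in seen` is false, so `seen_add(s)` runs, records the character and evaluates to `None`, which makes the whole condition falsy and the character is kept. As a rough sketch of the same logic written as an explicit loop (the names `kept` and `result` are only illustrative):

```python
str1 = "geeksforgeeks"

seen = set()   # characters encountered so far
kept = []      # first occurrences, in original order
for ch in str1:
    if ch in seen:
        continue       # duplicate: skip it
    seen.add(ch)       # first time: remember it and keep it
    kept.append(ch)

result = "".join(kept)
print(result)  # geksfor
```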

"Simple is better than complex."

~ See PEP 20

Edit

While the above is simpler than your code, and it is the most performant way of removing duplicates from a collection, an even simpler solution would be to use:

from collections import OrderedDict
print("".join(OrderedDict.fromkeys(str1)))
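
As a side note, on Python 3.7 and later the built-in `dict` is guaranteed to preserve insertion order, so the same idea should work without the import. A minimal sketch under that version assumption:

```python
str1 = "geeksforgeeks"

# dict.fromkeys keeps only the first occurrence of each key,
# and (on Python 3.7+) iterates keys in insertion order.
result = "".join(dict.fromkeys(str1))
print(result)  # geksfor
```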

edited Mar 03 '19 at 05:37

answered Mar 03 '19 at 05:18

Jab


3

That's an interesting (non-obvious) use of the `or` conditional. Which is to say "Thanks! I hate it!" – Adam Smith Mar 03 '19 at 05:22
I'm sorry? I'm not sure I follow. – Jab Mar 03 '19 at 05:27
There are simpler solutions: https://stackoverflow.com/questions/9841303/removing-duplicate-characters-from-a-string – Alderven Mar 03 '19 at 05:29
Well, while keeping with simplicity and performance I say [this answer](https://stackoverflow.com/questions/480214/how-do-you-remove-duplicates-from-a-list-whilst-preserving-order) is what I prefer. And I'm saying simplicity in comparison to OP's solution. – Jab Mar 03 '19 at 05:32
@AdamSmith - It's lifted from [another answer](https://stackoverflow.com/a/480227/2617068), the citation for which was eventually added in the [fourth revision](https://stackoverflow.com/revisions/54965808/4). – TigerhawkT3 Mar 03 '19 at 06:13

score 0 · Answer 2 · answered Mar 03 '19 at 05:18

It is impossible to modify strings in-place in Python, the same way that it's impossible to modify numbers in-place in Python.

a = "something"
b = 3

b += 1        # allocates a new integer, 4, and assigns it to b
a += " else"  # allocates a new string, " else", concatenates it to `a` to produce "something else"
              # then assigns it to a
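
One way to observe this, sketched below with an illustrative second name `alias`: a reference taken before the `+=` still sees the old string afterwards, and assigning to an index raises `TypeError`.

```python
a = "something"
alias = a          # second name bound to the same string object
a += " else"       # rebinds a to a new string; alias is untouched

print(a)      # something else
print(alias)  # something

error = None
try:
    a[0] = "S"     # strings do not support item assignment
except TypeError as exc:
    error = exc
    print(type(exc).__name__)  # TypeError
```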

score 0 · Answer 3 · answered Mar 03 '19 at 08:54

As already pointed out, `str` is immutable, so the in-place requirement makes no sense. To get the desired output, I would do it the following way:

str1 = 'geeksforgeeks'
out = ''.join([i for inx,i in enumerate(str1) if str1.index(i)==inx])
print(out) #prints: geksfor

Here I used the `enumerate` function to get the numbered (`inx`) letters, together with the fact that the `.index` method of `str` returns the lowest index at which an element occurs. Therefore `str1.index('e')` for the given string is 1, not 2, 9, or 10.
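
To see the filter at work, the sketch below pairs each position with the first index that `.index` reports for that character; a character is kept only when the two numbers match. The names `rows` and `kept` are just for illustration. Note that each `.index` call scans the string from the start, so this approach does quadratic work on long inputs.

```python
str1 = "geeksforgeeks"

# (position, character, first index of that character)
rows = [(inx, ch, str1.index(ch)) for inx, ch in enumerate(str1)]
for inx, ch, first in rows[:4]:
    print(inx, ch, first)
# 0 g 0
# 1 e 1
# 2 e 1
# 3 k 3

# keep a character only at its first position
kept = "".join(ch for inx, ch, first in rows if inx == first)
print(kept)  # geksfor
```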

score 0 · Answer 4 · answered Mar 03 '19 at 18:48

Here is a simplified version of unique_everseen from itertools recipes.

from itertools import filterfalse

def unique_everseen(iterable):
    seen = set()
    seen_add = seen.add
    for element in filterfalse(seen.__contains__, iterable):
        seen_add(element)
        yield element

You can then use this generator with str.join to get the expected output.

str1 = "geeksforgeeks"
new_str1 = ''.join(unique_everseen(str1)) # 'geksfor'
