65

I want to append characters to a string, but want to make sure all the letters in the final list are unique.

Example: "aaabcabccd""abcd"

Now of course I have two solutions in my mind. One is using a list that will map the characters with their ASCII codes. So whenever I encounter a letter it will set the index to True. Afterwards I will scan the list and append all the ones that were set. It will have a time complexity of O(n).

Another solution would be using a dict and following the same procedure. After mapping every char, I will do the operation for each key in the dictionary. This will have a linear running time as well.

Since I am a Python newbie, I was wondering which would be more space efficient. Which one could be implemented more efficiently?

PS: Order is not important while creating the list.

martineau
  • 119,623
  • 25
  • 170
  • 301
Ali
  • 5,338
  • 12
  • 55
  • 78

9 Answers9

132

The simplest solution is probably:

In [10]: ''.join(set('aaabcabccd'))
Out[10]: 'acbd'

Note that this doesn't guarantee the order in which the letters appear in the output, even though the example might suggest otherwise.

You refer to the output as a "list". If a list is what you really want, replace ''.join with list:

In [1]: list(set('aaabcabccd'))
Out[1]: ['a', 'c', 'b', 'd']

As far as performance goes, worrying about it at this stage sounds like premature optimization.

NPE
  • 486,780
  • 108
  • 951
  • 1,012
  • Thanks a lot for the answer. I was just wondering how come this method is more efficient than the dictionary or list? – Ali Dec 16 '12 at 15:39
  • @Ali It's time complexity is the same as the `dict` method (it's the same implementation), but you save on creating key-value pairs. – Marcin Dec 16 '12 at 15:41
  • 2
    @Ali: I didn't say it was more efficient (although it almost certainly is). My point is that you should focus on clarity and correctness first, and only optimize when everything is working well and you have profiled your code and know *what* to optimize. – NPE Dec 16 '12 at 15:42
  • @NPE I was also wondering. This solution contains calling a join over a string I guess (''.join). After this will I have to like iterate through the string and add the chars into my list, or I can apply this join to a list as well? – Ali Dec 16 '12 at 15:43
  • @Ali: If you want a list, just use `list(set('aaabcabccd'))`. Depending on your requirements, it might even make sense to keep the `set` and use it directly. – NPE Dec 16 '12 at 15:44
  • 1
    @All to get order in which they appear `sorted(set('aaabcabccd'), key=('aaabcabccd').index)` – Santhosh Dhaipule Chandrakanth Sep 24 '18 at 03:49
30

Use an OrderedDict. This will ensure that the order is preserved

>>> ''.join(OrderedDict.fromkeys( "aaabcabccd").keys())
'abcd'

PS: I just timed both the OrderedDict and Set solution, and the later is faster. If order does not matter, set should be the natural solution, if Order Matter;s this is how you should do.

>>> from timeit import Timer
>>> t1 = Timer(stmt=stmt1, setup="from __main__ import data, OrderedDict")
>>> t2 = Timer(stmt=stmt2, setup="from __main__ import data")
>>> t1.timeit(number=1000)
1.2893918431815337
>>> t2.timeit(number=1000)
0.0632140599081196
Abhijit
  • 62,056
  • 18
  • 131
  • 204
  • Thanks a lot. I forget to mention that order does not matter. – Ali Dec 16 '12 at 15:38
  • 1
    I'm pretty sure that with preserving order the complexity will be o(nlogn) and not o(n) like the set solutions – Aviram Segal Dec 16 '12 at 15:38
  • 1
    @AviramSegal: Why is that? Surely, preserving the order just requires a linked list running through the dict elements in insertion order? – NPE Dec 16 '12 at 15:43
  • @AviramSegal `OrderedDict` uses `dict` internally and has hence O(1) expected lookup/membership check. Maintaining the order requires additional space (a lot), but only amortized constant additional (appending to a list *if* the key wasn't in the dictionary before). –  Dec 16 '12 at 15:43
  • @NPE Yeah, but note that Python doesn't use linked lists (`list` is a dynamic over-allocating array). –  Dec 16 '12 at 15:44
  • 1
    @delnan: Erm, if you look at the source for `OrderedDict`, you'll see that a linked list is exactly what it uses. – NPE Dec 16 '12 at 15:47
  • @NPE I stand corrected, `OrderedDict` rolls its own doubly-linked list. I assumed it was just a `list` and you made the newbie mistake of taking `list` to mean "linked list". –  Dec 16 '12 at 15:51
7
char_seen = []
for char in string:
    if char not in char_seen:
        char_seen.append(char)
print(''.join(char_seen))

This will preserve the order in which alphabets are coming,

output will be

abcd
Amit Gupta
  • 2,698
  • 4
  • 24
  • 37
  • For preserving the order this seems to be working, the other method which include typecasting to set, destroys the order. – Ashish Pratap Apr 02 '20 at 19:57
6

For completeness sake, here's another recipe that sorts the letters as a byproduct of the way it works:

>>> from itertools import groupby
>>> ''.join(k for k, g in groupby(sorted("aaabcabccd")))
'abcd'
martineau
  • 119,623
  • 25
  • 170
  • 301
3

if the result does not need to be order-preserving, then you can simply use a set

>>> ''.join(set( "aaabcabccd"))
'acbd'
>>>
gefei
  • 18,922
  • 9
  • 50
  • 67
3

Store Unique characters in list

Method 1:

uniue_char = list(set('aaabcabccd'))
#['a', 'b', 'c', 'd']

Method 2: By Loop ( Complex )

uniue_char = []
for c in 'aaabcabccd':
    if not c in uniue_char:
        uniue_char.append(c)
print(uniue_char)
#['a', 'b', 'c', 'd']
dipenparmar12
  • 3,042
  • 1
  • 29
  • 39
2

I have an idea. Why not use the ascii_lowercase constant?

For example, running the following code:

# string module contains the constant ascii_lowercase which is all the lowercase
# letters of the English alphabet
import string
# Example value of s, a string
s = 'aaabcabccd'
# Result variable to store the resulting string
result = ''
# Goes through each letter in the alphabet and checks how many times it appears.
# If a letter appears at least once, then it is added to the result variable
for letter in string.ascii_letters:
    if s.count(letter) >= 1:
        result+=letter

# Optional three lines to convert result variable to a list for sorting
# and then back to a string
result = list(result)
result.sort()
result = ''.join(result)

print(result)

Will print 'abcd'

There you go, all duplicates removed and optionally sorted

Brent Pappas
  • 392
  • 3
  • 11
0

Here we can use dictionary to solve this problem. Set structure is good way if you don't consider the order. but if you care about the order. Try dictionary:

s='BANANA'
single={}
for i in range(len(s)):
    single[s[i]]=i
print(''.join(single.keys()))
  • Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jun 18 '22 at 04:59
0

To preserve the order we can sort using the index value of the original string

s = 'aaabcabccd'
print(''.join(sorted(set(s), key=s.index)))

Output will be

'abcd'