Removing duplicates (not by using set)

Question

My data look like this:

let = ['a', 'b', 'a', 'c', 'a']

How do I remove the duplicates? I want my output to be something like this:

['b', 'c']

When I use the set function, I get:

set(['a', 'c', 'b'])

This is not what I want.

For what language? (Edit your question and add it to the tags) — Mr. Llama, Dec 21 '14 at 20:23
@michnguyen You will have to clarify a bit more about what you are trying to accomplish, because it does not seem to be merely removing duplicates (if so, then 'a' would be included in the result). — rchang, Dec 21 '14 at 20:29

score 2 · Answer 1 · edited May 23 '17 at 12:28

One option would be (as derived from Ritesh Kumar's answer here)

let = ['a', 'b', 'a', 'c', 'a']
onlySingles = [x for x in let if let.count(x) < 2]

which gives

>>> onlySingles
['b', 'c']

answered Dec 21 '14 at 20:33

embert

It's fairly heavy running `let.count` each time – Jon Clements Dec 22 '14 at 08:36
E.g., this makes linear scans of `let` 25 times over. Either sort/group, or make a linear frequency count *once*, which can reduce the key space, then iterate over that... worst case for the latter is 2N – Jon Clements Dec 22 '14 at 08:53
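The single frequency count suggested in the comment above could look roughly like the sketch below. It is only an illustration: the function name `only_singles` is made up here, and it uses `collections.Counter`, which is a hashed container, so it does not suit anyone who has to avoid those. It counts once, then filters the original list, so the original order is kept and the list is traversed twice.

```python
from collections import Counter

def only_singles(items):
    # One pass to count how often each value occurs.
    counts = Counter(items)
    # Second pass over the original list, so the input order is preserved.
    return [x for x in items if counts[x] == 1]

let = ['a', 'b', 'a', 'c', 'a']
print(only_singles(let))  # ['b', 'c']
```

Each lookup in `counts` is a constant-time operation on average, which is where the roughly 2N total work comes from.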

score 1 · Answer 2 · answered Dec 22 '14 at 02:07

Try this:

>>> let
['a', 'b', 'a', 'c', 'a']
>>> dict.fromkeys(let).keys()
['a', 'c', 'b']
>>>

Nishant Nawarkhede

This is basically using a set in disguise. – 9000 Dec 22 '14 at 02:24

score 0 · Answer 3 · answered Dec 22 '14 at 02:31

Sort the input, then removing duplicates becomes trivial:

data = ['a', 'b', 'a', 'c', 'a']

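def uniq(data):
  # Assumes data is sorted, so equal items are adjacent.
  last = object()  # sentinel that compares unequal to every item
  result = []
  for item in data:
    if item != last:
      # First item of a new run: keep it.
      result.append(item)
      last = item
  return result

print(uniq(sorted(data)))
# prints ['a', 'b', 'c']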

This is basically the shell's cat data | sort | uniq idiom. The cost is O(N * log N), same as with a tree-based set.
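The sorted scan keeps one copy of every value, whereas the question asks for `['b', 'c']`, i.e. only the values that never repeat. A possible adaptation, given as a sketch under that reading of the question (the name `only_singles_sorted` is invented for illustration), measures the length of each run of equal values and keeps only runs of length one. It uses no set or dict.

```python
def only_singles_sorted(items):
    """Return the values that occur exactly once, in sorted order."""
    ordered = sorted(items)
    result = []
    i = 0
    n = len(ordered)
    while i < n:
        # Find the end of the run of equal values starting at index i.
        j = i + 1
        while j < n and ordered[j] == ordered[i]:
            j += 1
        if j - i == 1:
            # A run of length one means the value is not duplicated.
            result.append(ordered[i])
        i = j
    return result

print(only_singles_sorted(['a', 'b', 'a', 'c', 'a']))  # ['b', 'c']
```

The result comes back in sorted order rather than input order, and the cost is still dominated by the O(N * log N) sort.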

Jon Clements · Answer 4 · 2014-12-22T08:46:48.680

Instead of sorting, or linearly scanning and re-counting the main list for each item, count the number of occurrences once and then filter on the items that appear exactly once:

>>> from collections import Counter
>>> let = ['a', 'b', 'a', 'c', 'a']
>>> [k for k, v in Counter(let).items() if v == 1]
['c', 'b']

You have to look at the sequence at least once regardless, although it makes sense to limit the number of times you do so.

If you really want to avoid any type of set or otherwise hashed container (because you perhaps can't use them?), then yes, you can sort it, then use:

>>> from itertools import groupby, islice
>>> [k for k,v in groupby(sorted(let)) if len(list(islice(v, 2))) == 1]
['b', 'c']
