
My data look like this:

let = ['a', 'b', 'a', 'c', 'a']

How do I remove the duplicates? I want my output to be something like this:

['b', 'c']

When I use the set function, I get:

set(['a', 'c', 'b'])

This is not what I want.

Jon Clements
  • For what language? (Edit your question and add it to the tags) – Mr. Llama Dec 21 '14 at 20:23
  • @michnguyen You will have to clarify a bit more about what you are trying to accomplish, because it does not seem to be merely removing duplicates (if so, then 'a' would be included in the result). – rchang Dec 21 '14 at 20:29

4 Answers


One option (derived from Ritesh Kumar's answer here) would be:

let = ['a', 'b', 'a', 'c', 'a']
onlySingles = [x for x in let if let.count(x) < 2]

which gives

>>> onlySingles
['b', 'c']
embert
  • It's fairly heavy running `let.count` each time – Jon Clements Dec 22 '14 at 08:36
  • e.g. this makes 25 element comparisons here (a full scan of `let` for each of its 5 elements). Either sort/group, or make a linear frequency count *once*, which can reduce the key space, then iterate over that... worst case for the latter is 2N – Jon Clements Dec 22 '14 at 08:53
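Following the frequency-count suggestion in the comment above, here is a minimal sketch (assuming Python 3; the names `counts` and `only_singles` are illustrative) that counts each element once with `collections.Counter` and still keeps the surviving items in the order they appear in `let`:

```python
from collections import Counter

let = ['a', 'b', 'a', 'c', 'a']

# One linear pass to count how often each element occurs...
counts = Counter(let)

# ...and a second linear pass that keeps only the elements seen exactly once,
# in their original order.
only_singles = [x for x in let if counts[x] == 1]

print(only_singles)  # ['b', 'c']
```

This does two linear passes (about 2N work) instead of one full scan of `let` per element.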

Try this:

>>> let
['a', 'b', 'a', 'c', 'a']
>>> dict.fromkeys(let).keys()
['a', 'c', 'b']
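The session above is Python 2. In Python 3, `dict.keys()` returns a view rather than a list, and dicts keep insertion order (guaranteed since 3.7), so a minimal equivalent under those assumptions (the name `deduped` is illustrative) would be:

```python
let = ['a', 'b', 'a', 'c', 'a']

# dict.fromkeys creates one key per distinct element; because Python 3.7+
# dicts preserve insertion order, the keys come out in first-seen order.
deduped = list(dict.fromkeys(let))

print(deduped)  # ['a', 'b', 'c']
```

Like `set`, this keeps one copy of every element, so `'a'` is still in the result; on its own it does not produce `['b', 'c']`.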
Nishant Nawarkhede

Sort the input, then removing duplicates becomes trivial:

data = ['a', 'b', 'a', 'c', 'a']

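def uniq(data):
    # A private sentinel that never equals a real item (starting from None
    # would silently drop a leading None from the input).
    last = object()
    result = []
    for item in data:
        # In sorted input equal items are adjacent, so comparing with the
        # last kept item is enough to skip duplicates.
        if item != last:
            result.append(item)
            last = item
    return result

print(uniq(sorted(data)))
# prints ['a', 'b', 'c']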

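This is basically the shell's `cat data | sort | uniq` idiom. The cost is dominated by the sort, O(N log N), the same as building a tree-based set; the `uniq` pass itself is O(N). (Python's built-in `set` is hash-based, so building one is O(N) on average.)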
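Since the question asks for the items that occur exactly once rather than one copy of each, here is a sketch of how the same sort-then-scan idea could be adapted. `singles_sorted` is a hypothetical helper, not part of the answer above; it measures the length of each run of equal items in the sorted list and keeps only runs of length 1, using nothing but comparisons (no hashing):

```python
def singles_sorted(data):
    """Return the items that occur exactly once, in sorted order.

    Hypothetical helper: sort, then walk over each run of equal items
    and keep the item only if its run has length 1.
    """
    items = sorted(data)
    result = []
    i = 0
    while i < len(items):
        j = i
        # Advance j past every item equal to items[i] (the end of this run).
        while j < len(items) and items[j] == items[i]:
            j += 1
        if j - i == 1:
            result.append(items[i])
        i = j  # jump straight to the start of the next run
    return result


print(singles_sorted(['a', 'b', 'a', 'c', 'a']))  # ['b', 'c']
```

The sort still dominates at O(N log N); the scan itself is linear because `i` jumps to the end of each run.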

9000

Instead of sorting, or linearly scanning and re-counting the main list for every element, count the occurrences once and then keep the items that appear exactly once:

>>> from collections import Counter
>>> let = ['a', 'b', 'a', 'c', 'a']
>>> [k for k, v in Counter(let).items() if v == 1]
['c', 'b']

You have to look at the sequence at least once regardless, but it makes sense to limit the number of times you do so.

If you really want to avoid any type of set or other hashed container (perhaps because you can't use them?), then yes, you can sort the list and then use:

>>> from itertools import groupby, islice
>>> [k for k,v in groupby(sorted(let)) if len(list(islice(v, 2))) == 1]
['b', 'c']
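Here is the one-liner above unrolled into a commented function for readability (a sketch; `singles_grouped` is a hypothetical name). The `islice(group, 2)` call is what keeps it cheap: it pulls at most two items from each run, which is enough to tell a single occurrence from a repeated one without building a list of the whole run.

```python
from itertools import groupby, islice


def singles_grouped(seq):
    """Items that occur exactly once, in sorted order (hypothetical helper)."""
    result = []
    # groupby yields (key, group_iterator) for each run of equal, adjacent
    # items, so the input has to be sorted first.
    for key, group in groupby(sorted(seq)):
        # Take at most two items from the run: one means "exactly once",
        # two means "more than once", however long the run actually is.
        if len(list(islice(group, 2))) == 1:
            result.append(key)
    return result


print(singles_grouped(['a', 'b', 'a', 'c', 'a']))  # ['b', 'c']
```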
Jon Clements