-1

I want to make a generator function that loops over an input iterable, yielding one element at a time but skipping duplicates. Example usage is below:

numbers = [4, 5, 2, 6, 2, 3, 5, 8]
nums = unique(numbers)
next(nums)
4
next(nums)
5
next(nums)
2
next(nums)
6
next(nums)
3
next(nums)
8

Does anyone have any ideas why this code is not printing?

def unique(iterable):
    seen = set()
    for n in iterable:
        if n not in seen:
            seen.add(n)
            yield n

numbers = [4, 5, 2, 6, 2, 3, 5, 8]
nums = unique(numbers)
print(next(nums))
user10019227
  • Indentation in python is really important, both `seen.add()` and `yield` need to be indented to beyond the `if` statement. – AChampion Jul 16 '18 at 00:45
  • Where did you get this code from? This is almost identical to the `unique_everseen` recipe in the `itertools` docs, which makes me think you got it from there, or from someone who's familiar with that code and rewrote it from memory. So you should be able to compare your code with the code you copied and see the difference in indentation. – abarnert Jul 16 '18 at 00:46
  • I did this and it still did not change the output. Is there anything else wrong with the code which might make it not print? – user10019227 Jul 16 '18 at 00:49
  • @abarnert I got the code from one of the answers below. – user10019227 Jul 16 '18 at 00:50
  • @user10019227: That code has the indents all wrong (to a degree that would make it raise a `SyntaxError`). Fix the indents, fix the code. We can't debug code that isn't what you're even running. AChampion's code is correct; your code is indented incorrectly. – ShadowRanger Jul 16 '18 at 00:51
  • I changed my indents but there is still nothing being printed. – user10019227 Jul 16 '18 at 00:56
  • Weird. That code behaves as expected for me. It prints `4`. – PM 2Ring Jul 16 '18 at 00:58
  • Does this generator just need to handle finite iterables, or should it also handle infinite iterables? – PM 2Ring Jul 16 '18 at 01:00
  • Just finite iterables. – user10019227 Jul 16 '18 at 01:02
  • I got the code working; there was a misspelling from when I was fixing the indents. No idea why it ran the code anyway. Thanks for the help! – user10019227 Jul 16 '18 at 01:04

3 Answers

4

A simple unique generator would just keep a set of items already seen, e.g.:

def unique(nums):
    seen = set()
    for n in nums:
        if n not in seen:
            seen.add(n)
            yield n

In []:
numbers = [4, 5, 2, 6, 2, 3, 5, 8]
list(unique(numbers))

Out[]:
[4, 5, 2, 6, 3, 8]
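
To step through the values one at a time with `next()`, as in the question, something like the following should work. This is only a sketch: the `None` default passed to `next()` is an illustrative choice to avoid `StopIteration` once the generator is exhausted, and it would be ambiguous if the input itself could contain `None`.

```python
def unique(nums):
    seen = set()
    for n in nums:
        if n not in seen:
            seen.add(n)
            yield n

numbers = [4, 5, 2, 6, 2, 3, 5, 8]
nums = unique(numbers)

first = next(nums)             # first distinct value
second = next(nums)            # second distinct value
rest = list(nums)              # drain whatever is left
after_end = next(nums, None)   # exhausted: the default is returned instead of raising

print(first, second, rest, after_end)
```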
AChampion
  • Note this algorithm is identical to the `itertools` `unique_everseen` recipe in the [docs](https://docs.python.org/3/library/itertools.html#itertools-recipes), and also available in a 3rd party library as `toolz.unique`. – jpp Jul 16 '18 at 00:43
  • Just for reference: https://docs.python.org/3.6/library/itertools.html#itertools-recipes – AChampion Jul 16 '18 at 00:47
  • This is how I would've done it if the OP had specified that order matters! – pstatix Jul 16 '18 at 00:54
3

Simplest way is to use OrderedDict, an easy way to dedupe while preserving order:

from collections import OrderedDict

def unique(nums):
    yield from OrderedDict.fromkeys(nums)

Technically, it operates eagerly: all deduping is done up front, and then you iterate the completely deduped OrderedDict. All other solutions would need to build an equivalent set by the end anyway, so this delays the production of the first value but does the same amount of work overall (and on Python versions with a C-implemented OrderedDict, it runs faster than hand-rolling a generator that uses a set as a "seen" store). The cases for which it is unsuitable are infinite input iterables and finite but large iterables where you are likely to stop processing long before you reach the end; in those cases a lazier, unique_everseen-style solution based on a set is needed.
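
To illustrate the lazy case, here is a rough sketch of a set-based generator consuming an infinite input. The cycling input stream and the use of `islice` to take only three values are assumptions made for the demonstration; passing the same stream to `OrderedDict.fromkeys` would never return.

```python
from itertools import count, islice

def unique(iterable):
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item

# Infinite stream 0, 1, 2, 3, 4, 0, 1, ... with only five distinct values.
stream = (n % 5 for n in count())

# The generator pulls only as many input items as it needs,
# so taking the first three distinct values finishes immediately.
first_three = list(islice(unique(stream), 3))
print(first_three)
```

Asking for more distinct values than the stream contains (more than five here) would loop forever, so the lazy version still needs the caller to know when to stop.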

On Python 3.6 and higher, plain dict preserves order (though it's not an official guarantee until 3.7), so you don't even need an import:

def unique(nums):
    yield from dict.fromkeys(nums)
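
A quick usage check of the plain-dict version, assuming Python 3.7+ (where insertion order is guaranteed) and hashable elements:

```python
def unique(nums):
    # dict.fromkeys keeps the first occurrence of each key, in insertion order.
    yield from dict.fromkeys(nums)

numbers = [4, 5, 2, 6, 2, 3, 5, 8]
result = list(unique(numbers))
print(result)
```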
ShadowRanger
  • Do you think there's any performance benefit to defining an ordered set (like the solutions [here](https://stackoverflow.com/questions/1653970/does-python-have-an-ordered-set)), or would this be *less* efficient? It seems all these workarounds stem from the non-existence of OrderedSet. – jpp Jul 16 '18 at 00:57
  • @jpp: Most of the third party `OrderedSet`s I've seen are implemented at the Python layer, so they'd be slower (though possibly slightly more memory efficient). The 3.6+ solution (using plain `dict`) is actually more memory efficient than a *plain* `set` based solution; the new `dict` design actually makes equivalent "add only" `dict` use less memory than an equivalent `set`, and while lookups and insertions are a little slower, they're typically within an order of magnitude; no Python implemented function will match `dict` on this. – ShadowRanger Jul 16 '18 at 02:07
  • Got it, thank you. Just to add, I read somewhere that a more-or-less ready C-level OrderedSet was written by Python developers, but the use cases were not deemed sufficient to expand `collections`. Probably because the solution you have outlined isn't too much work. – jpp Jul 16 '18 at 02:11
0

Is there a reason you need a generator? Why not just use a set?

numbers = [4, 5, 2, 6, 2, 3, 5, 8]
for i in set(numbers):
    print(i)

If you really need a generator:

def skipper(l):
    for i in set(l):
        yield i

for i in skipper(numbers):
    print(i)
pstatix
  • This doesn't guarantee that you get the numbers in the same order as the original list, which might be important to the OP. – AChampion Jul 16 '18 at 00:34
  • Also, side-note: `for i in set(l): yield i` can be simplified on modern Python (since 3.3) to just `yield from set(l)`; doesn't fix the ordering issue, but it's faster/simpler to use `yield from` than a manual loop + `yield`. – ShadowRanger Jul 16 '18 at 00:47
  • @AChampion Thought about it, but since omitted from the OP's question, didn't account for it. Must supply desired conditions! – pstatix Jul 16 '18 at 00:53
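
The `yield from` simplification mentioned in the comments might look like the sketch below. The parameter name `values` is just an illustrative rename of `l`. Because a set has no guaranteed iteration order, the example sorts the result before printing rather than relying on the order in which the values come out.

```python
def skipper(values):
    # Delegates to the set's iterator; duplicates are dropped,
    # but the original order of the input is not preserved.
    yield from set(values)

numbers = [4, 5, 2, 6, 2, 3, 5, 8]
result = sorted(skipper(numbers))
print(result)
```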