
I want code that deletes all instances of any number that appears more than once in a list.

E.g.:

Inputlist = [2, 3, 6, 6, 8, 9, 12, 12, 14]
 
Outputlist = [2, 3, 8, 9, 14]

I have tried to remove the duplicated elements in the list already (by using the "unique" function), but it leaves a single instance of the element in the list nevertheless!

seen = set()
uniq = []
for x in Inputlist:
    if x not in seen:
        uniq.append(x)
        seen.add(x)
print(uniq)  # [2, 3, 6, 8, 9, 12, 14] -- still keeps one copy of each duplicate

I went through a lot of StackOverflow articles too, but all of them differ from what I need: either they remove elements common to two different lists, or they keep one instance of each element. I want to simply remove all elements that occur more than once.

martineau

6 Answers


You can use a `Counter`:

>>> from collections import Counter
>>> l = [2, 3, 6, 6, 8, 9, 12, 12, 14]
>>> res = [el for el, cnt in Counter(l).items() if cnt==1]
>>> res
[2, 3, 8, 9, 14]
abc
  • 11,579
  • 2
  • 26
  • 51
  • 4
    Counter here is an overkill, `set` will do the job. – mlisthenewcool Sep 20 '20 at 14:59
  • 5
    From the ouputlist he wants elements that only appear once, it's not just removing duplicates @mlisthenewcool – abc Sep 20 '20 at 14:59
  • 5
    @Pac0 I doubt there is a more efficient solution. Counter is O(n). You need to iterate the whole list anyway.. so it can't be better than O(n). – adjan Sep 20 '20 at 15:08
  • I agree that's the cleanest and most pythonic solution. And I'm sure no one cares about those microseconds. – Tom Wojcik Sep 20 '20 at 15:21
  • @Pac0: For a small input, the `O(n²)` cost is unlikely to be noticeable. Asymptotically though, it's guaranteed to lose to `O(n)` solution. Benchmarking the various `O(n)` approaches for such a small input is unlikely to usefully demonstrate scaling (given the setup costs often outweigh the incremental per item costs initially). Unless it's known that the obvious code with acceptable big-O is too slow, just use the obvious code. – ShadowRanger Sep 20 '20 at 17:41
  • @mlisthenewcool set will do the job with more code; using Counter is the cleanest way. – Giuseppe Sep 20 '20 at 21:16

You can always keep two sets: one to check whether an element has been seen, and another to hold only the unique ones. `set.discard(el)` removes `el` if it exists and does nothing otherwise.

Inputlist = [2, 3, 6, 6, 8, 9, 12, 12, 14]

seen = set()
ans = set()

for el in Inputlist:
    if el not in seen:
        seen.add(el)
        ans.add(el)
    else:
        ans.discard(el)

print(list(ans))

EDIT: for giggles I measured the performance of these two solutions

from timeit import timeit


# The functions are defined in setup and *called* in stmt,
# so timeit measures the actual work rather than the def statement.
first_setup = """
def get_from_two_sets():
    seen = set()
    ans = set()

    for el in (2, 3, 6, 6, 8, 9, 12, 12, 14):
        if el not in seen:
            seen.add(el)
            ans.add(el)
        else:
            ans.discard(el)
    return ans
"""


second_setup = """
from collections import Counter

def get_from_counter():
    return [el for el, cnt in Counter((2, 3, 6, 6, 8, 9, 12, 12, 14)).items() if cnt == 1]
"""


print(timeit(stmt="get_from_two_sets()", setup=first_setup, number=10000000))
print(timeit(stmt="get_from_counter()", setup=second_setup, number=10000000))

yields

0.3130729760000577
0.46127468299982866

so yay! it seems like my solution is slightly faster. Don't waste those nanoseconds you saved!

@abc's solution is clean and Pythonic; go for it.

Tom Wojcik
  • Note that this solution loses ordering (which abc's solution preserves on modern Python, where `dict`s are insertion-ordered), thanks to using `set`s. You could always use a `dict` in a `set`-like way for `ans` to preserve order, but I agree, `Counter` is obvious, Pythonic, and more than efficient enough (I suspect it would narrow the gap or even win on larger inputs, due to `Counter`'s C-accelerated counting function), just use that. – ShadowRanger Sep 20 '20 at 17:04
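Following the note above about ordering, here is a small sketch (my variation, not part of the original answer) that swaps the `ans` set for a dict used set-like, so the surviving elements keep their input order on Python 3.7+:

```python
Inputlist = [2, 3, 6, 6, 8, 9, 12, 12, 14]

seen = set()
ans = {}  # dict keys act as an insertion-ordered set (Python 3.7+)

for el in Inputlist:
    if el not in seen:
        seen.add(el)
        ans[el] = None
    else:
        ans.pop(el, None)  # dict equivalent of set.discard: remove if present

print(list(ans))  # [2, 3, 8, 9, 14] -- order preserved
```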

A simple list comprehension will do the trick:

Inputlist = [2, 3, 6, 6, 8, 9, 12, 12, 14]
 
Outputlist = [item for item in Inputlist if Inputlist.count(item) == 1]
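One caveat: `list.count` rescans the whole list for every element, so this comprehension is O(n²). A two-pass sketch with the same shape that counts once up front (using `Counter`, as in the accepted answer) stays O(n) and also preserves input order:

```python
from collections import Counter

Inputlist = [2, 3, 6, 6, 8, 9, 12, 12, 14]

counts = Counter(Inputlist)  # one O(n) pass to count every element
Outputlist = [item for item in Inputlist if counts[item] == 1]  # one O(n) pass to filter

print(Outputlist)  # [2, 3, 8, 9, 14]
```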
Mushif Ali Nawaz
Alex Mandelias

Another solution using sets: convert the input list to a set and remove each element of this set from the input list once. This leaves only the duplicates in the list. Now convert that to a set, and you can subtract one set from the other. It sounds complicated, but it is quite short and efficient for short lists:

l = [2, 3, 6, 6, 8, 9, 12, 12, 14]
inset = set(l)

for i in inset:   # <-- usually the element to remove is in the front,
    l.remove(i)   # <-- but in a worst case, this is slower than O(n)

result = list(inset - set(l))

Performance is irrelevant for the short example list:

# %timeit this solution
1.18 µs ± 1.97 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# %timeit solution with seen-set
1.23 µs ± 1.49 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# %timeit solution with Counter class
2.76 µs ± 4.85 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

For a list with 1000 elements and 10% duplicates, however, the Counter solution is fastest!

Wups

An alternate solution for the case where only consecutive duplicates should be removed:

from itertools import groupby

inputlist = [2, 3, 6, 6, 8, 9, 12, 12, 14]

outputlist = [x for _, (x, *extra) in groupby(inputlist) if not extra]

All this does is group together runs of identical values, unpacking the first copy into x and the rest into a list; we then check whether that list is empty to determine whether there was a single value or more than one, and keep only the single values.

If you don't like even the temporary extra list, using one of the ilen solutions (e.g. `more_itertools.ilen`) that doesn't listify the group would allow a similar solution with no unbounded temporary storage:

outputlist = [x for x, grp in groupby(inputlist) if ilen(grp) == 1]
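`ilen` is not a builtin; it's an "iterator length" helper like `more_itertools.ilen`. A minimal version, shown here with the comprehension above:

```python
from itertools import groupby

def ilen(iterable):
    # Count items by exhausting the iterator, without building a list
    return sum(1 for _ in iterable)

inputlist = [2, 3, 6, 6, 8, 9, 12, 12, 14]
outputlist = [x for x, grp in groupby(inputlist) if ilen(grp) == 1]
print(outputlist)  # [2, 3, 8, 9, 14]
```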

Or with a helper that just checks "at least 2" without iterating beyond that point:

def more_than_one(it):
    next(it)  # Assumes at least one item, which is always the case for groupby groups
    try:
        next(it)
    except StopIteration:
        return False  # the group had exactly one item
    return True

outputlist = [x for x, grp in groupby(inputlist) if not more_than_one(grp)]

Note: I'd actually prefer abc's Counter-based solution in general, but if you actually want to only delete adjacent duplicates, it's not adequate to the task.
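To make that semantic difference concrete, a quick check (my example list, not from the answer) on input where equal values are not all adjacent:

```python
from collections import Counter
from itertools import groupby

data = [6, 2, 6, 6, 3]  # the 6s repeat, but only one run of them is adjacent

counter_result = [el for el, cnt in Counter(data).items() if cnt == 1]
adjacent_result = [x for x, grp in groupby(data) if len(list(grp)) == 1]

print(counter_result)   # [2, 3]    -- every repeated value removed
print(adjacent_result)  # [6, 2, 3] -- only the adjacent 6, 6 run removed
```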

ShadowRanger

If the input is sorted and can be bounded by a min and a max, this can be done in O(n):

lo = -1          # anything smaller than the minimum value
hi = 99999999    # anything larger than the maximum; put whatever you need
J = [lo] + Inputlist + [hi]
# keep y only when it differs from both of its neighbours in sorted order
Outputlist = [y for (x, y, z) in zip(J, J[1:], J[2:]) if x < y and y < z]
Setop