1435

How can I check if a list has any duplicates and return a new list without duplicates?

Karl Knechtel
  • 62,466
  • 11
  • 102
  • 153
Neemaximo
  • 20,031
  • 12
  • 32
  • 40
  • 1
    related: [How to use multiprocessing to drop duplicates in a very big list?](https://stackoverflow.com/q/59762414/9059420) – Darkonaut Jan 31 '20 at 21:43
  • 1
    Interestingly, none of the top answers here provides an answer to the actual question: create a new list with only items that are not duplicated in the original list. I read that as `[1, 2, 3, 4, 5, 2, 4]` -> `[1, 3, 5]`, as 2 and 4 are duplicated. – 9769953 Sep 10 '22 at 10:46
  • @9769953 Given what you say, would it make sense to use [Rev 11](https://stackoverflow.com/revisions/7961363/11), but keep only the first sub-question (i.e. `[1, 2, 3, 1] → [1, 2, 3]`) that is answered by the top answers? The accepted answer hints at a possible way to accomplish the second sub-question (i.e. `[1, 2, 3, 1] → [2, 3]`). As it stands, the question and top answer are paradoxically not in exactly sync. – Mateen Ulhaq Sep 12 '22 at 05:07
  • @MateenUlhaq I prefer to keep the original question. Also, rev. 11 changes the question to fit the answers more, but not necessarily fit the original question. I guess then it depends how much of a forum/mailing list style you'd like SO to be, or how close to a tips'n'tricks website (with very pure questions and answers). I don't think either is achievable. – 9769953 Sep 12 '22 at 07:31
  • After going back and reading rev 1, I can't fathom how the question could be read as saying anything about whether `[1, 2, 3, 4, 5, 2, 4]` should transform to `[1, 3, 5]` or to `[1, 2, 3, 4, 5]`, or if order matters, or anything else. In fact, despite the title "Python removing duplicates in lists", it doesn't seem like OP wanted to remove duplicates from within the **same** list at all. Rather, it looks like OP wanted to take **two** lists e.g. `[1, 2, 3, 4]` and `[1, 3, 4]`, and remove from the first *those which are present in the second*, to get `[2]`. – Karl Knechtel Jan 25 '23 at 08:25
  • In other words, that would have made the question a duplicate of [Remove all the elements that occur in one list from another](https://stackoverflow.com/questions/4211209), which was much better asked from the start. But it seems like almost everyone saw a different question from that. – Karl Knechtel Jan 25 '23 at 08:26
  • The answer strongly depends on the use-case. Should the order be maintained or not? – Wör Du Schnaffzig Aug 29 '23 at 08:37

57 Answers

2227

The common approach to get a unique collection of items is to use a set. Sets are unordered collections of distinct objects. To create a set from any iterable, you can simply pass it to the built-in set() function. If you later need a real list again, you can similarly pass the set to the list() function.

The following example should cover whatever you are trying to do:

>>> t = [1, 2, 3, 1, 2, 3, 5, 6, 7, 8]
>>> list(set(t))
[1, 2, 3, 5, 6, 7, 8]
>>> s = [1, 2, 3]
>>> list(set(t) - set(s))
[8, 5, 6, 7]

As you can see from the example result, the original order is not maintained. As mentioned above, sets themselves are unordered collections, so the order is lost. When converting a set back to a list, an arbitrary order is created.

Maintaining order

If order is important to you, then you will have to use a different mechanism. A very common solution for this is to rely on OrderedDict to keep the order of keys during insertion:

>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(t))
[1, 2, 3, 5, 6, 7, 8]

Starting with Python 3.7, the built-in dictionary is guaranteed to maintain the insertion order as well, so you can also use that directly if you are on Python 3.7 or later (or CPython 3.6):

>>> list(dict.fromkeys(t))
[1, 2, 3, 5, 6, 7, 8]

Note that this may have some overhead of creating a dictionary first, and then creating a list from it. If you don’t actually need to preserve the order, you’re often better off using a set, especially because it gives you a lot more operations to work with. Check out this question for more details and alternative ways to preserve the order when removing duplicates.


Finally, note that both the set and the OrderedDict/dict solutions require your items to be hashable. This usually means that they have to be immutable. If you have to deal with items that are not hashable (e.g. list objects), then you will have to use a slow approach in which you basically compare every item with every other item in a nested loop.
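For illustration, here is a minimal sketch of that quadratic fallback for unhashable items (the helper name is my own; it relies only on == comparisons, so it works for lists, dicts, and other mutable objects):

def dedupe_unhashable(items):
    result = []
    for item in items:
        # 'in' compares with ==, so no hashing is required; O(n^2) overall
        if item not in result:
            result.append(item)
    return result

>>> dedupe_unhashable([[1, 2], [3], [1, 2]])
[[1, 2], [3]]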

Mateen Ulhaq
  • 24,552
  • 19
  • 101
  • 135
poke
  • 369,085
  • 72
  • 557
  • 602
  • add this to example, t = [3, 2, 1, 1, 2, 5, 6, 7, 8], shows the difference clearly! – sailfish009 Oct 26 '19 at 04:44
  • 1
    "...overhead of creating a dictionary first... If you don’t actually need to preserve the order, you’re better off using a set." — I profiled this because I was curious if it was actually true. My timings show that indeed the set is slightly faster: 1.12 µs per loop (set) vs 1.53 µs per loop (dict) over 1M loops with an absolute time difference of about 4s over 1M iterations. So if you're doing this in a tight inner loop you may care, otherwise probably not. – millerdev Dec 09 '19 at 13:30
  • @millerdev I was going to say something like _“overhead does not only mean timing”_ but then I checked and it appears that a keyed dictionary is actually smaller in memory than a set with the same elements. At least in current versions of Python. That’s really surprising – but yes, it’s a good point! Thanks! – poke Dec 09 '19 at 15:05
  • 4
    This solves the issue with *unhashable* types (where t is a list of dicts): `[dict(d) for d in set([frozenset(i.items()) for i in t])]` – Fredrik Erlandsson Dec 11 '19 at 07:52
  • @FredrikErlandsson Note that this highly depends on the actual shape of your dictionaries. If you have a dictionary that contains unhashable values itself, then this won’t be enough. – poke Dec 11 '19 at 09:05
  • @poke time complexity of list(dict.fromkeys(t))? – BigDreamz Aug 24 '20 at 13:47
  • 1
    @BigDreamz `dict.fromkeys()` creates a dictionary in linear time, and `list()` will create a list from it also in linear time. – poke Aug 25 '20 at 06:16
  • I'm getting `TypeError: unhashable type: 'list'`. I am trying to deduplicate a list of lists of strings while maintaining order. @FredrikErlandsson's solution didn't resolve it for me. – Kevin Wheeler Oct 13 '22 at 01:22
  • You cannot put lists in sets since lists are mutable and could change (which could affect whether they are duplicate to another list in the set). I would suggest a different approach for a list of lists, e.g. as covered in [this question about removing duplicates from a list of lists](https://stackoverflow.com/q/2213923/216074). – poke Oct 16 '22 at 12:59
484

In Python 2.7, the new way of removing duplicates from an iterable while keeping it in the original order is:

>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']

In Python 3.5, the OrderedDict has a C implementation. My timings show that this is now both the fastest and shortest of the various approaches for Python 3.5.

In Python 3.6, the regular dict became both ordered and compact. (This feature holds for CPython and PyPy but may not be present in other implementations.) That gives us a new fastest way of deduping while retaining order:

>>> list(dict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']

In Python 3.7, the regular dict is guaranteed to be ordered across all implementations. So, the shortest and fastest solution is:

>>> list(dict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']
Raymond Hettinger
  • 216,523
  • 63
  • 388
  • 485
  • 11
    I think this is the only way to keep the items in order. – Herberth Amaral Oct 22 '12 at 20:23
  • 22
    @HerberthAmaral: That is very far from true, see [How do you remove duplicates from a list in Python whilst preserving order?](http://stackoverflow.com/q/480214) – Martijn Pieters Aug 15 '13 at 14:24
  • 5
    @MartijnPieters Correcting: I think this is the only *simple* way to keep items in order. – Herberth Amaral Aug 15 '13 at 21:34
  • 16
    For this too, the content of the original list must be hashable – Davide Feb 15 '17 at 20:28
  • As @Davide mentioned, the original list must hashable. This means, that this does not work for a list of dictionaries. `TypeError: unhashable type: 'dictlist'` – CraZ May 16 '18 at 17:27
  • 4
    If the original list is not hashable, the [more-itertools](https://pypi.org/project/more-itertools/) package has [`unique_everseen`](https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.unique_everseen) which works with both hashable and unhashable items. – Asclepius Aug 02 '19 at 00:21
217

It's a one-liner: list(set(source_list)) will do the trick.

A set is something that can't possibly have duplicates.

Update: an order-preserving approach is two lines:

from collections import OrderedDict
OrderedDict((x, True) for x in source_list).keys()

Here we use the fact that OrderedDict remembers the insertion order of keys and does not change it when a value at a particular key is updated. We insert True as the values, but we could insert anything; the values are just not used. (A set works a lot like a dict with ignored values, too.)
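Note that on Python 3, keys() returns a view object rather than a list, so you may want to wrap the expression in list(). A small sketch with assumed sample data:

>>> from collections import OrderedDict
>>> source_list = [1, 2, 1, 3]
>>> list(OrderedDict((x, True) for x in source_list).keys())
[1, 2, 3]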

9000
  • 39,899
  • 9
  • 66
  • 104
  • @AdrianKeister: This is true. There are objects that have reasonable equality semantics but are not hashable, e.g. lists. OTOH if we can't have a shortcut like a hashtable, we end up with a quadratic algorithm of just comparing every element with all currently known unique elements. This can be totally OK for short inputs, especially with a lot of duplicates. – 9000 Aug 22 '19 at 15:40
  • 1
    Right, exactly. I think your answer would be higher quality if you took this very common use case into account. – Adrian Keister Aug 22 '19 at 15:44
119
>>> t = [1, 2, 3, 1, 2, 5, 6, 7, 8]
>>> t
[1, 2, 3, 1, 2, 5, 6, 7, 8]
>>> s = []
>>> for i in t:
...     if i not in s:
...         s.append(i)
...
>>> s
[1, 2, 3, 5, 6, 7, 8]
Neeraj
  • 1,247
  • 1
  • 8
  • 2
106

If you don't care about the order, just do this:

def remove_duplicates(l):
    return list(set(l))

A set is guaranteed to not have duplicates.

Brendan Long
  • 53,280
  • 21
  • 146
  • 188
50

To make a new list retaining the order of first elements of duplicates in L:

newlist = [ii for n,ii in enumerate(L) if ii not in L[:n]]

For example: if L = [1, 2, 2, 3, 4, 2, 4, 3, 5], then newlist will be [1, 2, 3, 4, 5]

This checks that each new element has not appeared previously in the list before adding it. Also, it does not need any imports.

Tomerikoo
  • 18,379
  • 16
  • 47
  • 61
  • 5
    This has a time complexity of **O(n ^ 2)**. The answers with `set` and `OrderedDict` may have lower amortized time complexity. – blubberdiblub Apr 13 '17 at 04:09
  • I used in my code this solution and worked great but I think it is time consuming – Gerasimos Ragavanis Apr 26 '18 at 13:59
  • @blubberdiblub can you explain what more code efficient mechanism exists in set and OrderedDict that could make them less time consuming? (excluding the overhead of loading them) – ilias iliadis Jan 14 '19 at 11:45
  • 2
    @iliasiliadis The usual implementations of **set** and **dict** use hashes or (some form of balanced) trees. You have to consider building the **set** or **dict** and searching in it (multiple times), but their amortized complexity usually is still lower than **O(n ^ 2)**. "Amortized" in simple terms means on average (they can have worst cases with higher complexity than the average case). This is only relevant when you have a big number of items. – blubberdiblub Jan 14 '19 at 13:16
  • Nice answer, it works if the elements are not hashable. However, if the elements are Numpy arrays, you may get surprises, because the `in` operator doesn't work as one might expect (at least as I was expecting). – Keta Jun 01 '22 at 08:47
39

There are also solutions using Pandas and NumPy. They both return a NumPy array, so you have to use the method .tolist() if you want a list.

t=['a','a','b','b','b','c','c','c']
t2= ['c','c','b','b','b','a','a','a']

Pandas solution

Using Pandas function unique():

import pandas as pd
pd.unique(t).tolist()
>>>['a','b','c']
pd.unique(t2).tolist()
>>>['c','b','a']

Numpy solution

Using numpy function unique().

import numpy as np
np.unique(t).tolist()
>>>['a','b','c']
np.unique(t2).tolist()
>>>['a','b','c']

Note that numpy.unique() also sorts the values, so the list t2 is returned sorted. If you want to have the order preserved, use as in this answer:

_, idx = np.unique(t2, return_index=True)
np.array(t2)[np.sort(idx)].tolist()  # convert to an array first so it can be indexed with idx
>>>['c','b','a']

This solution is not as elegant as the others; however, compared to pandas.unique(), numpy.unique() also allows you to check whether nested arrays are unique along one selected axis.
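For example, a small sketch of that axis keyword (assuming NumPy 1.13 or later, where the axis parameter was added):

>>> import numpy as np
>>> a = np.array([[1, 0, 0], [1, 0, 0], [2, 3, 4]])
>>> np.unique(a, axis=0)  # deduplicate whole rows
array([[1, 0, 0],
       [2, 3, 4]])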

G M
  • 20,759
  • 10
  • 81
  • 84
  • This will convert the list to numpy array which is a mess and won't work for strings. – user227666 Jul 03 '14 at 12:48
  • 1
    @user227666 thanks for your review but that's not true it works even with string and you can add .tolist if you want to get a list... – G M Jul 03 '14 at 16:45
  • 2
    I think this is kinda like trying to kill a bee with a sledgehammer. Works, sure! But, importing a library for just this purpose might be a little overkill, no? – Debosmit Ray Oct 09 '16 at 09:11
  • @DebosmitRay it could be useful if you work in Data Science where usually you work with numpy and many times you need to work with numpy array. – G M Oct 10 '16 at 07:17
  • 1
    the best answer in 2020 @DebosmitRay i hope you change your mind and use numpy / pandas every time you can – Egos Feb 27 '20 at 13:52
36

Super late answer.
If you don't care about the list order, you can use *arg expansion with set uniqueness to remove dupes, i.e.:

l = [*{*l}]

Python3 Demo
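A quick sketch of what this does (the sample data is my own; note the result order is arbitrary, since the values pass through a set):

>>> l = [1, 2, 2, 3, 1]
>>> [*{*l}]
[1, 2, 3]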

Pedro Lobito
  • 94,083
  • 31
  • 258
  • 268
35

In this answer, there will be two sections: two unique solutions, and a graph of the speed of specific solutions.

Removing Duplicate Items

Most of these answers only remove duplicate items which are hashable, but this question doesn't imply it needs only hashable items, meaning I'll offer some solutions which don't require hashable items.

collections.Counter is a powerful tool in the standard library which could be perfect for this. There's only one other solution which even has Counter in it. However, that solution is also limited to hashable keys.

To allow unhashable keys in Counter, I made a Container class, which will try to get the object's default hash function, but if it fails, it will fall back to its identity function. It also defines an __eq__ and a __hash__ method. This should be enough to allow unhashable items in our solution. Unhashable objects will be treated as if they are hashable. However, this hash function uses identity for unhashable objects, meaning two equal objects that are both unhashable won't be merged. I suggest you override this, and change it to use the hash of an equivalent immutable type (like using hash(tuple(my_list)) if my_list is a list).

I also made two solutions. The second solution keeps the order of the items, using a subclass of both OrderedDict and Counter which is named 'OrderedCounter'. Now, here are the functions:

from collections import OrderedDict, Counter

class Container:
    def __init__(self, obj):
        self.obj = obj
    def __eq__(self, obj):
        return self.obj == obj
    def __hash__(self):
        try:
            return hash(self.obj)
        except:
            return id(self.obj)

class OrderedCounter(Counter, OrderedDict):
    'Counter that remembers the order elements are first encountered'

    def __repr__(self):
        return '%s(%r)' % (self.__class__.__name__, OrderedDict(self))

    def __reduce__(self):
        return self.__class__, (OrderedDict(self),)
    
def remd(sequence):
    cnt = Counter()
    for x in sequence:
        cnt[Container(x)] += 1
    return [item.obj for item in cnt]

def oremd(sequence):
    cnt = OrderedCounter()
    for x in sequence:
        cnt[Container(x)] += 1
    return [item.obj for item in cnt]

remd is non-ordered deduplication, while oremd is ordered deduplication. You can probably tell which one is faster, but I'll explain anyway: the non-ordered variant is slightly faster, since it doesn't store the order of the items.
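A quick usage sketch with made-up data, restating the caveat above:

>>> a = [1, 2]
>>> remd([a, a, 3, 3])  # the same list object twice: merged via id()
[[1, 2], 3]
>>> remd([[1, 2], [1, 2]])  # equal but distinct lists: not merged
[[1, 2], [1, 2]]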

Now, I also wanted to show the speed comparisons of each answer. So, I'll do that now.

Which Function is the Fastest?

For removing duplicates, I gathered 10 functions from a few answers. I calculated the speed of each function and put it into a graph using matplotlib.pyplot.

I divided this into three rounds of graphing. A hashable is any object which can be hashed, an unhashable is any object which cannot be hashed. An ordered sequence is a sequence which preserves order, an unordered sequence does not preserve order. Now, here are a few more terms:

Unordered Hashable was for any method which removed duplicates, which didn't necessarily have to keep the order. It didn't have to work for unhashables, but it could.

Ordered Hashable was for any method which kept the order of the items in the list, but it didn't have to work for unhashables, but it could.

Ordered Unhashable was any method which kept the order of the items in the list, and worked for unhashables.

The y-axis shows the number of seconds each call took; the x-axis shows the size of the input the function was applied to.

I generated sequences for unordered hashables and ordered hashables with the following comprehension: [list(range(x)) + list(range(x)) for x in range(0, 1000, 10)]

For ordered unhashables: [[list(range(y)) + list(range(y)) for y in range(x)] for x in range(0, 1000, 10)]

Note there is a step in the range because, without it, this would've taken 10x as long. Also, in my personal opinion, it looked a little easier to read.

Also note the keys on the legend are what I tried to guess as the most vital parts of the implementation of the function. As for what function does the worst or best? The graph speaks for itself.

With that settled, here are the graphs.

[Graph: Unordered Hashables, with zoomed-in view]

[Graph: Ordered Hashables, with zoomed-in view]

[Graph: Ordered Unhashables, with zoomed-in view]

Corman
  • 749
  • 11
  • 16
  • 1
    Hard to read. Better have a top list at the bottom with the results wrapped up. Thus, for unordered hashables: **Do not use:** #- ii for n,ii in enumerate(seq) if ii not in seq[:n] #- cnt = Counter(); cnt[Container(x)] += 1 #- cnt = OrderedCounter(); cnt[Container(x)) += 1 #- if i not in new for i in seq. **Better use:** #- list(set(seq)) #- dict.fromkeys(seq) #- added = set(); for in seq: if not val in added #- OrderedDict.fromkeys(seq) #- OrderedDict((x, True) for x in seq).keys() #- functools.reduce(lambda r, v: v in r[1] and r or ... or ..., ([], set[]))[0] – questionto42 Sep 13 '21 at 23:12
31

A colleague sent me the accepted answer as part of his code for a code review today. While I certainly admire the elegance of the answer in question, I am not happy with the performance. I have tried this solution (I use a set to reduce lookup time):

def ordered_set(in_list):
    out_list = []
    added = set()
    for val in in_list:
        if not val in added:
            out_list.append(val)
            added.add(val)
    return out_list

To compare efficiency, I used a random sample of 100 integers - 62 were unique

from random import randint
x = [randint(0,100) for _ in xrange(100)]

In [131]: len(set(x))
Out[131]: 62

Here are the results of the measurements

In [129]: %timeit list(OrderedDict.fromkeys(x))
10000 loops, best of 3: 86.4 us per loop

In [130]: %timeit ordered_set(x)
100000 loops, best of 3: 15.1 us per loop

Well, what happens if set is removed from the solution?

def ordered_set(inlist):
    out_list = []
    for val in inlist:
        if not val in out_list:
            out_list.append(val)
    return out_list

The result is not as bad as with the OrderedDict, but still more than 3 times slower than the solution with the set:

In [136]: %timeit ordered_set(x)
10000 loops, best of 3: 52.6 us per loop
volcano
  • 3,578
  • 21
  • 28
  • Nice using set quick lookup to speed up the looped comparison. If order does not matter list(set(x)) is still 6x faster than this – Joop Sep 17 '14 at 10:24
  • @Joop, that was my first question for my colleague - the order does matter; otherwise, it would have been trivial issue – volcano Sep 17 '14 at 11:00
  • optimized version of ordered set, for anyone who is interested: `def unique(iterable):` ;`seen = set()`; `seen_add = seen.add`; `return [item for item in iterable if not item in seen and not seen_add(item)]` – DrD Feb 16 '20 at 22:29
24

Another way of doing:

>>> seq = [1,2,3,'a', 'a', 1,2]
>>> dict.fromkeys(seq).keys()
['a', 1, 2, 3]
Russia Must Remove Putin
  • 374,368
  • 89
  • 403
  • 331
James Sapam
  • 16,036
  • 12
  • 50
  • 73
  • 1
    Note that in modern Python versions (2.7+ I think, but I don't recall for sure), `keys()` returns a dictionary view object, not a list. – Dustin Wyatt Dec 22 '17 at 15:24
23

Simple and easy:

myList = [1, 2, 3, 1, 2, 5, 6, 7, 8]
cleanlist = []
[cleanlist.append(x) for x in myList if x not in cleanlist]

Output:

>>> cleanlist 
[1, 2, 3, 5, 6, 7, 8]
Nima Soroush
  • 12,242
  • 4
  • 52
  • 53
16

I had a dict in my list, so I could not use the above approach. I got the error:

TypeError: unhashable type:

So if you care about order and/or some items are unhashable, then you might find this useful:

def make_unique(original_list):
    unique_list = []
    [unique_list.append(obj) for obj in original_list if obj not in unique_list]
    return unique_list

Some may consider list comprehension with a side effect to not be a good solution. Here's an alternative:

def make_unique(original_list):
    unique_list = []
    # note: on Python 3, map() is lazy, so wrap it in list() to force the side effects
    list(map(lambda x: unique_list.append(x) if (x not in unique_list) else False, original_list))
    return unique_list
Bernhard Barker
  • 54,589
  • 14
  • 104
  • 138
cchristelis
  • 1,985
  • 1
  • 13
  • 17
  • 6
    `map` with a side effect is even more misleading than a listcomp with a side effect. Also, `lambda x: unique_list.append(x)` is just a clunkier and slower way to pass `unique_list.append`. – abarnert Nov 08 '14 at 01:48
  • Very useful way to append elements in just one line, thanks! – ZLNK May 24 '17 at 21:50
  • 2
    @ZLNK please, don't ever use that. Apart from being conceptually ugly, it's also extremely inefficient, because you actually create a potentially large list and throw it away just to perform basic iteration. – Eli Korvigo Mar 13 '19 at 20:14
14

If you want to preserve the order, and not use any external modules here is an easy way to do this:

>>> t = [1, 9, 2, 3, 4, 5, 3, 6, 7, 5, 8, 9]
>>> list(dict.fromkeys(t))
[1, 9, 2, 3, 4, 5, 6, 7, 8]

Note: This method preserves the order of appearance, so, as seen above, nine comes after one because that was the first time it appeared. This, however, is the same result as you would get by doing

from collections import OrderedDict
ulist=list(OrderedDict.fromkeys(l))

but it is much shorter, and runs faster.

This works because each time the fromkeys method encounters a key that already exists, it simply overwrites it. This won't affect the dictionary at all, however, as fromkeys creates a dictionary where all keys have the value None, so it effectively eliminates all duplicates this way.
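You can see this by looking at the intermediate dictionary (a small sketch, Python 3.7+):

>>> dict.fromkeys([1, 9, 1, 2])
{1: None, 9: None, 2: None}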

HEEL_caT666
  • 161
  • 2
  • 7
13

All the order-preserving approaches I've seen here so far either use naive comparison (with O(n^2) time complexity at best) or heavyweight OrderedDict/set+list combinations that are limited to hashable inputs. Here is a hash-independent O(n log n) solution:

Update: added the key argument, documentation, and Python 3 compatibility.

# from functools import reduce <-- add this import on Python 3

def uniq(iterable, key=lambda x: x):
    """
    Remove duplicates from an iterable. Preserves order. 
    :type iterable: Iterable[Ord => A]
    :param iterable: an iterable of objects of any orderable type
    :type key: Callable[A] -> (Ord => B)
    :param key: optional argument; by default an item (A) is discarded 
    if another item (B), such that A == B, has already been encountered and taken. 
    If you provide a key, this condition changes to key(A) == key(B); the callable 
    must return orderable objects.
    """
    # Enumerate the list to restore order later; reduce the sorted list; restore order
    def append_unique(acc, item):
        return acc if key(acc[-1][1]) == key(item[1]) else acc.append(item) or acc 
    srt_enum = sorted(enumerate(iterable), key=lambda item: key(item[1]))
    return [item[1] for item in sorted(reduce(append_unique, srt_enum, [srt_enum[0]]))] 
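A quick usage sketch with assumed sample data:

>>> from functools import reduce  # Python 3
>>> uniq([1, 2, 2, 3, 1])
[1, 2, 3]
>>> uniq(['a', 'A', 'b'], key=str.lower)  # uniqueness decided by the key
['a', 'b']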
Eli Korvigo
  • 10,265
  • 6
  • 47
  • 73
  • Yet, this solution requires orderable elements. I will use it to uniquify my list of lists: it is a pain to `tuple()` lists and to hash them. Generally speaking, the hash process takes a time proportional to the size of the whole data, while this solution takes a time O(nlog(n)), depending only on the length of the list. – loxaxs May 18 '16 at 20:40
  • I think that the set-based approach is equally cheap (O(n log n)), or cheaper, than sorting + detection of uniques. (This approach would parallelize much better, though.) It also does not exactly preserve the initial order, but it gives a predictable order. – 9000 Jun 05 '17 at 16:29
  • @9000 That is true. I've never mentioned time-complexity of a hash-table-based approach, which is obviously O(n). Here you can find many answers incorporating hash-tables. They are not universal, though, because they require objects to be hashable. Moreover, they are a lot more memory-intensive. – Eli Korvigo Jun 06 '17 at 17:34
  • Takes time to read and understand this answer. Is there a point in enumerating when you are not using the indices? The `reduce()` is already working on a sorted collection `srt_enum`, why did you apply `sorted` again? – Brayoni May 01 '20 at 11:09
  • @Brayoni the first sort is there to group equal values, the second sort is there to restore initial order. The enumeration is needed to keep track of original relative order. – Eli Korvigo May 01 '20 at 13:30
12

I've compared the various suggestions with perfplot. It turns out that, if the input array doesn't have duplicate elements, all methods are more or less equally fast, independently of whether the input data is a Python list or a NumPy array.

enter image description here

If the input array is large but contains just one unique element, then the set, dict and np.unique methods are constant-time if the input data is a list. If it's a NumPy array, np.unique is about 10 times faster than the other alternatives.

enter image description here

It's somewhat surprising to me that those are not constant-time operations, too.


Code to reproduce the plots:

import perfplot
import numpy as np
import matplotlib.pyplot as plt


def setup_list(n):
    # return list(np.random.permutation(np.arange(n)))
    return [0] * n


def setup_np_array(n):
    # return np.random.permutation(np.arange(n))
    return np.zeros(n, dtype=int)


def list_set(data):
    return list(set(data))


def numpy_unique(data):
    return np.unique(data)


def list_dict(data):
    return list(dict.fromkeys(data))


b = perfplot.bench(
    setup=[
        setup_list,
        setup_list,
        setup_list,
        setup_np_array,
        setup_np_array,
        setup_np_array,
    ],
    kernels=[list_set, numpy_unique, list_dict, list_set, numpy_unique, list_dict],
    labels=[
        "list(set(lst))",
        "np.unique(lst)",
        "list(dict.fromkeys(lst))",
        "list(set(arr))",
        "np.unique(arr)",
        "list(dict.fromkeys(arr))",
    ],
    n_range=[2 ** k for k in range(23)],
    xlabel="len(array)",
    equality_check=None,
)
# plt.title("input array = [0, 1, 2,..., n]")
plt.title("input array = [0, 0,..., 0]")
b.save("out.png")
b.show()
Nico Schlömer
  • 53,797
  • 27
  • 201
  • 249
10

You could also do this:

>>> t = [1, 2, 3, 3, 2, 4, 5, 6]
>>> s = [x for i, x in enumerate(t) if i == t.index(x)]
>>> s
[1, 2, 3, 4, 5, 6]

The reason the above works is that the index method returns only the first index of an element. Duplicate elements have higher indices. Refer to here:

list.index(x[, start[, end]])
Return zero-based index in the list of the first item whose value is x. Raises a ValueError if there is no such item.

Atonal
  • 520
  • 4
  • 14
  • This is horribly inefficient. `list.index` is a linear-time operation, making your solution quadratic. – Eli Korvigo Apr 13 '18 at 20:42
  • You're right. But also I believe it's fairly obvious the solution is intended to be a one liner that preserves the order. Everything else is already in here. – Atonal Oct 13 '18 at 00:08
10

The best approach to removing duplicates from a list is to use the set() function, available in Python, and then convert that set back into a list:

In [2]: some_list = ['a','a','v','v','v','c','c','d']
In [3]: list(set(some_list))
Out[3]: ['a', 'c', 'd', 'v']
Anurag Misra
  • 1,516
  • 18
  • 24
10

You can use set to remove duplicates:

mylist = list(set(mylist))

But note the results will be unordered. If that's an issue, you can sort it afterwards (note this gives sorted order, not the original order):

mylist.sort()
Flavio Wuensche
  • 9,460
  • 1
  • 57
  • 54
8

Try using sets (note: the sets module was deprecated in Python 2.6 and removed in Python 3; on modern Python, use the built-in set type instead):

import sets
t = sets.Set(['a', 'b', 'c', 'd'])
t1 = sets.Set(['a', 'b', 'c'])

print t | t1
print t - t1
Charlie Martin
  • 110,348
  • 25
  • 193
  • 263
7

This one cares about the order without too much hassle (OrderedDict & others). Probably not the most Pythonic way, nor the shortest way, but it does the trick:

def remove_duplicates(item_list):
    ''' Removes duplicate items from a list '''
    singles_list = []
    for element in item_list:
        if element not in singles_list:
            singles_list.append(element)
    return singles_list
Asclepius
  • 57,944
  • 17
  • 167
  • 143
cgf
  • 3,369
  • 7
  • 45
  • 65
  • 1. You should never shadow builtin names (at least, as important as `list`); 2. Your method scales extremely bad: it is quadratic in the number of elements in `list`. – Eli Korvigo Jan 07 '18 at 19:05
  • 1
    1. Correct, but this was an example; 2. Correct, and that's exactly the reason why I offered it. All solutions posted here have pros and cons. Some sacrifice simplicity or order, mine sacrifices scalability. – cgf Mar 20 '18 at 11:45
7

One more approach could be:

import pandas as pd

myList = [1, 2, 3, 1, 2, 5, 6, 7, 8]
cleanList = pd.Series(myList).drop_duplicates().tolist()
print(cleanList)

#> [1, 2, 3, 5, 6, 7, 8]

and the order remains preserved.

Akarsh Jain
  • 930
  • 10
  • 15
  • 2
    Though this might work well, using a heavy library like _pandas_ for this purpose seems like an overkill. – Glutexo Mar 20 '19 at 12:29
6

Reduce variant with order preserved (on Python 3, add `from functools import reduce` first):

Assume that we have list:

l = [5, 6, 6, 1, 1, 2, 2, 3, 4]

Reduce variant (inefficient):

>>> reduce(lambda r, v: v in r and r or r + [v], l, [])
[5, 6, 1, 2, 3, 4]

5x faster, but more sophisticated:

>>> reduce(lambda r, v: v in r[1] and r or (r[0].append(v) or r[1].add(v)) or r, l, ([], set()))[0]
[5, 6, 1, 2, 3, 4]

Explanation:

default = (list(), set())
# use the list to keep order
# use the set to make lookups faster

def reducer(result, item):
    if item not in result[1]:
        result[0].append(item)
        result[1].add(item)
    return result

reduce(reducer, l, default)[0]
6

There are many other answers suggesting different ways to do this, but they're all batch operations, and some of them throw away the original order. That might be okay depending on what you need, but if you want to iterate over the values in the order of the first instance of each value, and you want to remove the duplicates on-the-fly versus all at once, you could use this generator:

def uniqify(iterable):
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item

This returns a generator/iterator, so you can use it anywhere that you can use an iterator.

for unique_item in uniqify([1, 2, 3, 4, 3, 2, 4, 5, 6, 7, 6, 8, 8]):
    print(unique_item, end=' ')

print()

Output:

1 2 3 4 5 6 7 8

If you do want a list, you can do this:

unique_list = list(uniqify([1, 2, 3, 4, 3, 2, 4, 5, 6, 7, 6, 8, 8]))

print(unique_list)

Output:

[1, 2, 3, 4, 5, 6, 7, 8]
Cyphase
  • 11,502
  • 2
  • 31
  • 32
  • `seen = set(iterable); for item in seen: yield item` is almost certainly faster. (I haven't tried this specific case, but that would be my guess.) – dylnmc Sep 23 '16 at 18:40
  • 3
    @dylnmc, that's a batch operation, and it also loses the ordering. My answer was specifically intended to be on-the-fly and in order of first occurrence. :) – Cyphase Oct 26 '16 at 04:42
6

You can use the following function:

def rem_dupes(dup_list): 
    yooneeks = [] 
    for elem in dup_list: 
        if elem not in yooneeks: 
            yooneeks.append(elem) 
    return yooneeks

Example:

my_list = ['this','is','a','list','with','dupicates','in', 'the', 'list']

Usage:

rem_dupes(my_list)

['this', 'is', 'a', 'list', 'with', 'dupicates', 'in', 'the']

Cybernetic
  • 12,628
  • 16
  • 93
  • 132
  • Unsuitable for large lists as it creates a duplicate. – ingyhere Mar 29 '21 at 16:52
  • @ingyhere The OP did not suggest anything re: large lists. There is a *always* a tradeoff to every type of implementation, so the premise that every answer must default to "most scalable" is false. – Cybernetic Mar 29 '21 at 16:59
5

Using set :

a = [0,1,2,3,4,3,3,4]
a = list(set(a))
print a

Using unique :

import numpy as np
a = [0,1,2,3,4,3,3,4]
a = np.unique(a).tolist()
print a
Nurul Akter Towhid
  • 3,046
  • 2
  • 33
  • 35
5

Without using set

data=[1, 2, 3, 1, 2, 5, 6, 7, 8]
uni_data=[]
for dat in data:
    if dat not in uni_data:
        uni_data.append(dat)

print(uni_data) 
Suresh Gupta
  • 605
  • 7
  • 4
5

The Magic of Python's Built-in Types

In Python, it is very easy to process complicated cases like this, using only Python's built-in types.

Let me show you how!

Method 1: General Case

The way (1 line of code) to remove duplicated elements from a list while keeping the original order:

line = [1, 2, 3, 1, 2, 5, 6, 7, 8]
new_line = sorted(set(line), key=line.index) # remove duplicated element
print(new_line)

You will get the result

[1, 2, 3, 5, 6, 7, 8]

Method 2: Special Case

TypeError: unhashable type: 'list'

The special case to process unhashable items (3 lines of code):

line=[['16.4966155686595', '-27.59776154691', '52.3786295521147']
,['16.4966155686595', '-27.59776154691', '52.3786295521147']
,['17.6508629295574', '-27.143305738671', '47.534955022564']
,['17.6508629295574', '-27.143305738671', '47.534955022564']
,['18.8051102904552', '-26.688849930432', '42.6912804930134']
,['18.8051102904552', '-26.688849930432', '42.6912804930134']
,['19.5504702331098', '-26.205884452727', '37.7709192714727']
,['19.5504702331098', '-26.205884452727', '37.7709192714727']
,['20.2929416861422', '-25.722717575124', '32.8500163147157']
,['20.2929416861422', '-25.722717575124', '32.8500163147157']]

tuple_line = [tuple(pt) for pt in line] # convert list of list into list of tuple
tuple_new_line = sorted(set(tuple_line),key=tuple_line.index) # remove duplicated element
new_line = [list(t) for t in tuple_new_line] # convert list of tuple into list of list

print (new_line)

You will get the result :

[
  ['16.4966155686595', '-27.59776154691', '52.3786295521147'], 
  ['17.6508629295574', '-27.143305738671', '47.534955022564'], 
  ['18.8051102904552', '-26.688849930432', '42.6912804930134'], 
  ['19.5504702331098', '-26.205884452727', '37.7709192714727'], 
  ['20.2929416861422', '-25.722717575124', '32.8500163147157']
]

This works because a tuple is hashable, and you can convert data between a list and a tuple easily.

Milo Chen
  • 3,617
  • 4
  • 20
  • 36
4

The below code is a simple way of removing duplicates from a list:

def remove_duplicates(x):
    a = []
    for i in x:
        if i not in a:
            a.append(i)
    return a

print remove_duplicates([1,2,2,3,3,4])

it returns [1,2,3,4]

vinay hegde
  • 99
  • 1
  • 5
  • 2
    If you don't care about order, then this takes significantly longer. `list(set(..))` (over 1 million passes) will beat this solution by about 10 whole seconds - whereas this approach takes about 12 seconds, `list(set(..))` only takes about 2 seconds! – dylnmc Sep 23 '16 at 18:35
  • @dylnmc this is also a duplicate of a significantly older [answer](https://stackoverflow.com/a/25622503/3846213) – Eli Korvigo Jan 07 '18 at 19:07
4

Here's the fastest Pythonic solution compared to the others listed in the replies.

Using the implementation details of short-circuit evaluation allows us to use a list comprehension, which is fast enough. visited.add(item) always returns None, which is evaluated as False, so the right side of the or is always the result of the expression.

Time it yourself

def deduplicate(sequence):
    visited = set()
    adder = visited.add  # get rid of qualification overhead
    out = [adder(item) or item for item in sequence if item not in visited]
    return out
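A quick usage sketch with assumed sample data:

>>> deduplicate([1, 2, 1, 3, 2])
[1, 2, 3]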
thodnev
  • 1,564
  • 16
  • 20
4

Very simple way in Python 3:

>>> n = [1, 2, 3, 4, 1, 1]
>>> n
[1, 2, 3, 4, 1, 1]
>>> m = sorted(list(set(n)))
>>> m
[1, 2, 3, 4]
Wariored
  • 1,303
  • 14
  • 25
  • 2
    `sorted(list(...))` is redundant (`sorted` already implicitly converts its argument to a new `list`, sorts it, then returns the new `list`, so using both means making an unnecessary temporary `list`). Use only `list` if the result need not be sorted, use only `sorted` if the result needs to be sorted. – ShadowRanger Jun 20 '18 at 12:57
4

Unfortunately, most answers here either do not preserve the order or are too long. Here is a simple, order-preserving answer.

s = [1,2,3,4,5,2,5,6,7,1,3,9,3,5]
x=[]

[x.append(i) for i in s if i not in x]
print(x)

This will give you x with duplicates removed but preserving the order.

ste_kwr
  • 820
  • 1
  • 5
  • 21
2

Here is an example returning a list without repetitions, preserving order. It does not need any external imports.

def GetListWithoutRepetitions(loInput):
    # return list, consisting of elements of list/tuple loInput, without repetitions.
    # Example: GetListWithoutRepetitions([None,None,1,1,2,2,3,3,3])
    # Returns: [None, 1, 2, 3]

    if loInput==[]:
        return []

    loOutput = []

    if loInput[0] is None:
        oGroupElement=1
    else: # loInput[0] != None
        oGroupElement=None

    for oElement in loInput:
        if oElement != oGroupElement:
            loOutput.append(oElement)
            oGroupElement = oElement
    return loOutput
Apogentus
  • 6,371
  • 6
  • 32
  • 33
2

Check this if you want to remove duplicates (in-place edit rather than returning a new list) without using a built-in set, dict.keys, uniqify, or Counter. Note that removing items from a list while iterating over it is fragile in general; here it produces the result shown below, keeping the last occurrence of each value:

>>> t = [1, 2, 3, 1, 2, 5, 6, 7, 8]
>>> for i in t:
...     if i in t[t.index(i)+1:]:
...         t.remove(i)
... 
>>> t
[3, 1, 2, 5, 6, 7, 8]
Ravi
  • 157
  • 2
  • 16
2

I think converting to a set is the easiest way to remove duplicates:

list1 = [1,2,1]
list1 = list(set(list1))
print list1
2
def remove_duplicates(A):
   [A.pop(idx) for idx,elem in enumerate(A) if A.count(elem)!=1]
   return A

A list comprehension to remove duplicates. (Beware: popping from A while enumerating it mutates the list under the iterator, so elements can be skipped; with four or more copies of the same value this can leave duplicates behind.)

whackamadoodle3000
  • 6,684
  • 4
  • 27
  • 44
  • Love the simplicity, for small lists this will work fine. I'd be tempted to not use 'count' for the index, but eg 'idx'. like: [A.pop(idx) for idx,elem in enumerate(A) if A.count(elem)!=1] – Bert Bril May 20 '23 at 20:20
2

I didn't see an answer for non-hashable values that is a one-liner, n log n, and standard-library only, so here's my answer:

import operator, itertools

list(map(operator.itemgetter(0), itertools.groupby(sorted(items))))

Or as a generator function:

import operator, itertools
from typing import Iterable, TypeVar

T = TypeVar("T")

def unique(items: Iterable[T]) -> Iterable[T]:
    """For unhashable items (can't use set to unique) with a partial order"""
    yield from map(operator.itemgetter(0), itertools.groupby(sorted(items)))
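A quick usage sketch with assumed sample data; note the result comes back sorted rather than in the original order:

>>> list(unique([[1, 2], [1, 2], [0]]))
[[0], [1, 2]]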
Brent
  • 4,153
  • 4
  • 30
  • 63
1

To remove the duplicates, make it a SET and then make it a LIST again, and print/use it. A set is guaranteed to have unique elements. For example:

a = [1,2,3,4,5,9,11,15]
b = [4,5,6,7,8]
c=a+b
print c
print list(set(c)) #one line for getting unique elements of c

The output will be as follows (checked in python 2.7)

[1, 2, 3, 4, 5, 9, 11, 15, 4, 5, 6, 7, 8]  #simple list addition with duplicates
[1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 15] #duplicates removed!!
krozaine
  • 1,481
  • 15
  • 22
1

It requires installing a 3rd-party module but the package iteration_utilities contains a unique_everseen1 function that can remove all duplicates while preserving the order:

>>> from iteration_utilities import unique_everseen

>>> list(unique_everseen(['a', 'b', 'c', 'd'] + ['a', 'c', 'd']))
['a', 'b', 'c', 'd']

In case you want to avoid the overhead of the list addition operation you can use itertools.chain instead:

>>> from itertools import chain
>>> list(unique_everseen(chain(['a', 'b', 'c', 'd'], ['a', 'c', 'd'])))
['a', 'b', 'c', 'd']

The unique_everseen also works if you have unhashable items (for example lists) in the lists:

>>> from iteration_utilities import unique_everseen
>>> list(unique_everseen([['a'], ['b'], 'c', 'd'] + ['a', 'c', 'd']))
[['a'], ['b'], 'c', 'd', 'a']

However that will be (much) slower than if the items are hashable.


1 Disclosure: I'm the author of the iteration_utilities-library.

MSeifert
  • 145,886
  • 38
  • 333
  • 352
1

You can do this simply by using sets.

Step 1: Get the different elements of the lists
Step 2: Get the common elements of the lists
Step 3: Combine them

In [1]: a = ["apples", "bananas", "cucumbers"]

In [2]: b = ["pears", "apples", "watermelons"]

In [3]: set(a).symmetric_difference(b).union(set(a).intersection(b))
Out[3]: {'apples', 'bananas', 'cucumbers', 'pears', 'watermelons'}
Anurag Misra
  • 1,516
  • 18
  • 24
1

If you don't care about order and want something different from the Pythonic ways suggested above (that is, something you can use in interviews), then:

def remove_dup(arr):
    size = len(arr)
    j = 0    # To store index of next unique element
    for i in range(0, size-1):
        # If current element is not equal
        # to next element then store that
        # current element
        if(arr[i] != arr[i+1]):
            arr[j] = arr[i]
            j+=1

    arr[j] = arr[size-1] # Store the last element as whether it is unique or repeated, it hasn't stored previously

    return arr[0:j+1]

if __name__ == '__main__':
    arr = [10, 10, 1, 1, 1, 3, 3, 4, 5, 6, 7, 8, 8, 9]
    print(remove_dup(sorted(arr)))

Time Complexity : O(n)

Auxiliary Space : O(n)

Reference: http://www.geeksforgeeks.org/remove-duplicates-sorted-array/

Santosh Pillai
  • 8,169
  • 1
  • 31
  • 27
1

There are a lot of answers here that use a set(..) (which is fast given the elements are hashable) or a list (which has the downside that it results in an O(n²) algorithm).

The function I propose is a hybrid: we use a set(..) for items that are hashable, and a list(..) for the ones that are not. Furthermore, it is implemented as a generator so that we can, for instance, limit the number of items or do some additional filtering.

Finally we also can use a key argument to specify in what way the elements should be unique. For instance we can use this if we want to filter a list of strings such that every string in the output has a different length.

def uniq(iterable, key=lambda x: x):
    seens = set()
    seenl = []
    for item in iterable:
        k = key(item)
        try:
            seen = k in seens
        except TypeError:
            seen = k in seenl
        if not seen:
            yield item
            try:
                seens.add(k)
            except TypeError:
                seenl.append(k)

We can now for instance use this like:

>>> list(uniq(["apple", "pear", "banana", "lemon"], len))
['apple', 'pear', 'banana']
>>> list(uniq(["apple", "pear", "lemon", "banana"], len))
['apple', 'pear', 'banana']
>>> list(uniq(["apple", "pear", {}, "lemon", [], "banana"], len))
['apple', 'pear', {}, 'banana']
>>> list(uniq(["apple", "pear", {}, "lemon", [], "banana"]))
['apple', 'pear', {}, 'lemon', [], 'banana']
>>> list(uniq(["apple", "pear", {}, "lemon", {}, "banana"]))
['apple', 'pear', {}, 'lemon', 'banana']

It is thus a uniqueness filter that can work on any iterable and filter out duplicates, regardless of whether the items are hashable or not.

It makes one assumption: that if one object is hashable, and another one is not, the two objects are never equal. This can strictly speaking happen, although it would be very uncommon.

Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555
  • A note: There are built-ins that will break the assumption stated in the final paragraph; `frozenset` is hashable, `set` is not, and if they have the same values, they're equal, but you'll treat them as non-equal in this code. – ShadowRanger Jun 20 '18 at 13:02
  • @ShadowRanger: yes, I agree with that, like said it does not solve *all* the problems. Nevertheless, by using a `set(..)` this will simply not work at all, and by using a `list`, this will result in linear lookup time. So it is meant as a "better" set, but with some pitfalls. – Willem Van Onsem Jun 20 '18 at 13:04
  • Furhermore a `set(..)` also in rare cases returns objects that are not equal. For example `math.nan` is not equal to `math.nan`, but the dictionary will return it, since it checks first for *reference equality*. – Willem Van Onsem Jun 20 '18 at 13:05
  • Instead of using `key=lambda x: x`, you could use `key=None` and put in `k = key(item) if key else item`. This should be a tiny bit faster and yield the same result. – Timothy C. Quinn May 29 '22 at 16:42
1

Another solution might be the following: create a dictionary out of the list, with each item as key and its index as value, and then print the dictionary keys.

>>> lst = [1, 3, 4, 2, 1, 21, 1, 32, 21, 1, 6, 5, 7, 8, 2]
>>>
>>> dict_enum = {item:index for index, item in enumerate(lst)}
>>> print dict_enum.keys()
[32, 1, 2, 3, 4, 5, 6, 7, 8, 21]
SuperNova
  • 25,512
  • 7
  • 93
  • 64
  • Why compute/store the index if you never use it? This looks like a solution intended to preserve order (by storing last seen index of each value) that forgot to do so. `list(set(lst))` would achieve the same logical result. – ShadowRanger Jun 20 '18 at 13:00
  • You could just do `list(dict.fromkeys(lst))` – Brayoni May 01 '20 at 10:55
1

I did this with a pure Python function. It works when your item values are JSON-like, i.e. anything comparable with ==. Note that it keeps the last occurrence of each duplicated value:

[i for n, i in enumerate(items) if i not in items[n + 1 :]]
Zhong Ri
  • 2,556
  • 1
  • 19
  • 23
1
  • You can remove duplicates using a Python set or the dict.fromkeys() method.

  • The dict.fromkeys() method converts a list into a dictionary. Dictionaries cannot contain duplicate keys, so a dictionary with only unique keys is returned by dict.fromkeys().

  • Sets, like dictionaries, cannot contain duplicate values. If we convert a list to a set, all the duplicates are removed.

Method 1: The naive approach

mylist = [5, 10, 15, 20, 3, 15, 25, 20, 30, 10, 100]

uniques = []
for i in mylist:
    if i not in uniques:
        uniques.append(i)

print(uniques)
Method 2: Using set()

mylist = [5, 10, 15, 20, 3, 15, 25, 20, 30, 10, 100]

myset = set(mylist)
print(list(myset))
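The bullets above also mention dict.fromkeys(); for completeness, here is a sketch of that variant, which additionally preserves order on Python 3.7+:

Method 3: Using dict.fromkeys()

mylist = [5, 10, 15, 20, 3, 15, 25, 20, 30, 10, 100]

print(list(dict.fromkeys(mylist)))
# [5, 10, 15, 20, 3, 25, 30, 100]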
Josef
  • 2,869
  • 2
  • 22
  • 23
1

Using set, but preserving order

unique = set()
[unique.add(n) or n for n in l if n not in unique]
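This works because set.add returns None (which is falsy), so the or always evaluates to n. A quick sketch with assumed sample data:

>>> l = [1, 2, 2, 3, 1]
>>> unique = set()
>>> [unique.add(n) or n for n in l if n not in unique]
[1, 2, 3]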
Santhosh
  • 28,097
  • 9
  • 82
  • 87
0

For completeness, and since this is a very popular question, the toolz library offers a unique function:

>>> from toolz import unique
>>> tuple(unique((1, 2, 3)))
(1, 2, 3)
>>> tuple(unique((1, 2, 1, 3)))
(1, 2, 3)
Björn Pollex
  • 75,346
  • 28
  • 201
  • 283
0
def remove_duplicates(input_list):
    if input_list == []:
        return []
    # sort the list from smallest to largest
    input_list = sorted(input_list)
    # initialize the output list with the first element of the sorted input list
    output_list = [input_list[0]]
    for item in input_list:
        if item > output_list[-1]:
            output_list.append(item)
    return output_list
dennohpeter
  • 391
  • 4
  • 16
0

This is just a readable function, easily understandable. It uses the dict data structure and some built-in functions, and has a better complexity of O(n).

def undup(dup_list):
    b={}
    for i in dup_list:
        b.update({i:1})
    return b.keys()
a=["a",'b','a']
print undup(a)

Disclaimer: you may get an indentation error if you copy and paste; fix the indentation of the above code before running it.

yunus
  • 2,445
  • 1
  • 14
  • 12
0

Python has many built-in functions. You can use set() to remove the duplicates inside the list. As per your example, below are the two lists t and t2:

t = ['a', 'b', 'c', 'd']
t2 = ['a', 'c', 'd']
result = list(set(t) - set(t2))
result

Answer: ['b']

Anoop Kumar
  • 845
  • 1
  • 8
  • 19
0

Sometimes you need to remove the duplicate items in-place, without creating a new list, for example when the list is big or when other parts of the code hold references to it:

from collections import Counter
cntDict = Counter(t)
for item,cnt in cntDict.items():
    for _ in range(cnt-1):
        t.remove(item)
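A quick usage sketch with assumed sample data; since list.remove deletes the first occurrence, this keeps the last occurrence of each duplicated value:

>>> from collections import Counter
>>> t = [1, 2, 2, 3, 3, 3]
>>> cntDict = Counter(t)
>>> for item, cnt in cntDict.items():
...     for _ in range(cnt - 1):
...         t.remove(item)
...
>>> t
[1, 2, 3]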
where23
  • 483
  • 3
  • 9
0

If your list is sorted (so equal values are adjacent), you can use the following approach to iterate over it, skipping the repeated values. This is especially useful for handling big lists with low memory consumption, avoiding the cost of building a dict or a set:

def uniq(iterator):
    prev = None
    for item in iterator:
        if item != prev:
            prev = item
            yield item

Then:

for item in uniq([1, 1, 3, 5, 5, 6]):
    print(item, end=' ')

The output is going to be: 1 3 5 6

To return a list object, you could do:

>>> print(list(uniq([1, 1, 3, 5, 5, 6])))
[1, 3, 5, 6]
Brayoni
  • 696
  • 7
  • 14
Israel Teixeira
  • 165
  • 2
  • 5
0

You can compare the lengths of the set and the list, and rebuild the list from the set only when they differ:

if len(t) != len(set(t)):
    t = [x for x in set(t)]
     
0

IF ...

  1. the order of deletion matters
  2. you want to do this in-place

... then this function may be interesting for you. Note I've not optimized anything, it's not very Pythonic, and it's better to handle such stuff while collecting the data; but still, imagine you have a bunch of collected objects and want to get rid of the earlier (FIFO) or later (LIFO) objects that are in fact duplicates:

def remove_list_duplicates(lst, fifo=False):
    idx = 0
    incr = 1
    stopidx = len(lst)
    if not fifo:
        idx = stopidx - 1
        stopidx = incr = -1
    while idx != stopidx:
        elem = lst[idx]
        if lst.count(elem) > 1:
            lst.pop(idx)
            if fifo:
                stopidx -= 1
                idx -= 1
        idx += incr

Applying like this:

seq = [1,2,3,4,3,8,1,5,1]
print( 'inp', seq )
remove_list_duplicates( seq, True )
print( 'fifo', seq )
seq = [1,2,3,4,3,8,1,5,1]
remove_list_duplicates( seq, False )
print( 'lifo', seq )

Delivers:

inp [1, 2, 3, 4, 3, 8, 1, 5, 1]
fifo [2, 4, 3, 8, 5, 1]
lifo [1, 2, 3, 4, 8, 5]
Bert Bril
  • 371
  • 2
  • 12
-1
Test = [1, 8, 2, 7, 3, 4, 5, 1, 2, 3, 6]
Test.sort()
i = 1
while i < len(Test):
    # only advance when nothing was removed, so runs of 3+ duplicates are handled too
    if Test[i] == Test[i-1]:
        Test.remove(Test[i])
    else:
        i = i + 1
print(Test)
Angela C
  • 9
  • 1
-1

Check for the string 'a' and 'b'

clean_list = []
for ele in raw_list:
    if 'b' in ele or 'a' in ele:
        pass
    else:
        clean_list.append(ele)
kamran kausar
  • 4,117
  • 1
  • 23
  • 17
-2
Write a Python program to create a list of numbers by taking input from the user, and then remove the duplicates from the list. You can take input of non-zero numbers, with an appropriate prompt, from the user until the user enters a zero to create the list, assuming that the numbers are non-zero.
Sample Input: [10, 34, 18, 10, 12, 34, 18, 20, 25, 20]
Output: [10, 34, 18, 12, 20, 25]

lst = []
print("ENTER ZERO NUMBER FOR EXIT !!!!!!!!!!!!")
print("ENTER LIST ELEMENTS  :: ")
while True:
    n = int(input())
    if n == 0:
        print("!!!!!!!!!!! EXIT !!!!!!!!!!!!")
        break
    else:
        lst.append(n)
print("LIST ELEMENTS ARE :: ", lst)
#dup = set()
uniq = []
for x in lst:
    if x not in uniq:
        uniq.append(x)
        # dup.add(x)
print("UNIQUE ELEMENTS IN LIST ARE :: ", uniq)
  • What's the point of the double loop? You can already do the `if x not in` check when taking the input. i.e. `if n not in lst: lst.append(n)` inside the `while True` loop and then `lst` will already have the result. No need for `uniq` at all... – Tomerikoo Apr 17 '21 at 20:44