One-liner to remove duplicates, keep ordering of list

Question

I have the following list:

['Herb', 'Alec', 'Herb', 'Don']

I want to remove duplicates while keeping the order, so the result would be:

['Herb', 'Alec', 'Don']

Here is how I would do this verbosely:

l_new = []
for item in l_old:
    if item not in l_new: l_new.append(item)

Is there a way to do this in a single line?

@Dekel I understand, my question is looking for a one-liner though to do that. — David542, Aug 17 '17 at 23:27
many of the answers from that question have one liners using different approaches — Erich, Aug 17 '17 at 23:58

score 12 · Answer 1 · answered Aug 17 '17 at 23:39

You could use a set to remove duplicates and then restore the ordering. And it's just as slow as your original, yeah :-)

>>> sorted(set(l_old), key=l_old.index)
['Herb', 'Alec', 'Don']
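
As a rough illustration of why this works (the name `first_positions` is only for this sketch): `list.index` returns the position of the first occurrence of a value, so sorting the unique values by that position puts them back in first-seen order. Each `index` call scans the list from the start, which is where the quadratic cost comes from.

```python
l_old = ['Herb', 'Alec', 'Herb', 'Don']

# list.index returns the position of the FIRST occurrence of a value.
first_positions = {name: l_old.index(name) for name in set(l_old)}

# Sorting the unique values by that position restores first-seen order.
l_new = sorted(set(l_old), key=l_old.index)
print(l_new)
```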

Stefan Pochmann

Hah! I find that solution hilarious! It's inspired me as well... – juanpa.arrivillaga Aug 17 '17 at 23:45

juanpa.arrivillaga · Accepted Answer · 2017-08-17T23:58:26.010

7

You could use an OrderedDict, but I suggest sticking with your for-loop.

>>> from collections import OrderedDict
>>> data = ['Herb', 'Alec', 'Herb', 'Don']
>>> list(OrderedDict.fromkeys(data))
['Herb', 'Alec', 'Don']
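
A side note that is not part of the original answer: assuming Python 3.7 or later, where the built-in `dict` is guaranteed to preserve insertion order, the same idea should work without the import. A minimal sketch:

```python
data = ['Herb', 'Alec', 'Herb', 'Don']

# Assumption: Python 3.7+, where plain dicts keep insertion order.
# fromkeys keeps the first occurrence of each key and ignores later repeats.
unique_data = list(dict.fromkeys(data))
print(unique_data)  # ['Herb', 'Alec', 'Don']
```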

Just to reiterate: I seriously suggest sticking with your for-loop approach, and use a set to keep track of already seen items:

>>> data = ['Herb', 'Alec', 'Herb', 'Don']
>>> seen = set()
>>> unique_data = []
>>> for x in data:
...     if x not in seen:
...         unique_data.append(x)
...         seen.add(x)
...
>>> unique_data
['Herb', 'Alec', 'Don']
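
If the loop is needed in more than one place, it could be wrapped in a small generator so the call site becomes a single line. This is only a sketch; the name `unique_in_order` is made up for illustration, and it assumes the items are hashable.

```python
def unique_in_order(items):
    """Yield each item the first time it appears (items must be hashable)."""
    seen = set()
    for x in items:
        if x not in seen:
            seen.add(x)
            yield x

data = ['Herb', 'Alec', 'Herb', 'Don']
unique_data = list(unique_in_order(data))
print(unique_data)  # ['Herb', 'Alec', 'Don']
```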

And in case you just want to be wacky (seriously don't do this):

>>> [t[0] for t in sorted(dict(zip(reversed(data), range(len(data), -1, -1))).items(), key=lambda t:t[1])]
['Herb', 'Alec', 'Don']

edited Aug 17 '17 at 23:58

answered Aug 17 '17 at 23:30

juanpa.arrivillaga

Why would you suggest against the above though? – David542 Aug 17 '17 at 23:31
@David542 because it is inefficient and not explicit. Indeed, almost any one-liner will be, I suspect. – juanpa.arrivillaga Aug 17 '17 at 23:32
@StefanPochmann I've edited to explicitly include what I *meant* to imply. – juanpa.arrivillaga Aug 17 '17 at 23:40
[`OrderedDict.fromkeys` is a class method](https://docs.python.org/3/library/stdtypes.html#dict.fromkeys), no? So there's no need to create an `OrderedDict` instance. `list(OrderedDict.fromkeys(data))` would work. – Christian Dean Aug 17 '17 at 23:55
@ChristianDean yep, silly mistake on my part. Thanks for pointing it out. I think I originally started writing something like `OrderedDict((k, None) for k in data)` and then was like, oh wait, `.fromkeys` already exists... – juanpa.arrivillaga Aug 17 '17 at 23:58
@juanpa.arrivillaga No worries. I think we've all made that mistake trying to post our answer as fast as possible. Classic FGITW side-effects. Oh, and by the way, that last method burned my eyes ;-) – Christian Dean Aug 18 '17 at 00:04

score 7 · Answer 3 · edited May 23 '22 at 14:15

Using pandas, create a series from the list, drop duplicates, and then convert it back to a list.

>>> import pandas as pd
>>> pd.Series(['Herb', 'Alec', 'Herb', 'Don']).drop_duplicates().tolist()
['Herb', 'Alec', 'Don']

Timings

Solution from @StefanPochmann is the clear winner for lists with high duplication.

from collections import OrderedDict

my_list = ['Herb', 'Alec', 'Don'] * 10000

%timeit pd.Series(my_list).drop_duplicates().tolist()
# 100 loops, best of 3: 3.11 ms per loop

%timeit list(OrderedDict().fromkeys(my_list))
# 100 loops, best of 3: 16.1 ms per loop

%timeit sorted(set(my_list), key=my_list.index)
# 1000 loops, best of 3: 396 µs per loop

For larger lists with no duplication (e.g. simply a range of numbers), the pandas solution is very fast.

my_list = list(range(10000))  # a real list, so .index is a linear scan (a Python 3 range computes .index directly)

%timeit pd.Series(my_list).drop_duplicates().tolist()
# 100 loops, best of 3: 3.16 ms per loop

%timeit list(OrderedDict().fromkeys(my_list))
# 100 loops, best of 3: 10.8 ms per loop

%timeit sorted(set(my_list), key=my_list.index)
# 1 loop, best of 3: 716 ms per loop
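
For anyone who wants to repeat this kind of comparison outside IPython, here is a rough sketch using the standard `timeit` module. The helper names and the smaller list sizes are my own choices, not the setup used for the numbers above, and the pandas variant is left out so the snippet needs only the standard library. Actual timings will vary by machine and Python version.

```python
import timeit
from collections import OrderedDict

def dedupe_sorted(lst):
    # set + sort by first index: quadratic when most values are unique
    return sorted(set(lst), key=lst.index)

def dedupe_ordered_dict(lst):
    # single pass over the list
    return list(OrderedDict.fromkeys(lst))

# Hypothetical test cases, smaller than the ones timed above.
cases = {
    "high duplication": ['Herb', 'Alec', 'Don'] * 10000,
    "no duplication": list(range(2000)),
}

for label, my_list in cases.items():
    for func in (dedupe_sorted, dedupe_ordered_dict):
        seconds = timeit.timeit(lambda: func(my_list), number=3)
        print(f"{label:>16} | {func.__name__:<20} | {seconds / 3:.6f} s per call")
```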

edited May 23 '22 at 14:15

Olivier

answered Aug 17 '17 at 23:37

Alexander

How fitting for you to use _pandas_ ;-) – Christian Dean Aug 17 '17 at 23:47
@ChristianDean A pandas developer using pandas... shocking. – Stefan Pochmann Aug 17 '17 at 23:55
@StefanPochmann You realize I'm talking about his profile picture, right? – Christian Dean Aug 17 '17 at 23:56
@ChristianDean I do, but I think you got it backwards. I think it's the pic that's fitting his use of pandas, not the other way around. – Stefan Pochmann Aug 17 '17 at 23:59
@Alexander While I do like winning things, I do need to point out that your test case is extreeeemely unfair towards the other solutions (because it's unreasonably good for mine). – Stefan Pochmann Aug 18 '17 at 00:01
I know..... (-; Just didn't want to write too many cases (small lists, large lists with few duplicates, large lists with lots of duplicates...). Your solution is still very good. – Alexander Aug 18 '17 at 00:02

score 3 · Answer 4 · answered Aug 17 '17 at 23:29

If you really don't care about optimizations and stuff you can use the following:

s = ['Herb', 'Alec', 'Herb', 'Don']
[x[0] for x in zip(s, range(len(s))) if x[0] not in s[:x[1]]]

Note that, in my opinion, you really should use the for loop in your question or the answer by @juanpa.arrivillaga.
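
The same idea can be written with `enumerate` instead of `zip` and `range`, which may be easier to read. This is just an equivalent sketch and is still quadratic, since every element is checked against a slice of everything before it.

```python
s = ['Herb', 'Alec', 'Herb', 'Don']

# Keep x only if it does not already appear earlier in the list.
result = [x for i, x in enumerate(s) if x not in s[:i]]
print(result)  # ['Herb', 'Alec', 'Don']
```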

Erich · Answer 5 · 2017-08-17T23:44:28.337

1

l_new = []
for item in l_old:
    if item not in l_new: l_new.append(item)

In one line..ish:

l_new = []

[ l_new.append(item)  for item in l_old if item not in l_new]

Which has the behavior:

>>> a = [1, 1, 2, 2, 3, 3, 4, 5, 5]
>>> b = []
>>> [b.append(item) for item in a if item not in b]
[None, None, None, None, None]
>>> print(b)
[1, 2, 3, 4, 5]
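
A related variant, offered only as a sketch: tracking seen items in a set avoids scanning the result list on every membership test, and lets the comprehension itself produce the result instead of a list of `None` values. It relies on `set.add` returning `None`, and it assumes the items are hashable.

```python
a = [1, 1, 2, 2, 3, 3, 4, 5, 5]
seen = set()

# `seen.add(item)` returns None (falsy) and only runs when item is not yet in seen,
# so the condition is True exactly once per distinct item.
b = [item for item in a if not (item in seen or seen.add(item))]
print(b)  # [1, 2, 3, 4, 5]
```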

edited Aug 17 '17 at 23:44

answered Aug 17 '17 at 23:34

Erich

Your one-line solution needs a semicolon: `l_new = []; [l_new.append(item) for item in l_old if item not in l_new]` – Kae Aug 17 '17 at 23:43
But that would be cheating :P – Erich Aug 17 '17 at 23:44
Then do it *inside* the comprehension. – Stefan Pochmann Aug 17 '17 at 23:45
@StefanPochmann uhhhh.... how? Sorry if naive question, I don't know how to declare something inside a list comprehension – Erich Aug 17 '17 at 23:48
@Erich Huh? You're already doing that. With `item`. Ok, here's a way: `[l_new.append(item) or l_new for l_new in [[]] for item in l_old if item not in l_new][0]` – Stefan Pochmann Aug 17 '17 at 23:51
Ahhh, I see. I thought that I would have to create something which existed outside of the scope of the comprehension but your empty list trick is very cool :) – Erich Aug 17 '17 at 23:55
Or in Python 2 you could do `[0 for tmp in [[]]] and [tmp.append(item) for item in l_old if item not in tmp] and tmp`. But don't tell anyone that I said that. – Stefan Pochmann Aug 18 '17 at 00:08

score 1 · Answer 6 · answered Aug 17 '17 at 23:37

You can try this:

l = ['Herb', 'Alec', 'Herb', 'Don']
data = [i[-1] for i in sorted([({a:i for i, a in enumerate(l)}[a], a) for a in set({a:i for i, a in enumerate(l)}.keys())], key = lambda x: x[0])]

Output:

['Alec', 'Herb', 'Don']

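Note that this keeps the last occurrence of each duplicated value rather than the first, which is why 'Alec' comes before 'Herb' in the output and the result differs from the order asked for in the question.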
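
To make the index behavior concrete, here is a small sketch comparing the dictionary comprehension used in this answer with a first-index variant. The names `last_index`, `first_index`, `keep_last`, and `keep_first` are illustrative only.

```python
l = ['Herb', 'Alec', 'Herb', 'Don']

# Later duplicates overwrite earlier entries, so each value maps to its LAST index.
last_index = {a: i for i, a in enumerate(l)}
keep_last = sorted(last_index, key=last_index.get)

# setdefault stores an index only the first time a value is seen.
first_index = {}
for i, a in enumerate(l):
    first_index.setdefault(a, i)
keep_first = sorted(first_index, key=first_index.get)

print(keep_last)   # ['Alec', 'Herb', 'Don']
print(keep_first)  # ['Herb', 'Alec', 'Don']
```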
