12

I have the following list:

['Herb', 'Alec', 'Herb', 'Don']

I want to remove duplicates while preserving the order, so the result would be:

['Herb', 'Alec', 'Don']

Here is how I would do this verbosely:

l_new = []
for item in l_old:
    if item not in l_new: l_new.append(item)

Is there a way to do this in a single line?

DjaouadNM
David542

6 Answers

12

You could use a set to remove duplicates and then restore the original ordering. And yeah, it's just as slow as your original :-)

>>> sorted(set(l_old), key=l_old.index)
['Herb', 'Alec', 'Don']
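
The `key=l_old.index` call rescans the list for every distinct value, which is where the slowness comes from. A minimal sketch of the same idea with the first positions precomputed in one pass (the name `first_pos` is an illustrative assumption, not part of the answer above):

```python
l_old = ['Herb', 'Alec', 'Herb', 'Don']

# Record the index of the first occurrence of each value in a single pass.
first_pos = {}
for i, item in enumerate(l_old):
    first_pos.setdefault(item, i)  # keeps the earliest index only

# Sort the distinct values by where they first appeared.
l_new = sorted(first_pos, key=first_pos.get)
print(l_new)  # ['Herb', 'Alec', 'Don']
```
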
Stefan Pochmann
7

You could use an OrderedDict, but I suggest sticking with your for-loop.

>>> from collections import OrderedDict
>>> data = ['Herb', 'Alec', 'Herb', 'Don']
>>> list(OrderedDict.fromkeys(data))
['Herb', 'Alec', 'Don']
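
On Python 3.7 and later, plain `dict` objects are guaranteed to preserve insertion order, so the same trick should work without the import. A small sketch, assuming a 3.7+ interpreter:

```python
data = ['Herb', 'Alec', 'Herb', 'Don']

# dict.fromkeys keeps the first occurrence of each key, in insertion order
# (guaranteed by the language from Python 3.7 onward).
unique_data = list(dict.fromkeys(data))
print(unique_data)  # ['Herb', 'Alec', 'Don']
```
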

Just to reiterate: I seriously suggest sticking with your for-loop approach, and using a set to keep track of already seen items:

>>> data = ['Herb', 'Alec', 'Herb', 'Don']
>>> seen = set()
>>> unique_data = []
>>> for x in data:
...     if x not in seen:
...         unique_data.append(x)
...         seen.add(x)
...
>>> unique_data
['Herb', 'Alec', 'Don']
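
If this is needed in more than one place, the loop can be wrapped in a small generator. This is only a sketch; the function name `dedupe_keep_order` is made up for illustration, and the items are assumed to be hashable:

```python
def dedupe_keep_order(iterable):
    """Yield each item the first time it is seen, preserving input order."""
    seen = set()
    for x in iterable:
        if x not in seen:
            seen.add(x)
            yield x

data = ['Herb', 'Alec', 'Herb', 'Don']
print(list(dedupe_keep_order(data)))  # ['Herb', 'Alec', 'Don']
```
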

And in case you just want to be wacky (seriously don't do this):

>>> [t[0] for t in sorted(dict(zip(reversed(data), range(len(data), -1, -1))).items(), key=lambda t:t[1])]
['Herb', 'Alec', 'Don']
juanpa.arrivillaga
  • Why would you suggest against the above though? – David542 Aug 17 '17 at 23:31
  • @David542 because it is inefficient and not explicit. Indeed, almost any one-liner will be, I suspect. – juanpa.arrivillaga Aug 17 '17 at 23:32
  • @StefanPochmann I've edited to explicitly include what I *meant* to imply. – juanpa.arrivillaga Aug 17 '17 at 23:40
  • [`OrderedDict.fromkeys` is a class method](https://docs.python.org/3/library/stdtypes.html#dict.fromkeys), no? So there's no need to create an `OrderedDict` instance. `list(OrderedDict.fromkeys(data))` would work. – Christian Dean Aug 17 '17 at 23:55
  • @ChristianDean yep, silly mistake on my part. Thanks for pointing it out. I think I originally started writing something like `OrderedDict((k, None) for k in data)` and then was like, oh wait, `.fromkeys` already exists... – juanpa.arrivillaga Aug 17 '17 at 23:58
  • @juanpa.arrivillaga No worries. I think we've all made that mistake trying to post our answer as fast as possible. Classic FGITW side-effects. Oh, and by the way, that last method burned my eyes ;-) – Christian Dean Aug 18 '17 at 00:04
7

Using pandas, create a series from the list, drop duplicates, and then convert it back to a list.

>>> import pandas as pd
>>> pd.Series(['Herb', 'Alec', 'Herb', 'Don']).drop_duplicates().tolist()
['Herb', 'Alec', 'Don']

Timings

The solution from @StefanPochmann is the clear winner for lists with high duplication.

my_list = ['Herb', 'Alec', 'Don'] * 10000

%timeit pd.Series(my_list).drop_duplicates().tolist()
# 100 loops, best of 3: 3.11 ms per loop

%timeit list(OrderedDict().fromkeys(my_list))
# 100 loops, best of 3: 16.1 ms per loop

%timeit sorted(set(my_list), key=my_list.index)
# 1000 loops, best of 3: 396 µs per loop

For larger lists with no duplication (e.g. simply a range of numbers), the pandas solution is the fastest of the three, and the `sorted` approach is by far the slowest.

my_list = range(10000)

%timeit pd.Series(my_list).drop_duplicates().tolist()
# 100 loops, best of 3: 3.16 ms per loop

%timeit list(OrderedDict().fromkeys(my_list))
# 100 loops, best of 3: 10.8 ms per loop

%timeit sorted(set(my_list), key=my_list.index)
# 1 loop, best of 3: 716 ms per loop
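
The `%timeit` lines above need IPython. A rough way to repeat the comparison with only the standard library might look like the following. The list size and repeat count are arbitrary assumptions, the pandas variant is left out to avoid the dependency, and the numbers will differ by machine and Python version:

```python
import timeit
from collections import OrderedDict

def with_ordered_dict(items):
    return list(OrderedDict.fromkeys(items))

def with_sorted_set(items):
    return sorted(set(items), key=items.index)

high_dup = ['Herb', 'Alec', 'Don'] * 1000   # many duplicates
no_dup = list(range(1000))                  # no duplicates

for label, items in [('high duplication', high_dup), ('no duplication', no_dup)]:
    # Both approaches must agree before timing them is meaningful.
    assert with_ordered_dict(items) == with_sorted_set(items)
    for func in (with_ordered_dict, with_sorted_set):
        seconds = timeit.timeit(lambda: func(items), number=10)
        print(f'{label:>16} {func.__name__:<18} {seconds:.4f} s for 10 runs')
```
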
Olivier
Alexander
3

If you really don't care about optimizations and stuff you can use the following:

s = ['Herb', 'Alec', 'Herb', 'Don']
[x[0] for x in zip(s, range(len(s))) if x[0] not in s[:x[1]]]

Note that, in my opinion, you really should use the for loop in your question or the answer by @juanpa.arrivillaga.
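
The same comprehension can be written with `enumerate` instead of `zip` and `range`, which some may find easier to read. A sketch with the same quadratic behaviour:

```python
s = ['Herb', 'Alec', 'Herb', 'Don']

# Keep x only if it does not already occur earlier in the list (s[:i]).
result = [x for i, x in enumerate(s) if x not in s[:i]]
print(result)  # ['Herb', 'Alec', 'Don']
```
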

Dekel
1
l_new = []
for item in l_old:
    if item not in l_new: l_new.append(item)

In one line..ish:

l_new = []

[ l_new.append(item)  for item in l_old if item not in l_new]

Which has the behavior:

> a = [1,1,2,2,3,3,4,5,5]
> b = []
> [ b.append(item) for item in a if item not in b]
> print(b)
[1, 2, 3, 4, 5]
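
A variant of the same side-effect trick tracks seen items in a set, so membership tests stay cheap on long lists. This is a sketch that relies on `set.add` returning `None`, and it is arguably no more readable than the loop:

```python
a = [1, 1, 2, 2, 3, 3, 4, 5, 5]
seen = set()

# `seen.add(item)` returns None (falsy), so the `or` only records the item;
# the condition is true exactly when item was not seen before.
b = [item for item in a if not (item in seen or seen.add(item))]
print(b)  # [1, 2, 3, 4, 5]
```
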
Erich
  • Your one-line solution needs a semicolon: `l_new = []; [l_new.append(item) for item in l_old if item not in l_new]` – Kae Aug 17 '17 at 23:43
  • But that would be cheating :P – Erich Aug 17 '17 at 23:44
  • Then do it *inside* the comprehension. – Stefan Pochmann Aug 17 '17 at 23:45
  • @StefanPochmann uhhhh.... how? Sorry if naive question, I don't know how to declare something inside a list comprehension – Erich Aug 17 '17 at 23:48
  • @Erich Huh? You're already doing that. With `item`. Ok, here's a way: `[l_new.append(item) or l_new for l_new in [[]] for item in l_old if item not in l_new][0]` – Stefan Pochmann Aug 17 '17 at 23:51
  • Ahhh, I see. I thought that I would have to create something which existed outside of the scope of the comprehension but your empty list trick is very cool :) – Erich Aug 17 '17 at 23:55
  • Or in Python 2 you could do `[0 for tmp in [[]]] and [tmp.append(item) for item in l_old if item not in tmp] and tmp`. But don't tell anyone that I said that. – Stefan Pochmann Aug 18 '17 at 00:08
1

You can try this:

l = ['Herb', 'Alec', 'Herb', 'Don']
data = [i[-1] for i in sorted([({a:i for i, a in enumerate(l)}[a], a) for a in set({a:i for i, a in enumerate(l)}.keys())], key = lambda x: x[0])]

Output:

['Alec', 'Herb', 'Don']

Note that this algorithm keeps the last occurrence of each duplicated value and drops the earlier ones, which is why 'Herb' comes after 'Alec' in the output.
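
The behaviour of this answer (the last occurrence of a duplicated value is the one that survives) can be reproduced more briefly by deduplicating the reversed list and reversing the result. A sketch, assuming Python 3.7+ where `dict` preserves insertion order:

```python
l = ['Herb', 'Alec', 'Herb', 'Don']

# Walk the list backwards so the *last* occurrence of each value is the one
# dict.fromkeys sees first, then flip the result back to forward order.
data = list(reversed(list(dict.fromkeys(reversed(l)))))
print(data)  # ['Alec', 'Herb', 'Don']
```
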

Ajax1234