removing duplicates in list without sorting

Question

I'm trying to write a function to remove duplicates from a list without sorting it, but it takes a long time on big lists. Is there a faster way than this?

from time import time

def delrepeat(rlist):
    ti = time()
    repeated = []
    for i in list(set(rlist)):
        rc = rlist.count(i)
        if rc > 1:
            repeated.append((i, rc))
    newlist = list(reversed(rlist))
    for repeat in repeated:
        i, rc = repeat
        while rc > 1:
            newlist.pop(newlist.index(i))
            rc -= 1
    print(time()-ti)
    return list(reversed(newlist))

delrepeat([3,2,1,3,5,3]*10000)
    
# --------------------------
# 5.181169271469116
# [3, 2, 1, 5]

if you don't need ordering, `set()` will remove duplicates. So you could do `list(set())`. — at80, Nov 13 '20 at 02:34
If you need to maintain order, you can create a dictionary that holds elements that you have seen already. — Hunter McMillen, Nov 13 '20 at 02:35
@HunterMcMillen A dict isn't needed, a set should be enough. — Michael Butscher, Nov 13 '20 at 02:37
@MichaelButscher As mentioned already by at80, the solution with the set _won't_ preserve the initial order. Which is why I prefaced my statement with _"If you need to maintain order"_ — Hunter McMillen, Nov 13 '20 at 02:38
@HunterMcMillen I think they mean you only need a set to store the seen elements. — Anonymous1847, Nov 13 '20 at 02:39
@Anonymous1847 If they meant that then they wouldn't have said _"If you don't need ordering"_. — Hunter McMillen, Nov 13 '20 at 02:40
@HunterMcMillen Got it. You meant a 3.7+ insertion ordered dict instead of a list. I meant to use a set and a list. — Michael Butscher, Nov 13 '20 at 02:45
Please repeat [on topic](https://stackoverflow.com/help/on-topic) and [how to ask](https://stackoverflow.com/help/how-to-ask) from the [intro tour](https://stackoverflow.com/tour). We expect you to research your question before posting here. — Prune, Nov 13 '20 at 02:55

score 2 · Answer 1 · answered Nov 13 '20 at 02:44

Dictionaries are insertion ordered as of Python 3.7, so you can easily and efficiently remove duplicates from a list without altering its order by converting it to a dict and back:

list_with_duplicates = [3, 2, 1, 3, 5, 3]
dict_from_list = {i : 0 for i in list_with_duplicates}
list_without_duplicates = list(dict_from_list.keys())

Or as a one-liner:

list_without_duplicates = list(dict(zip(list_with_duplicates, list_with_duplicates)))
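The same idea can also be spelled with `dict.fromkeys`, which builds the keys in one call and avoids the placeholder values. This is a minimal sketch rather than part of the original answer; it assumes Python 3.7+ and hashable elements, and it reuses the variable names from above.

```python
# Assumes Python 3.7+ (insertion-ordered dicts) and hashable list elements.
list_with_duplicates = [3, 2, 1, 3, 5, 3]

# dict.fromkeys keeps one key per distinct value, in first-seen order.
list_without_duplicates = list(dict.fromkeys(list_with_duplicates))

print(list_without_duplicates)  # [3, 2, 1, 5]
```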

score 1 · Answer 2 · answered Nov 13 '20 at 02:36

If you want to keep the order of the list while eliminating the repeats, you can use a set that keeps track of the elements already seen:

from time import time

def delrepeat(rlist):
    ti = time()
    seen = set()
    res = []
    for elt in rlist:
        if elt in seen:    # check if we've seen that one, and skip if we did
            continue
        seen.add(elt)      # if not, mark it as seen, and add it to the result
        res.append(elt)
    print(time()-ti)
    return res

delrepeat([3,2,1,3,5,3]*10000)

which gives these results on my system:

0.0037832260131835938
[3, 2, 1, 5]

compared to the time needed with your code:

15.060174226760864
[3, 2, 1, 5]
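If you would rather keep the timing out of the function itself, one option is the standard `timeit` module. The sketch below is only an illustration: `dedupe_keep_order` is a hypothetical name for the same seen-set loop, and the numbers it prints will differ from machine to machine.

```python
from timeit import timeit

def dedupe_keep_order(items):
    """Return items without duplicates, keeping first occurrences in order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

data = [3, 2, 1, 3, 5, 3] * 10000

# Call the function several times and report the average time per call.
runs = 10
elapsed = timeit(lambda: dedupe_keep_order(data), number=runs)
print(f"average per call: {elapsed / runs:.6f} s")
print(dedupe_keep_order(data))
```

Averaging over several runs smooths out one-off noise, and the function stays free of `print` side effects.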

score 1 · Answer 3 · answered Nov 13 '20 at 02:38

Your function looks like it runs in roughly quadratic time. It can be done in linear time on average:

def delrepeat(l):
    seen = set()
    out = list()
    for elem in l:
        if elem not in seen:
            out.append(elem)
            seen.add(elem)
    return out

You keep track of the elements already seen in a set, which is a hash table and can do lookups and insertions in O(1) time on average.
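To see why the container used for the seen elements matters, here is a rough comparison, assuming hashable elements. Both function names are made up for this sketch; the only difference between them is whether seen elements live in a list (each membership test scans it) or in a set (each membership test is a hash lookup).

```python
from timeit import timeit

def dedupe_list_seen(items):
    seen = []  # "elem not in seen" scans the list: O(len(seen)) per test
    out = []
    for elem in items:
        if elem not in seen:
            seen.append(elem)
            out.append(elem)
    return out

def dedupe_set_seen(items):
    seen = set()  # "elem not in seen" is a hash lookup: O(1) on average
    out = []
    for elem in items:
        if elem not in seen:
            seen.add(elem)
            out.append(elem)
    return out

# 2000 distinct values, each appearing twice.
data = list(range(2000)) * 2

# Both variants produce the same result; only the running time differs.
assert dedupe_list_seen(data) == dedupe_set_seen(data)
print("list seen:", timeit(lambda: dedupe_list_seen(data), number=3))
print("set seen: ", timeit(lambda: dedupe_set_seen(data), number=3))
```

The gap widens as the number of distinct values grows, since the list version does more comparisons per lookup.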

Anonymous1847


Your algorithm and the one you claim is quadratic are identical. It uses a "continue" to skip duplicates while you use an `if` statement. I see no other difference. – Frank Yellin Nov 13 '20 at 02:40
@FrankYellin I was talking about OP's code. `for i in list(set(rlist)): rc = rlist.count(i)` looks quadratic to me. – Anonymous1847 Nov 13 '20 at 04:10
Sorry. I misunderstood. – Frank Yellin Nov 13 '20 at 04:17
