-2

I'm trying to write a function to remov duplicates in a list without sorting, but it take long time in big lists, Is there a faster way than this?

from time import time

def delrepeat(rlist):
    ti = time()
    repeated = []
    for i in list(set(rlist)):
        rc = rlist.count(i)
        if rc > 1:
            repeated.append((i, rc))
    newlist = list(reversed(rlist))
    for repeat in repeated:
        i, rc = repeat
        while rc > 1:
            newlist.pop(newlist.index(i))
            rc -= 1
    print(time()-ti)
    return list(reversed(newlist))

delrepeat([3,2,1,3,5,3]*10000)
    
# --------------------------
# 5.181169271469116
# [3, 2, 1, 5]
Reblochon Masque
  • 35,405
  • 10
  • 55
  • 80
Robox404
  • 7
  • 5

3 Answers3

2

Dictionaries are insertion ordered as of Python 3.7, so you can very easily and efficiently remove duplicates from a list without altering the list's order by just converting it to a dict and back

list_with_duplicates = [3, 2, 1, 3, 5, 3]
dict_from_list = {i : 0 for i in list_with_duplicates}
list_without_duplicates = list(dict_from_list.keys())

Or as a one-liner:

list_without_duplicates = list(dict(zip(list_with_duplicates, list_with_duplicates)))
Michael
  • 2,344
  • 6
  • 12
1

If you want to keep the order of the list while eliminating the repeats, you can use a set that keeps track of the elements already seen:

from time import time

def delrepeat(rlist):
    ti = time()
    seen = set()
    res = []
    for elt in rlist:
        if elt in seen:    # check if we've seen that one, and skip if we did
            continue
        seen.add(elt)      # if not, mark it as seen, and add it to the result
        res.append(elt)
    print(time()-ti)
    return res

delrepeat([3,2,1,3,5,3]*10000)

which gives these results on my system:

0.0037832260131835938
[3, 2, 1, 5]

compared to the time needed with your code:

15.060174226760864
[3, 2, 1, 5]
Reblochon Masque
  • 35,405
  • 10
  • 55
  • 80
1

Your function looks like it's around quadratic time. It can be done in average linear time:

def delrepeat(l):
    seen = set()
    out = list()
    for elem in l:
        if elem not in seen:
            out.append(elem)
            seen.add(elem)
    return out

You keep track of elements already seen in a hash table which can do lookups and additions in O(1) time.

Anonymous1847
  • 2,568
  • 10
  • 16