
I have a function, unique(a), that takes a list of numbers, a, and returns a list containing only one of each value while maintaining the original order. I also have a function, big_list(n), that generates a list of length n.

The reason I reverse the direction of the list is so that values are removed from the back of the original list, just to make the modified list cleaner and easier to read when comparing it to the original.

The function works when the list I'm creating is relatively small, but at larger lengths, such as 1,000,000 for example, the execution takes FOREVER.

If anyone can help me by making my function a lot faster, that would be great!

FYI: I need to use a set somewhere in the function for the assignment I am working on, and I still need to remove list items from the back.

Thanks in advance!

from random import randrange

def big_list(n) :
    # Create a list of n 'random' values in the range [-n/2, n/2)
    return [ randrange(-n//2, n//2) for i in range(n) ]

def unique(a) :
    a = a[::-1]
    b = set(a)
    for i in b :
        while a.count(i) != 1 :
            a.remove(i)
            a.count(i)
    a = a[::-1]
    return a
jgritty
Evan Cooper

3 Answers


Your algorithm is doing a lot of extra work moving elements around. Consider:

def unique(a):
    b = set()
    r = []
    for x in a:
        if x not in b:
            r.append(x)
            b.add(x)  # sets use add(), not insert()
    return r
Greg Hewgill
  • Thank you! Is it usually better to create a new list when iterating over a list, like you just did in your example? – Evan Cooper Oct 06 '14 at 21:54
  • 2
    Python is pretty efficient at creating new lists that way. The alternative is to do a lot of `.remove()` operations, which has to shift elements around a lot, especially if you have many elements to remove one by one. – Greg Hewgill Oct 06 '14 at 22:04
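
To put rough numbers on that comment, here is a hypothetical timing sketch (function names and the list size are my own, not from the answer) comparing the build-a-new-list approach with the repeated-`.remove()` approach:

```python
# Timing sketch: new-list construction vs. repeated .remove() calls.
import timeit
from random import randrange

def unique_new_list(a):
    # Build a fresh result list; set membership tests are O(1) on average.
    seen = set()
    out = []
    for x in a:
        if x not in seen:
            out.append(x)
            seen.add(x)
    return out

def unique_remove(a):
    # The original approach: each count() scans the list, each remove()
    # shifts elements, so the work grows quadratically.
    a = a[::-1]
    for i in set(a):
        while a.count(i) != 1:
            a.remove(i)
    return a[::-1]

data = [randrange(-500, 500) for _ in range(1000)]
t_new = timeit.timeit(lambda: unique_new_list(data), number=10)
t_rem = timeit.timeit(lambda: unique_remove(data), number=10)
print(f"new list: {t_new:.4f}s, remove: {t_rem:.4f}s")
```

Both functions keep the first occurrence of each value, so their outputs match; only the running time differs.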

Every time you call a.count(i) it loops over the entire list to count up the occurrences. This is an O(n) operation which you repeat over and over. When you factor in the O(n) runtime of the outer for i in b: loop, the overall algorithmic complexity is O(n²).

It doesn't help that there's a second unnecessary a.count(i) inside the while loop. That call doesn't do anything but chew up time.

This entire problem can be done in O(n) time. Your best bet would be to avoid list.count() altogether and figure out how you can loop over the list and count elements yourself. If you're clever you can do everything in a single pass, no nested loops (or implicit nested loops) required.
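
One possible single-pass sketch along these lines (using collections.Counter, which is my own choice and not named in the answer): count every value up front, then walk the list from the back and drop an element while more copies of it remain, so each value's first occurrence survives and duplicates are "removed from the back" as the question requires:

```python
from collections import Counter

def unique_linear(a):
    counts = Counter(a)        # one O(n) pass to count every value
    out = []
    for x in reversed(a):      # walk from the back of the list
        if counts[x] > 1:
            counts[x] -= 1     # a later duplicate: drop it
        else:
            out.append(x)      # last remaining copy = first occurrence
    out.reverse()              # restore the original order
    return out

print(unique_linear([3, 1, 3, 2, 1]))  # [3, 1, 2]
```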

John Kugelman

You can find a thorough benchmark of "unique" functions at this address. My personal favorite is

def unique(seq):
    # Order preserving
    seen = set()
    return [x for x in seq if x not in seen and not seen.add(x)]

because it's the fastest and it preserves order, while using sets smartly. I think it's the one called f7; it's given in the comments.
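
For what it's worth, the trick in that one-liner is that set.add() always returns None, so `not seen.add(x)` is always truthy; it exists purely for its side effect of recording x once the membership test has passed. A quick usage check:

```python
def unique(seq):
    # Order preserving: `x not in seen` filters duplicates, and
    # `not seen.add(x)` records x as a side effect (add() returns None).
    seen = set()
    return [x for x in seq if x not in seen and not seen.add(x)]

print(unique([1, 2, 1, 3, 2]))  # [1, 2, 3]
```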

Flavian Hautbois