
I have a function, unique(a), that takes a list of numbers, a, and returns a list containing only one of each value while maintaining the original order. I also have a function, big_list(n), that generates a list of length n.

The reason I reverse the direction of the list is so that values are removed from the back of the original list, just to make the modified list cleaner and easier to read when comparing it to the original.

The function works when the list I'm creating is relatively small, but at larger lengths, such as 1,000,000 for example, the execution takes FOREVER.

If anyone can help me by making my function a lot faster, that would be great!

FYI: I need to use a set somewhere in the function for the assignment I am working on, and I still need to remove list items from the back.

Thanks in advance!

from random import randrange

def big_list(n) :
    # Create a list of n 'random' values in the range [-n/2, n/2)
    return [ randrange(-n//2, n//2) for i in range(n) ]

def unique(a) :
    a = a[::-1]
    b = set(a)
    for i in b :
        while a.count(i) != 1 :
            a.remove(i)
            a.count(i)
    a = a[::-1]
    return a
jgritty
Evan Cooper

3 Answers


Your algorithm is doing a lot of extra work moving elements around. Consider:

def unique(a):
    b = set()
    r = []
    for x in a:
        if x not in b:
            r.append(x)
            b.add(x)  # sets use add(), not insert()
    return r
Greg Hewgill
  • Thank you! Is it usually better to create a new list when iterating over a list, like you just did in your example? – Evan Cooper Oct 06 '14 at 21:54
  • 2
    Python is pretty efficient at creating new lists that way. The alternative is to do a lot of `.remove()` operations, which has to shift elements around a lot, especially if you have many elements to remove one by one. – Greg Hewgill Oct 06 '14 at 22:04
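
To put rough numbers on that comment, here is a hypothetical timing sketch (function names and the list size are my own, not from the answer) comparing the build-a-new-list approach with the repeated-`.remove()` approach:

```python
# Timing sketch: new-list construction vs. repeated .remove() calls.
import timeit
from random import randrange

def unique_new_list(a):
    # Build a fresh result list; set membership tests are O(1) on average.
    seen = set()
    out = []
    for x in a:
        if x not in seen:
            out.append(x)
            seen.add(x)
    return out

def unique_remove(a):
    # The original approach: each count() scans the list, each remove()
    # shifts elements, so the work grows quadratically.
    a = a[::-1]
    for i in set(a):
        while a.count(i) != 1:
            a.remove(i)
    return a[::-1]

data = [randrange(-500, 500) for _ in range(1000)]
t_new = timeit.timeit(lambda: unique_new_list(data), number=10)
t_rem = timeit.timeit(lambda: unique_remove(data), number=10)
print(f"new list: {t_new:.4f}s, remove: {t_rem:.4f}s")
```

Both functions keep the first occurrence of each value, so their outputs match; only the running time differs.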

Every time you call a.count(i) it loops over the entire list to count up the occurrences. This is an O(n) operation which you repeat over and over. When you factor in the O(n) runtime of the outer for i in b: loop, the overall algorithmic complexity is O(n²).

It doesn't help that there's a second unnecessary a.count(i) inside the while loop. That call doesn't do anything but chew up time.

This entire problem can be done in O(n) time. Your best bet would be to avoid list.count() altogether and figure out how you can loop over the list and count elements yourself. If you're clever you can do everything in a single pass, no nested loops (or implicit nested loops) required.
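
One possible single-pass sketch along these lines (using collections.Counter, which is my own choice and not named in the answer): count every value up front, then walk the list from the back and drop an element while more copies of it remain, so each value's first occurrence survives and duplicates are "removed from the back" as the question requires:

```python
from collections import Counter

def unique_linear(a):
    counts = Counter(a)        # one O(n) pass to count every value
    out = []
    for x in reversed(a):      # walk from the back of the list
        if counts[x] > 1:
            counts[x] -= 1     # a later duplicate: drop it
        else:
            out.append(x)      # last remaining copy = first occurrence
    out.reverse()              # restore the original order
    return out

print(unique_linear([3, 1, 3, 2, 1]))  # [3, 1, 2]
```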

John Kugelman

You can find a thorough benchmark of "unique" functions at this address. My personal favorite is

def unique(seq):
    # Order preserving
    seen = set()
    return [x for x in seq if x not in seen and not seen.add(x)]

because it's the fastest and it preserves order, while using sets smartly. I think it's the one called f7; it's given in the comments.
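
For what it's worth, the trick in that one-liner is that set.add() always returns None, so `not seen.add(x)` is always truthy; it exists purely for its side effect of recording x once the membership test has passed. A quick usage check:

```python
def unique(seq):
    # Order preserving: `x not in seen` filters duplicates, and
    # `not seen.add(x)` records x as a side effect (add() returns None).
    seen = set()
    return [x for x in seq if x not in seen and not seen.add(x)]

print(unique([1, 2, 1, 3, 2]))  # [1, 2, 3]
```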

Flavian Hautbois