2

I've got a list of object in python which i want to sort based on a attribute.

For eg:

abc is a class with attributes id and count.

I have a list objects of the class abc.

list=[abc('1',120),abc('2',0),abc('0',180),abc('5',150)].

I want to sort the list in ascending order of the attribute 'count'

I have done it using:

list.sort(key=attrgetter('count'))

I have found using profiling my python script that it takes lot of time for sorting.

Can anyone suggest a better and faster way to sort list of object based on a attribute minimizing the time for sorting.

titto.sebastian
  • 511
  • 2
  • 7
  • 17

3 Answers3

2

A nitpick: you're using the name list for your list which will overwrite the standard list class. You'd be better off using l as the list name.

I tested sorting a list containing 12 times the contents of your list 100000 times. It took 0.848 s without a comparator function or a key when I used the sorted() function to avoid re-sorting an already sorted list.

There are at least three ways I can think of:

A. Use the sort() with the comparator function:

def comparator(x, y):
  return cmp(x.count, y.count)
l.sort(cmp=comparator)

This took 9.598 s on my system when I used the sorted() function to avoid re-sorting an already sorted list.

B. Use the sort() with the key function:

l.sort(key=operator.attrgetter('count'))

This took 3.111 s on my system when I used the sorted() function to avoid re-sorting an already sorted list.

C. Use native C code to improve the performance of the sorting. I didn't test this.

So, you seem to be already using the fastest all-Python way there is and the way forward would be the use of native C code.

juhist
  • 4,210
  • 16
  • 33
  • Since `list.sort` has the side effect of sorting the list, you will be mostly sorting sorted lists -- meaning the time spent on the comparator sort will be unrepresentative of the average cost of sorting with a comparator. Try testing using `sorted(l, cmp=comparator)` instead. – Dunes Mar 23 '15 at 09:05
  • Yes, you're right that I re-sorted an already sorted list. I tried it again by using the sorted() function, and the results are very interesting now: the cmp variant is very, very slow. – juhist Mar 23 '15 at 09:24
0

I believe the sort method is implemented using Timsort algorithm, so there is not much you can improve in terms of sorting.

What you can do is to insert the elements differently provided you have the control over the inserting part of code.
For example you could use a binary heap to optimize retreiving of the largest element (see heapq module in Python) or binary search tree to maintain the sorting order. The data strcuture you choose, mainly depends on what do you want to do with the elements.

matino
  • 17,199
  • 8
  • 49
  • 58
  • I tried to implement heapq module in python,but it did'nt helped since list of objects could'nt be heapified directly based on an attribute.I have to sort the list according to the attribute,then heapify it which does'nt solve my problem instead making it complex. – titto.sebastian Mar 23 '15 at 10:53
  • You can implement your own solution on top on `heapq` e.g. http://stackoverflow.com/questions/8875706/python-heapq-with-custom-compare-predicate – matino Mar 23 '15 at 12:21
  • I used binary search insertion while inserting so that I can eliminate the overhead of sorting.And it worked.Thank You for your suggestion – titto.sebastian Mar 24 '15 at 05:36
0

If I understand you correctly:

class Abc(object):
    def __init__(self, name, count):
        self.name = name
        self.count = count

    @classmethod
    def sort_key(cls, key):
        if key == 'count':
            return lambda obj: obj.count
        elif key == 'name':
            return lambda obj: obj.name


lst = [Abc('1', 120), Abc('2', 0), Abc('0', 180), Abc('5', 150)]

lst.sort(key=Abc.sort_key('count'))
for e in lst:
    print e.name, e.count
print

lst.sort(key=Abc.sort_key('name'))
for e in lst:
    print e.name, e.count
print

I'd not recommend you use 'id', 'abc' and 'list' as names of arbitrary variables because they are keywords in python.

Fomalhaut
  • 8,590
  • 8
  • 51
  • 95
  • I named the class name as 'abc' for the purpose of showing an example of the problem i am facing.The requirement is to reduce the time taken for sorting a list of object based on a attribute. – titto.sebastian Mar 23 '15 at 10:10