12

I have a bunch of sorted lists of objects, and a comparison function

class Obj :
    def __init__(p) :
        self.points = p
def cmp(a, b) :
    return a.points < b.points

a = [Obj(1), Obj(3), Obj(8), ...]
b = [Obj(1), Obj(2), Obj(3), ...]
c = [Obj(100), Obj(300), Obj(800), ...]

result = magic(a, b, c)
assert result == [Obj(1), Obj(1), Obj(2), Obj(3), Obj(3), Obj(8), ...]

what does magic look like? My current implementation is

def magic(*args) :
    r = []
    for a in args : r += a
    return sorted(r, cmp)

but that is quite inefficient. Better answers?

Paul Tarjan
  • 48,968
  • 59
  • 172
  • 213

9 Answers9

13

Python standard library offers a method for it: heapq.merge.
As the documentation says, it is very similar to using itertools (but with more limitations); if you cannot live with those limitations (or if you do not use Python 2.6) you can do something like this:

sorted(itertools.chain(args), cmp)

However, I think it has the same complexity as your own solution, although using iterators should give some quite good optimization and speed increase.

rob
  • 36,896
  • 2
  • 55
  • 65
  • 1
    Using key instead of cmp should be prefered (and shoudl be faster). Python3 does not have cmp parameter anyway. – Jiri Jul 21 '09 at 12:27
  • 2
    Actually, I was just using the same format as OP, but you are absolutely right and *key* should be preferred over *cmp*. – rob Jul 21 '09 at 12:55
  • Well, and the OP's cmp function is wrong and doesn't work. If you're using heapq, you'll have to provide __lt__ etc. methods on your class or use a tuple of (sorting key, object) in your heap instead. – habnabit Jul 21 '09 at 13:47
  • 1
    I think you mean itertools.chain(*args). What you wrote is equivalent to sorted(iter(args), cmp), which makes little sense. – Marius Gedminas Jul 21 '09 at 23:24
  • If I understand correctly that, sorting concatenated lists is `Θ(n.log n)` of complexity (proposed solution by code example), but merging (of sorted) lists is `Θ(n)`. Not so small difference. – Velda Apr 29 '18 at 17:40
  • the code example should be with `heapq.merge`. Using sorted and itertools together does not make much sense. Just write a key function and use either `sorted(a+b+c, key=lambda x: x.points)` for a list or `heapq.merge(a,b,c, key=lambda x: x.points)` for an iterator. – Andrea Ratto Apr 28 '20 at 13:29
3

I like Roberto Liffredo's answer. I didn't know about heapq.merge(). Hmmmph.

Here's what the complete solution looks like using Roberto's lead:

class Obj(object):
    def __init__(self, p) :
        self.points = p
    def __cmp__(self, b) :
        return cmp(self.points, b.points)
    def __str__(self):
        return "%d" % self.points

a = [Obj(1), Obj(3), Obj(8)]
b = [Obj(1), Obj(2), Obj(3)]
c = [Obj(100), Obj(300), Obj(800)]

import heapq

sorted = [item for item in heapq.merge(a,b,c)]
for item in sorted:
    print item

Or:

for item in heapq.merge(a,b,c):
    print item
hughdbrown
  • 47,733
  • 20
  • 85
  • 108
2

Use the bisect module. From the documentation: "This module provides support for maintaining a list in sorted order without having to sort the list after each insertion."

import bisect

def magic(*args):
    r = []
    for a in args:
        for i in a:
            bisect.insort(r, i)
    return r
codeape
  • 97,830
  • 24
  • 159
  • 188
2

Instead of using a list, you can use a [heap](http://en.wikipedia.org/wiki/Heap_(data_structure).

The insertion is O(log(n)), so merging a, b and c will be O(n log(n))

In Python, you can use the heapq module.

ThibThib
  • 8,010
  • 3
  • 30
  • 37
  • +1: Sorting a list in inherently inefficient: prevent the sort by using a smarter structure. – S.Lott Jul 21 '09 at 10:09
  • @OrganicPanda: Did you read the answer? It says that `heapq` amortizes the sorting cost. That's a smarter structure. Consider this, too. Accumulating three separate collections seems silly. Why not accumulate one hash of mutable objects; this can get updated by objects from the other sources. Now the "comparison" is moot because the objects have all be properly associated with each other without any sorting. – S.Lott Oct 05 '11 at 17:18
  • @S.Lott My apologies - I thought you were suggesting a smarter answer of your own but not explaining it .. my bad – OrganicPanda Oct 10 '11 at 09:56
0

I don't know whether it would be any quicker, but you could simplify it with:

def GetObjKey(a):
    return a.points

return sorted(a + b + c, key=GetObjKey)

You could also, of course, use cmp rather than key if you prefer.

DrAl
  • 70,428
  • 10
  • 106
  • 108
0

One line solution using sorted:

def magic(*args):
  return sorted(sum(args,[]), key: lambda x: x.points)

IMO this solution is very readable.

Using heapq module, it could be more efficient, but I have not tested it. You cannot specify cmp/key function in heapq, so you have to implement Obj to be implicitly sorted.

import heapq
def magic(*args):
  h = []
  for a in args:
    heapq.heappush(h,a)
  return [i for i in heapq.heappop(h)
Jiri
  • 16,425
  • 6
  • 52
  • 68
  • Your heapq method is a mess. You're pushing entire lists instead of their items, and you're ignoring the key. The one liner is cool, though. – itsadok Jul 21 '09 at 11:12
  • Yeah you are right, I have used heapq just few times and I did not paste that to console to test it. My fault, sorry. Although now I see that Obj object must be defined "sortable" for heapq to work, because you cannot specify cmp/key function in heapq. – Jiri Jul 21 '09 at 12:22
  • This code is all around a mess. Both snippets have syntax errors, and using sum for concatenating lists is very inefficient. Not to mention that there's operator.attrgetter to replace the lambda. – habnabit Jul 21 '09 at 13:44
0

Here you go: a fully functioning merge sort for lists (adapted from my sort here):

def merge(*args):
    import copy
    def merge_lists(left, right):
        result = []
        while left and right:
            which_list = (left if left[0] <= right[0] else right)
            result.append(which_list.pop(0))
        return result + left + right
    lists = list(args)
    while len(lists) > 1:
        left, right = copy.copy(lists.pop(0)), copy.copy(lists.pop(0))
        result = merge_lists(left, right)
        lists.append(result)
    return lists.pop(0)

Call it like this:

merged_list = merge(a, b, c)
for item in merged_list:
    print item

For good measure, I'll throw in a couple of changes to your Obj class:

class Obj(object):
    def __init__(self, p) :
        self.points = p
    def __cmp__(self, b) :
        return cmp(self.points, b.points)
    def __str__(self):
        return "%d" % self.points
  • Derive from object
  • Pass self to __init__()
  • Make __cmp__ a member function
  • Add a str() member function to present Obj as string
hughdbrown
  • 47,733
  • 20
  • 85
  • 108
0

I asked a similar question and got some excellent answers:

The best solutions from that question are variants of the merge algorithm, which you can read about here:

Community
  • 1
  • 1
dmw
  • 1,528
  • 16
  • 20
0

Below is an example of a function that runs in O(n) comparisons.

You could make this faster by making a and b iterators and incrementing them.

I have simply called the function twice to merge 3 lists:

def zip_sorted(a, b):
    '''
    zips two iterables, assuming they are already sorted
    '''
    i = 0
    j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            result.append(a[i])
            i += 1
        else:
            result.append(b[j])
            j += 1
    if i < len(a):
        result.extend(a[i:])
    else:
        result.extend(b[j:])
    return result

def genSortedList(num,seed):
    result = [] 
    for i in range(num):
        result.append(i*seed)
    return result

if __name__ == '__main__':
    a = genSortedList(10000,2.0)
    b = genSortedList(6666,3.0)
    c = genSortedList(5000,4.0)
    d = zip_sorted(zip_sorted(a,b),c)
    print d

However, heapq.merge uses a mix of this method and heaping the current elements of all lists, so should perform much better

aong152
  • 143
  • 2
  • 6