My implementation of merging two sorted lists in linear time - what could be improved?

Question

Fromg Google's Python Class:

E. Given two lists sorted in increasing order, create and return a merged
list of all the elements in sorted order. You may modify the passed in lists.
Ideally, the solution should work in "linear" time, making a single
pass of both lists.

Here's my solution:

def linear_merge(list1, list2):
  merged_list = []
  i = 0
  j = 0

  while True:
    if i == len(list1):
        return merged_list + list2[j:]
    if j == len(list2):
        return merged_list + list1[i:]

    if list1[i] <= list2[j]:
        merged_list.append(list1[i])
        i += 1
    else:
        merged_list.append(list2[j])
        j += 1

First of all, is it okay to use an infinite loop here? Should I break out of the loop using the break keyword when I'm done merging the list, or are the returns okay here?

I've seen similar questions asked here, and all the solutions look quite similar to mine, i.e. very C-like. Is there no more python-like solution? Or is this because of the nature of the algorithm?

I appreciate that this is a "problem" and that you are supposed to solve it in a certain way. In all truth however, if this were a "real" task i would just use extend() and sort(). It is true that it would be O(NlgN) rather than O(N), but the speed advantage you would get by not doing per-item manipulations in python is quite stunning. — drxzcl, Nov 13 '10 at 15:30
Python's sorting is also very well optimized for a lot of common cases that naive sorts don't do well at: you may get substantially better than that. (I don't know if it's specifically optimized for "sorting two sorted, appended lists".) — Glenn Maynard, Nov 13 '10 at 15:55
By the way, I'm not going to spend time with detailed answers to your questions any more when you're marking non-answers like that as a solution. It's not even an answer (links to other answers with no original content belong in a comment)--and nothing in the solution he linked actually show how to do the above algorithm in a clean, generalized way in Python. — Glenn Maynard, Nov 13 '10 at 16:17
+1 for the question -10 for the selected answer somehow works out to -1. Go figure SO math. — aaronasterling, Nov 13 '10 at 23:11
@Glenn Maynard If I had the rep at the time, I would have posted that as a comment. Also, read his question, he isn't asking for someone to do it for him, he's asking general questions about it. So I would stand by what I said, namely that [the selected answer] (http://stackoverflow.com/questions/464342/combining-two-sorted-lists-in-python/464350#464350) to the post I linked to answers his question. It doesn't do so in code, but it does answer it. — aptwebapps, Nov 14 '10 at 05:10
Also, there was a nice implementation of that algorithm posted here about 12 hours ago, which I commented on, which I now can't find. I guess it got deleted. My comment doesn't even show up in my activity. — aptwebapps, Nov 14 '10 at 05:11

score 10 · Accepted Answer · edited May 23 '17 at 12:18

10

This question covers this in more detail than you probably need. ;) The chosen answer matches your requirement. If I needed to do this myself, I would do it in the way that dbr described in his or her answer (add the lists together, sort the new list) as it is very simple.

EDIT:

I'm adding an implementation below. I actually saw this in another answer here which seems to have been deleted. I'm just hoping it wasn't deleted because it had an error which I'm not catching. ;)

def mergeSortedLists(a, b):
    l = []
    while a and b:
        if a[0] < b[0]:
            l.append(a.pop(0))
        else:
            l.append(b.pop(0))
    return l + a + b

edited May 23 '17 at 12:18

Community

1
1

answered Nov 13 '10 at 15:42

aptwebapps

1,866
1
13
17

Exactly, don't sweat the little stuff unless you can prove it matters. – drxzcl Nov 13 '10 at 15:56
6

@Ranieri: When you're learning, the entire point is to sweat the small stuff. Combining and sorting will teach you a little about the standard library, but next to nothing about the language. If someone asks to learn to implement a linked list, telling them "don't worry about it, just use `std::list`" isn't doing them any favors. – Glenn Maynard Nov 13 '10 at 16:08
@Glenn Maynard I think Ranieri gets that. Look at his comment on the original question. Different people showing different treatments and discussing the differences is very helpful. The post I linked to is great and there are some interesting posts here as well. Although there was a really good one, I thought, that I don't see any more. Strange. – aptwebapps Nov 14 '10 at 05:00
According to a note, quote: # Note: the solution above is kind of cute, but unforunately list.pop(0) # is not constant time with the standard python list implementation, so # the above is not strictly linear time. # An alternate approach uses pop(-1) to remove the endmost elements # from each list, building a solution list which is backwards. # Then use reversed() to put the result back in the correct order. That # solution works in linear time, but is more ugly. – ervinbosenbacher Sep 12 '14 at 13:21

score 10 · Answer 2 · answered Nov 13 '10 at 15:54

Here's a generator approach. You've probably noticed that a whole lot of these "generate lists" can be done well as generator functions. They're very useful: they don't require you to generate the whole list before using data from it, to keep the whole list in memory, and you can use them to directly generate many data types, not just lists.

This works if passed any iterator, not just lists.

This approach also passes one of the more useful tests: it behaves well when passed an infinite or near-infinite iterator, eg. linear_merge(xrange(10**9), xrange(10**9)).

The redundancy in the two cases could probably be reduced, which would be useful if you wanted to support merging more than two lists, but for clarity I didn't do that here.

def linear_merge(list1, list2):
    """
    >>> a = [1, 3, 5, 7]
    >>> b = [2, 4, 6, 8]
    >>> [i for i in linear_merge(a, b)]
    [1, 2, 3, 4, 5, 6, 7, 8]
    >>> [i for i in linear_merge(b, a)]
    [1, 2, 3, 4, 5, 6, 7, 8]
    >>> a = [1, 2, 2, 3]
    >>> b = [2, 2, 4, 4]
    >>> [i for i in linear_merge(a, b)]
    [1, 2, 2, 2, 2, 3, 4, 4]
    """
    list1 = iter(list1)
    list2 = iter(list2)

    value1 = next(list1)
    value2 = next(list2)

    # We'll normally exit this loop from a next() call raising StopIteration, which is
    # how a generator function exits anyway.
    while True:
        if value1 <= value2:
            # Yield the lower value.
            yield value1
            try:
                # Grab the next value from list1.
                value1 = next(list1)
            except StopIteration:
                # list1 is empty.  Yield the last value we received from list2, then
                # yield the rest of list2.
                yield value2
                while True:
                    yield next(list2)
        else:
            yield value2
            try:
                value2 = next(list2)

            except StopIteration:
                # list2 is empty.
                yield value1
                while True:
                    yield next(list1)

+1. Much better than mine. Avoids indexing and the creation of a needless third list. — Steven Rumbalski, Nov 13 '10 at 15:57
+1 Algorithm works with any iterables, including mag tape drives, which is what I first used it on :-) — John Machin, Nov 13 '10 at 19:12
doesn't handel the case where list1=[] or list2=[], returns an empty list, but of course there is no need to do a sort in that case... — lgwest, May 11 '17 at 15:18

fivetentaylor · Answer 3 · 2014-01-06T17:53:50.550

Why stop at two lists?

Here's my generator based implementation to merge any number of sorted iterators in linear time.

I'm not sure why something like this isn't in itertools...

def merge(*sortedlists):

    # Create a list of tuples containing each iterator and its first value
    iterlist = [[i,i.next()] for i in [iter(j) for j in sortedlists]]

    # Perform an initial sort of each iterator's first value
    iterlist.sort(key=lambda x: x[1])

    # Helper function to move the larger first item to its proper position
    def reorder(iterlist, i): 
        if i == len(iterlist) or iterlist[0][1] < iterlist[i][1]:
            iterlist.insert(i-1,iterlist.pop(0))
        else:
            reorder(iterlist,i+1)

    while True:
        if len(iterlist):
            # Reorder the list if the 1st element has grown larger than the 2nd
            if len(iterlist) > 1 and iterlist[0][1] > iterlist[1][1]:
                reorder(iterlist, 1)

            yield iterlist[0][1]

            # try to pull the next value from the current iterator
            try:
                iterlist[0][1] = iterlist[0][0].next()
            except StopIteration:
                del iterlist[0]
        else:
            break

Here's an example:

x = [1,10,20,33,99]
y = [3,11,20,99,1001]
z = [3,5,7,70,1002]

[i for i in merge(x,y,z)]

score 2 · Answer 4 · answered Apr 15 '11 at 14:56

2

hi i just did this exercise and i was wondering why not use,

def linear_merge(list1, list2):
  return sorted(list1 + list2)

pythons sorted function is linear isn't it?

answered Apr 15 '11 at 14:56

vim

1,098
9
21

2

I just found out that pythons [sorting algo](http://en.wikipedia.org/wiki/Timsort) is actually nlog(n) . so this wouldn't work – vim Apr 15 '11 at 16:11

Steven Rumbalski · Answer 5 · 2011-03-05T18:14:30.877

1

I agree with other answers that extending and sorting is the most straightforward way, but if you must merge, this will be a little faster because it does not make two calls to len every iteration nor does it do a bounds check. The Python pattern, if you could call it that, is to avoid testing for a rare case and catch the exception instead.

def linear_merge(list1, list2):
    merged_list = []
    i = 0
    j = 0

    try:
        while True:
            if list1[i] <= list2[j]:
                merged_list.append(list1[i])
                i += 1
            else:
                merged_list.append(list2[j])
                j += 1
    except IndexError:
        if i == len(list1):
            merged_list.extend(list2[j:])
        if j == len(list2):
            merged_list.extend(list1[i:])
    return merged_list

edit Optimized per John Machin's comment. Moved try outside of while True and extended merged_list upon exception.

edited Mar 05 '11 at 18:14

answered Nov 13 '10 at 15:53

Steven Rumbalski

44,786
9
89
119

EAFP taken to the extreme -- I like it. – drxzcl Nov 13 '10 at 15:56
Even better would be (1) to put the `while True:` inside the `try:` (2) at the end, avoid copying: don't `return merged_list + twiddly_bits`, do `merged_list.extend(twiddly_bits); return merged_list` – John Machin Mar 05 '11 at 06:51
@John Machin Thank you. (2) was semi-obvious. I had to think about (1) and then realized I was setting up a `try` for every iteration. I won't miss that one again. Thanks again. – Steven Rumbalski Mar 05 '11 at 18:18

score 1 · Answer 6 · edited May 23 '17 at 12:18

Here's my implementation from a previous question:

def merge(*args):
    import copy
    def merge_lists(left, right):
        result = []
        while (len(left) and len(right)):
            which_list = (left if left[0] <= right[0] else right)
            result.append(which_list.pop(0))
        return result + left + right
    lists = [arg for arg in args]
    while len(lists) > 1:
        left, right = copy.copy(lists.pop(0)), copy.copy(lists.pop(0))
        result = merge_lists(left, right)
        lists.append(result)
    return lists.pop(0)

score 1 · Answer 7 · answered Nov 14 '10 at 01:31

Another generator:

def merge(xs, ys):
    xs = iter(xs)
    ys = iter(ys)
    try:
        y = next(ys)
    except StopIteration:
        for x in xs:
            yield x
        raise StopIteration
    while True:
        for x in xs:
            if x > y:
                yield y
                break
            yield x
        else:
            yield y
            for y in ys:
                yield y
            break
        xs, ys, y = ys, xs, x

score 0 · Answer 8 · answered Sep 12 '14 at 13:44

According to a note here:

# Note: the solution above is kind of cute, but unforunately list.pop(0)
# is not constant time with the standard python list implementation, so
# the above is not strictly linear time.
# An alternate approach uses pop(-1) to remove the endmost elements
# from each list, building a solution list which is backwards.
# Then use reversed() to put the result back in the correct order. That
# solution works in linear time, but is more ugly.

and this link http://www.ics.uci.edu/~pattis/ICS-33/lectures/complexitypython.txt

append is O(1), reverse is O(n) but then it also says that pop is O(n) so which is which? Anyway I have modified the accepted answer to use pop(-1):

def linear_merge(list1, list2):
    # +++your code here+++
    ret = []
    while list1 and list2:
        if list1[-1] > list2[-1]:
            ret.append(list1.pop(-1))
        else:
            ret.append(list2.pop(-1))

    ret.reverse()

    return list1 + list2 + ret

score 0 · Answer 9 · answered May 21 '18 at 16:47

This solution runs in linear time and without editing l1 and l2:

def merge(l1, l2):
  m, m2 = len(l1), len(l2)
  newList = []
  l, r = 0, 0
  while l < m and r < m2:
    if l1[l] < l2[r]:
      newList.append(l1[l])
      l += 1
    else:
      newList.append(l2[r])
      r += 1
  return newList + l1[l:] + l2[r:]

My implementation of merging two sorted lists in linear time - what could be improved?

9 Answers9

Why stop at two lists?

Linked