Merge two list ,element is a range value

Question

description:

I need merge two lists whose element is a range value for serval (at least 10000) times efficiently. All elements should merge if they can.
Two lists are sorted already.
Every elements are strictly separate, it means these case are illegal:

list1 = [[1, 3], [3, 6]]
list2 = [[1, 4], [2, 5]]

list1 = [[1, 5], [6, 10]]
list2 = [[2, 4], [5, 8]]

example:

#elements: 0:start(inclusive) 1:stop(inclusive).
#the `ans` become next `list1`, to merge a new `list2` .

#input1:
list1 = [[1, 1],[3, 3]]
list2 = [[5, 5]]
#output1:
ans = [[1, 1], [3, 3], [5, 5]]

#input2:
list1 = [[1, 1], [3, 3], [5, 5]]
list2 = [[2, 2]]
#output2:
ans = [[1, 3], [5, 5]] # [1,1]+[2,2]+[3,3] = [1,3]

#input3:
list1 = [[1, 3], [5, 5]]
list2 = [[0, 0], [4, 4], [6, 6]]
#output:
ans = [[0,6]] #[0,0]+[1,3]+[4,4]+[5,5]+[6,6] = [0,6]

what I have tried:

    def merge(list1,list2):
        ans = sorted(list1+list2,key = lambda x:x[0])
        idx = 0
        while idx<len(ans):
            try:
                if ans[idx][1] == ans[idx+1][0] - 1:
                    ans[idx] = [ans[idx][0],ans[idx+1][1]]
                    del ans[idx+1]
                elif ans[idx][0] == ans[idx+1][1] + 1:
                    ans[idx] = [ans[idx+1][0],ans[idx][1]]
                    del ans[idx+1]
                else:
                    idx+=1
            except Exception:
                idx+=1
        return ans

It works, but it is really slow. It tooks about 15 secs to solve a difficult case, 1.2 sec to solve a simple case.
Desired time is less than 3 secs to solve a difficult case.

question

Any better solution?
Or which algorithm should I use? Maybe segment tree?

Are the two lists themselves sorted already? All your sample inputs indicate that they are. — user2390182, Aug 27 '21 at 07:06

user2390182 · Answer 1 · 2021-08-27T08:38:06.113

3

You are not using all the information. The lists are sorted already, so concatenating them and resorting is unnecessary.

def merge(list1, list2):
    # helper to process one interval from a given list
    def consume(lst, index):
        start, end = interval = lst[index]
        if not result or result[-1][-1] < start - 1:
            result.append(interval)
        elif result[-1][-1] < end:
            result[-1][-1] = end
        return index + 1

    result = []
    i1 = i2 = 0
    while i1 < len(list1) and i2 < len(list2):
        # saving some boilerplate code to always work on list1
        if list1[i1] > list2[i2]:
            list1, i1, list2, i2 = list2, i2, list1, i1
        i1 = consume(list1, i1)
    while i2 < len(list2):
        i2 = consume(list2, i2)
    return result

list1 = [[1, 1],[3, 3]]
list2 = [[5, 5]]
merge(list1, list2)
# [[1, 1], [3, 3], [5, 5]]
list1 = [[1, 1], [3, 3], [5, 5]]
list2 = [[2, 2]]
merge(list1, list2)
# [[1, 3], [5, 5]]
list1 = [[1, 3], [5, 5]]
list2 = [[0, 0], [4, 4], [6, 6]]
merge(list1, list2)
# [[0, 6]]

The algorithmic advantages of this approach seem to be outweighed by the brute C-Power of Python sorting over the context changes (particularly with the nested function). One could try to improve the solution suggested by @trincot a little though (not design-wise, but from a pure time/space-efficiency standpoint):

def merge(list1, list2):
    # sorted(l1+l2)  creates 2 new lists in memory
    list1.extend(list2)
    list1.sort()
    result = list1[:1]
    for interval in list1:
        start, end = interval
        _, e = last = result[-1]
        if e < start - 1:
            result.append(interval)
        elif e < end:  
            last[-1] = end
    return result

edited Aug 27 '21 at 08:38

answered Aug 27 '21 at 07:25

user2390182

72,016
6
67
89

Sorry I try your solution but it still cost 15 secs to solve a difficult case, while the @trincot solution cost 5 secs to a same case. +1 for your effort. – leaf_yakitori Aug 27 '21 at 07:35
https://stackoverflow.com/questions/464342/combining-two-sorted-lists-in-python, it shows if elements less than 1000000, concatenatie and resort it maybe the more efficient solution. – leaf_yakitori Aug 27 '21 at 07:49
This was also my solution since the time complexity is just linear `O(n+m)`. Maybe the `sorted` functionality in Python is so fast that it beats all the necessary extra loops made here :) – Niel Godfrey Pablo Ponciano Aug 27 '21 at 07:50
1

Yes, `sorted` is implemented with compiled code, so for limited list sizes (maybe below a million), it will still be faster than looping "manually" in Python, even though there is a time complexity difference. – trincot Aug 27 '21 at 08:14
Also be careful: this solution will *mutate* some elements in the input lists. I would change it a bit so that the result is made up of new pairs. It can be a surprise for a caller to find out that their input lists were altered after making the `merge` call. – trincot Aug 27 '21 at 08:23
@trincot Of course in production code, mutating arguments would be a no go (unless it's the purpose of a function). This seemingly being a mostly academic challenge... should be fine :) – user2390182 Aug 27 '21 at 08:37

trincot · Accepted Answer · 2021-08-27T06:59:24.473

del will have an impact on performance... try to avoid it. I think it is better to let the answer list start as an empty list, and append ranges to it as you go.

Also, I don't think it is worth to provide a key argument to sorted: by default it will anyway sort by the first value in the pairs, and if there is a tie, it will sort by the second value.

Here is how it could work:

def merge(list1,list2):
    ans = []
    last = [None, -float("inf")]
    for start, end in sorted(list1+list2):
        if start > last[1] + 1:
            last = [start, end]
            ans.append(last)
        elif end > last[1]:  # Merge:
            last[1] = end  # We mutate a pair that is already in the result
    return ans

Thanks! It does much quicker then mine, it tooks about 5.2 secs to solve a difficult case. I want to see more solution, so I'll accept your answer tomorrow if I don't get any better solution. — leaf_yakitori, Aug 27 '21 at 06:59

Merge two list ,element is a range value

description:

example:

what I have tried:

question

2 Answers2