1

How can I merge 2 arbitrary unsorted lists a and b while preserving the order from both lists? I do not want to discard duplicates or sort. Formatting of whitespace is just to allow visualization for human readers:

a = ['a', 'b', 'c', 'X',       'Y',      '127'      ]
b = ['a', 'b',      'X', '44', 'Y', 'w', '127', 'X' ]

Desired output m:

m = ['a', 'b', 'c', 'X', '44', 'Y', 'w', '127', 'X' ]

I should specify the merge order more concisely. Matching priority goes to list a: Attempt to match every item in list a to an item in b

So if  a = ['b',      'a', 'r', 'n']
and    b = ['b', 'r', 'a',      'n'],
merged m = ['b', 'r', 'a', 'r', 'n']

A dictionary solution deletes duplicates (b[2] and b[7] both are 'X').

This needs to be programmatic; I'm looking at millions of lines of data. difflib is interesting, possibly for another problem, but I don't think it helps with this one.

I am using Python 2.7.12 on Windows-7

The code below solves this problem, but it is not as clean and simple as I would like.

def merge_lists(a, b):
    """ Merge 2 arbitrary unsorted lists a and b while preserving
    the order from both lists. Do not discard duplicates or sort."""

    ia = ib = 0
    m = []    
    if len(a) == 0:
        m = b
    elif len(b) == 0:
        m = a

    while ia < len(a) and ib < len(b):
        if a[ia] == b[ib]:
            m.append(a[ia])
            ia += 1
            ib += 1
        else:
            count = b[ib:].count(a[ia])
            if count == 0:
                m.append(a[ia])
                ia += 1
            else:
                k = b[ib:].index(a[ia])
                for i in range(ib, k+ib):
                    m.append(b[i])
                ib += k

        if ia >= len(a):
            for i in range(ib, len(b)):
                m.append(b[i])
        if ib >= len(b):
            for i in range(ia, len(a)):
                m.append(a[i])

    return m #--- END --- merge_lists()
Arya McCarthy
  • 8,554
  • 4
  • 34
  • 56
B. Finch
  • 11
  • 3
  • 6
    Welcome to [so]! As you may be aware, SO is a question-and-answer site. Readers, such as yourself, ask questions and other readers attempt to answer them. Your post is missing the essential element of an SO post -- the question! What, precisely, is your question? – Robᵩ Nov 07 '16 at 17:45
  • 2
    The way that you inserted spaces in your literals for `a` and `b` seems to be under-specified. Please explain the logic (which is a good first step for writing the code. – John Coleman Nov 07 '16 at 17:47
  • 6
    You didn't define the *order* of the new list well enough: for example, say `a=[1]` and `b=[2]` what should be the result ? `[1,2]` or `[2,1]` ? – Nir Alfasi Nov 07 '16 at 17:47
  • Possible duplicate of [How to merge two Python dictionaries in a single expression?](http://stackoverflow.com/questions/38987/how-to-merge-two-python-dictionaries-in-a-single-expression) – Olian04 Nov 07 '16 at 17:47
  • 3
    What output do you expect when your two lists are `['b', 'a', 'r', 'n']` and `['b', 'r', 'a', 'n']`? – Zero Piraeus Nov 07 '16 at 17:50
  • @Olian04: Heck no. This has pretty much nothing to do with that question. – user2357112 Nov 07 '16 at 17:55
  • If you can combine those lists into one (a+b or b+a isn't exactly working for me with your expected order, or I would have posted this as an answer); you can then use this answer to remove dupes without losing the order: http://stackoverflow.com/a/480227/2302482 – Bahrom Nov 07 '16 at 18:00

2 Answers2

3

You can probably use difflib, but you'll have to do the merging yourself. This can be tricky -- And isn't always easy to do with a computer (consider when you have to manually merge conflicts in your Version Control System). However, a lot of cases can be handled automatically.

Here's some code to help get you started -- I don't guarantee it works in all cases, so if you find inprovements, please feel free to let me know and I'll try to edit them in:

a = ['a', 'b', 'c', 'X',       'Y',      '127'      ]
b = ['a', 'b',      'X', '44', 'Y', 'w', '127', 'X' ]
import difflib
matches = difflib.SequenceMatcher(a=a, b=b).get_matching_blocks()

# Pointers to locations in the `a` and `b` lists to keep track of our progress.
ix_a = 0
ix_b = 0
c = []
for match in matches:
    # Add in missing elements from `a` -- Assume `a` comes first?
    if match.a > ix_a:
        c.extend(a[ix_a: match.a])
    # add in missing elements from `b`
    if match.b > ix_b:
        c.extend(b[ix_b: match.b])
    # add in common elements.
    part = a[match.a:match.a + match.size]
    c.extend(part)

    # update our pointers into the original sequences.
    ix_a = match.a + match.size
    ix_b = match.b + match.size

print(c)

Obviously you're a little bit at the mercy of exactly how difflib chooses to match runs in the data as well. e.g. the example that was pointed out by @ZeroPiraeus: ('barn' and 'bran')1 results in 'brarn', but if you reverse the order of the inputs you get 'baran'.

The surprising thing here isn't that the two results are different -- Indeed, it's somewhat natural to expect them to have different orderings if you change the order of the inputs. However, It probably is surprising that they have completely different values (one has two 'r' and the other has two 'a').

1I'm formatting these lists as strings for the sake of brevity.

mgilson
  • 300,191
  • 65
  • 633
  • 696
1

There is no single correct result, as the problem is specified. For example:

a = ['b',      'r', 'a', 'n']
b = ['b', 'a', 'r',      'n']
m = ['b', 'a', 'r', 'a', 'n']

a = ['b', 'r', 'a',      'n']
b = ['b',      'a', 'r', 'n']
m = ['b', 'r', 'a', 'r', 'n']
Zero Piraeus
  • 56,143
  • 27
  • 150
  • 160
  • A bit confused as to whether I should up-vote this answer – smac89 Nov 07 '16 at 18:08
  • @smac89 Oh, you definitely should ;-) – Zero Piraeus Nov 07 '16 at 18:08
  • @smac89 less flippantly, [this meta question](http://meta.stackoverflow.com/questions/261168/is-this-is-not-possible-an-acceptable-answer) may be relevant to your quandary. – Zero Piraeus Nov 07 '16 at 18:14
  • I agree with that meta-post. However, I don't necessarily agree that it justifies this answer :-). Saying "This is not possible" is very different than saying "This is under-specified". The meta-post seems to talk about the former whereas this answer is trying to say the latter. It may well be that _either_ of the merges would be acceptable in this case. Personally, I feel like this would be better as a comment for OP demonstrating the need for further clarification. – mgilson Nov 07 '16 at 20:22
  • @mgilson it's borderline, I'll grant that. What pushed me into answering (rather than expanding on the comment I'd already made) was that formatting, including whitespace, clarifies the issue significantly. I would certainly nudge OP toward accepting your answer over mine, (and have only failed to upvote yours because I've also failed to read it properly), but I still think mine is more illuminating than a comment could be. I won't be particularly upset if you feel it deserves a downvote, though. – Zero Piraeus Nov 07 '16 at 20:54