Match two lists by index and name

Question

How can I compare two lists and create an output list in which the common items are positioned to match the main list by index and name? The main list is created once and stays the same throughout the script.

There can be situations where the changing list has items that do not exist in the main list; I'd like to create a separate list for these items.

Example:

main_list = ['apple', 'orange', 'banana', 'pear', 'mango', 'peach', 'strawberry']
changing_list = ['apple', 'banana', 'cucumber', 'peach', 'pear', 'fish']

output = ['apple', 'NA', 'banana', 'pear', 'NA', 'peach', 'NA']
added_output = ['cucumber', 'fish']

Using the sorted() function on each list before comparison may be of some use. However, I can't work out how to indicate that 'orange', for example, is missing (preferably by using NA or X). I am aware of the option of using sets and the '&' operator, but this does not indicate which item was missing from an index/positioning perspective (the NA part).

Possible duplicate of [Ordered intersection of two lists in Python](https://stackoverflow.com/questions/23529001/ordered-intersection-of-two-lists-in-python) — denov, Jan 30 '18 at 03:35

RoadRunner · Accepted Answer · 2018-01-30T04:23:26.917

You can do this with sets and list comprehensions:

def ordered_intersection(main_list, changing_list):
    changing_set = set(changing_list)
    output = [x if x in changing_set else 'NA' for x in main_list]

    output_set = set(output)
    added_output = [x for x in changing_list if x not in output_set]

    return output, added_output

Which works as follows:

>>> main_list = ['apple', 'orange', 'banana', 'pear', 'mango', 'peach', 'strawberry']
>>> changing_list = ['apple', 'banana', 'cucumber', 'peach', 'pear', 'fish']
>>> ordered_intersection(main_list, changing_list)
(['apple', 'NA', 'banana', 'pear', 'NA', 'peach', 'NA'], ['cucumber', 'fish'])

Explanation of the above code:

First convert changing_list to a set, since set membership is constant time on average, whereas list membership is linear time.
Since we want to preserve the order of main_list in output, we traverse every element of that list and check whether it exists in changing_set. Each check is constant time, so the whole pass is linear rather than quadratic.
The same logic is applied to added_output, which traverses changing_list and checks each item against a set.
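One edge case worth noting: because added_output is checked against the set built from output, an item in changing_list that happens to equal the placeholder 'NA' would not be reported as added. A minimal sketch of a variant that checks against main_list instead and lets the caller choose the placeholder is shown below. The name align_to_main and the missing parameter are made up for illustration and are not part of the answer above.

```python
def align_to_main(main_list, changing_list, missing='NA'):
    """Hypothetical variant of ordered_intersection with a configurable placeholder."""
    main_set = set(main_list)
    changing_set = set(changing_list)

    # Keep main_list's positions; substitute the placeholder for absent items.
    output = [x if x in changing_set else missing for x in main_list]

    # Keep changing_list's order for items that main_list does not contain.
    added_output = [x for x in changing_list if x not in main_set]

    return output, added_output


main_list = ['apple', 'orange', 'banana', 'pear', 'mango', 'peach', 'strawberry']
changing_list = ['apple', 'banana', 'cucumber', 'peach', 'pear', 'fish']

# Using None as the placeholder avoids clashing with any real string item.
output, added_output = align_to_main(main_list, changing_list, missing=None)
print(output)
print(added_output)
```

Both passes still use set lookups, so the overall cost stays linear in the combined length of the two lists.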

Turn · Answer 2 · 2018-01-30T03:54:11.913

Assuming that you don't care about duplicates, you can use sets to find the differences efficiently:

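output = []
main_set, changing_set = set(main_list), set(changing_list)
for i in main_list:
    # keep the item if it is also in changing_list, otherwise mark the slot
    output.append(i if i in changing_set else "NA")
# set difference: the result is a set, so the order of changing_list is lost
added_output = changing_set - main_set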

edited Jan 30 '18 at 03:54

answered Jan 30 '18 at 03:40

you lose order with sets – denov Jan 30 '18 at 03:55
@denov Yes, you are right. I assumed that was ok since the OP said sorting them first was fine. – Turn Jan 30 '18 at 03:56
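If the order of the added items does matter, one option is to keep the set only for lookups and walk changing_list itself. The sketch below assumes Python 3.7+, where dict.fromkeys keeps first-seen order, and uses it to drop duplicates. The function name ordered_difference is hypothetical.

```python
def ordered_difference(changing_list, main_list):
    """Items of changing_list that are not in main_list, in original order."""
    main_set = set(main_list)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return [x for x in dict.fromkeys(changing_list) if x not in main_set]


main_list = ['apple', 'orange', 'banana', 'pear', 'mango', 'peach', 'strawberry']
changing_list = ['apple', 'banana', 'cucumber', 'peach', 'pear', 'fish', 'cucumber']

added_output = ordered_difference(changing_list, main_list)
print(added_output)
```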

Ashok Kumar Jayaraman · Answer 3 · 2018-01-30T04:11:11.767

The following approach matches two lists by index and name:

>>> main_list = ['apple', 'orange', 'banana', 'pear', 'mango', 'peach', 'strawberry']
>>> changing_list = ['apple', 'banana', 'cucumber', 'peach', 'pear', 'fish']
>>> output = []
>>> for word in main_list:
...     if word in changing_list:
...         output.append(word)
...     else:
...         output.append('NA')
...
>>> output
['apple', 'NA', 'banana', 'pear', 'NA', 'peach', 'NA']

>>> added_output = []
>>> for word in changing_list:
...     if word not in main_list:
...         added_output.append(word)
...
>>> added_output
['cucumber', 'fish']

edited Jan 30 '18 at 04:11

answered Jan 30 '18 at 03:53

This is like RoadRunner's original answer. It is an O(n^2) solution, so very expensive if the lists are long. – Turn Jan 30 '18 at 03:55
Yes @Turn, I saw it after posting it. – Ashok Kumar Jayaraman Jan 30 '18 at 03:58
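To get a rough feel for the difference the comment describes, the two lookup strategies can be timed side by side. This is only a sketch: the function names and the synthetic data are made up for illustration, and the actual timings will depend on the machine and the list sizes.

```python
import timeit


def match_with_lists(main_list, changing_list):
    # Each membership test scans changing_list: O(len(main) * len(changing)).
    return [w if w in changing_list else 'NA' for w in main_list]


def match_with_set(main_list, changing_list):
    # One pass to build the set, then average constant-time lookups.
    changing_set = set(changing_list)
    return [w if w in changing_set else 'NA' for w in main_list]


n = 2000
main_list = [f'item{i}' for i in range(n)]
# Every second name, so only the even-numbered items of main_list match.
changing_list = [f'item{i}' for i in range(0, 2 * n, 2)]

list_result = match_with_lists(main_list, changing_list)
set_result = match_with_set(main_list, changing_list)

list_time = timeit.timeit(lambda: match_with_lists(main_list, changing_list), number=3)
set_time = timeit.timeit(lambda: match_with_set(main_list, changing_list), number=3)
print(f'list lookups: {list_time:.4f}s, set lookups: {set_time:.4f}s')
```

Both functions return the same result; only the cost of each membership test differs.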
