3

I am trying to find the difference between 2 list of dictionary. I found some information in this forum but did not serve my purpose.

incoming_rows = [{'column_name': 'LOAD_ID', 'data_type': 'int', 'table_name': 'CONFIG'},
            {'column_name': 'ROW_NUMBER', 'data_type': 'int', 'table_name': 'CONFIG'},
            {'column_name': 'CREATE_DATE', 'data_type': 'VARCHAR(20)', 'table_name': 'CONFIG'},
            {'column_name': 'CONFIG_TYPE', 'data_type': 'varchar(1)', 'table_name': 'CONFIG'},
            {'column_name': 'CONFIG_ID', 'data_type': 'numeric(10,0)', 'table_name': 'CONFIG'}
            ]

available_row = [{'column_name': 'LOAD_ID', 'data_type': 'int', 'table_name': 'CONFIG'},
             {'column_name': 'ROW_NUMBER', 'data_type': 'int', 'table_name': 'CONFIG'},
             {'column_name': 'CREATE_DATE', 'data_type': 'date', 'table_name': 'CONFIG'}
             ]

Here I need to compare the incoming_rows with the available_row list of dictionary and the difference want to list in another list of dict format.Here my table name is unique. Conditions: 1. Any new addition of columns. 2. Any change in data type If these two conditions are true then the the expected_row should contain only these changed rows only.

# expected output
expected_row=[{'column_name': 'CONFIG_TYPE', 'data_type': 'varchar(1)', 'table_name': 'CONFIG'},
          {'column_name': 'CONFIG_ID', 'data_type': 'numeric(10,0)', 'table_name': 'CONFIG'},
          {'column_name': 'CREATE_DATE', 'data_type': 'VARCHAR(20)', 'table_name': 'CONFIG'}
        ]
jpp
  • 159,742
  • 34
  • 281
  • 339
Pradeep
  • 303
  • 1
  • 5
  • 18
  • find intersection between 2 list. this might help: http://stackoverflow.com/questions/642763/python-intersection-of-two-lists – Thang Do Apr 22 '16 at 16:12
  • @csessh since we're trying to find the intersection of a list *full of unhashable objects*, that link is mostly irrelevant. (e.g., try `set().add({"some":"dict"})`) – Adam Smith Apr 22 '16 at 16:14

3 Answers3

6

A set is the perfect solution for this problem. Unfortunately, python will not let you add dictionaries to a set, because they are mutable and their hashcode could change between insert and lookup.

If you "freeze" the items to make them immutable, you can then add them to set objects instead of a list; and then take a set difference using the minus operator:

In [20]: i_set = { frozenset(row.items()) for row in incoming_rows }

In [21]: a_set = { frozenset(row.items())  for row in available_row }

In [22]: (i_set - a_set)
Out[22]: 
{frozenset({('column_name', 'CONFIG_ID'),
            ('data_type', 'numeric(10,0)'),
            ('table_name', 'CONFIG')}),
 frozenset({('column_name', 'CREATE_DATE'),
            ('data_type', 'VARCHAR(20)'),
            ('table_name', 'CONFIG')}),
 frozenset({('column_name', 'CONFIG_TYPE'),
            ('data_type', 'varchar(1)'),
            ('table_name', 'CONFIG')})}

Edit: To unfreeze:

In [25]: [dict(i) for i in i_set - a_set]
Out[25]: 
[{'column_name': 'CONFIG_ID',
  'data_type': 'numeric(10,0)',
  'table_name': 'CONFIG'},
 {'column_name': 'CREATE_DATE',
  'data_type': 'VARCHAR(20)',
  'table_name': 'CONFIG'},
 {'column_name': 'CONFIG_TYPE',
  'data_type': 'varchar(1)',
  'table_name': 'CONFIG'}]
Neal Fultz
  • 9,282
  • 1
  • 39
  • 60
  • I've never heard of "freezing" an item before! Could you please explain a little what it is? thanks! – Joseph Farah Apr 22 '16 at 16:26
  • A frozenset is an immutable type of set in Python (e.g. you can't do `frozenset.add(...)` or etc). Since it's immutable, it's (sanely) hashable, so you can put it in a set. – Adam Smith Apr 22 '16 at 17:32
0

For large datasets, and especially when you are working with numeric data, you may find better performance with 3rd party libraries. For example, Pandas accepts lists of directories directly:

import pandas as pd

# convert lists of dictionaries to dataframes
df_incoming, df_available = map(pd.DataFrame, (incoming_rows, available_row))

# merge data, adding indicator, and filter
res = df_available.merge(df_incoming, indicator=True, how='outer')
res = res[res['_merge'] == 'right_only']

print(res)

   column_name      data_type table_name      _merge
3  CREATE_DATE    VARCHAR(20)     CONFIG  right_only
4  CONFIG_TYPE     varchar(1)     CONFIG  right_only
5    CONFIG_ID  numeric(10,0)     CONFIG  right_only

If you require a list of dictionaries as output:

print(res.drop('_merge', 1).to_dict('records'))

[{'column_name': 'CREATE_DATE', 'data_type': 'VARCHAR(20)', 'table_name': 'CONFIG'},
 {'column_name': 'CONFIG_TYPE', 'data_type': 'varchar(1)', 'table_name': 'CONFIG'},
 {'column_name': 'CONFIG_ID', 'data_type': 'numeric(10,0)', 'table_name': 'CONFIG'}]
jpp
  • 159,742
  • 34
  • 281
  • 339
-2
for incoming_d, available_d in zip(incoming_rows, available_row):
    for k,v in incoming_d:
        if k in available_d and available_d[k] == v:
            # this key is in both dicts
        else:
            # something went wrong

Note that this requires your lists to be in the same order. If order doesn't matter in the lists, then you'd have to nest your for loops (and complexity becomes much higher)

for incoming_d in incoming_rows:
    for available_d in available_rows:
        ...
Adam Smith
  • 52,157
  • 12
  • 73
  • 112