5

I have 2 lists of dicts like

list1 = [{'count': 351, 'evt_datetime': datetime.datetime(2015, 10, 23, 8, 45), 'att_value': 'red'},
     {'count': 332, 'evt_datetime': datetime.datetime(2015, 10, 23, 8, 45), 'att_value': 'red'},
     {'count': 336, 'evt_datetime': datetime.datetime(2015, 10, 23, 8, 45), 'att_value': 'red'},
     {'count': 359, 'evt_datetime': datetime.datetime(2015, 10, 23, 8, 45), 'att_value': 'red'},
     {'count': 309, 'evt_datetime': datetime.datetime(2015, 10, 23, 8, 45), 'att_value': 'red'}]


list2 = [{'count': 359, 'evt_datetime': datetime.datetime(2015, 10, 23, 8, 45), 'att_value': 'red'},
     {'count': 351, 'evt_datetime': datetime.datetime(2015, 10, 23, 8, 45), 'att_value': 'red'},
     {'count': 381, 'evt_datetime': datetime.datetime(2015, 10, 22, 8, 45), 'att_value': 'red'}]

I am trying to get common dicts from both the list. My desired output to be exact matches of the keys and values of the dict.

[{'count': 359, 'evt_datetime': datetime.datetime(2015, 10, 23, 8, 45), 'att_value': 'red'},
     {'count': 351, 'evt_datetime': datetime.datetime(2015, 10, 23, 8, 45), 'att_value': 'red'}]

Can this done by python itself efficiently or would require lib like pandas?

shivg
  • 742
  • 1
  • 8
  • 28

3 Answers3

12

Use list comprehension:

[x for x in list1 if x in list2]

This returns me this list for your data:

[{'count': 351, 'evt_datetime': datetime.datetime(2015, 10, 23, 8, 45), 'att_value': 'red'}, {'count': 359, 'evt_datetime': datetime.datetime(2015, 10, 23, 8, 45), 'att_value': 'red'}]
Netwave
  • 40,134
  • 6
  • 50
  • 93
  • 1
    Nice! i dont expect this can be done in a simple list comprehension. Works like a charm. Thanks – shivg Nov 05 '15 at 11:25
  • 1000 times better than a for loop. Thanks. – ConanG Sep 21 '17 at 09:04
  • see comment in https://stackoverflow.com/a/33067553/1497139: This is an O(n^2) solution, whereas there are solutions with O(n) - meaning that for 1000 elements this solution is 1000x slower than the optimal solution using sorted lists. – Wolfgang Fahl Aug 11 '20 at 08:49
  • See O(n log n) solution below if lists are not priorly sorted. – Wolfgang Fahl Aug 11 '20 at 10:39
2

The solution below might perform better for large lists but might also need more memory due to the sorting step.

The intersection can either be done over a defined sortKey e.g. 'count' or the the hash of the dictionary will be used as suggested by https://stackoverflow.com/a/60765557/1497139. The algorithm sorts the two lists and iterates in parallel checking for the three possible states of the two iterators:

  • first iterator behind second -> progress first iterator
  • second iterator behind first -> progress second iterator
  • both at same position -> found an intersection element

In the given example using the "count" field as a sortKey gives the same reuslt as using the dict hashes as keys.

[{'count': 351, 'evt_datetime': datetime.datetime(2015, 10, 23, 8, 45), 'att_value': 'red'}, {'count': 359, 'evt_datetime': datetime.datetime(2015, 10, 23, 8, 45), 'att_value': 'red'}]
[{'count': 351, 'evt_datetime': datetime.datetime(2015, 10, 23, 8, 45), 'att_value': 'red'}, {'count': 359, 'evt_datetime': datetime.datetime(2015, 10, 23, 8, 45), 'att_value': 'red'}]

python unit test

def testIntersection(self):
        '''
        test creating the intersection of a list of dictionaries
        '''
        list1 = [{'count': 351, 'evt_datetime': datetime.datetime(2015, 10, 23, 8, 45), 'att_value': 'red'},
         {'count': 332, 'evt_datetime': datetime.datetime(2015, 10, 23, 8, 45), 'att_value': 'red'},
         {'count': 336, 'evt_datetime': datetime.datetime(2015, 10, 23, 8, 45), 'att_value': 'red'},
         {'count': 359, 'evt_datetime': datetime.datetime(2015, 10, 23, 8, 45), 'att_value': 'red'},
         {'count': 309, 'evt_datetime': datetime.datetime(2015, 10, 23, 8, 45), 'att_value': 'red'}]

        list2 = [{'count': 359, 'evt_datetime': datetime.datetime(2015, 10, 23, 8, 45), 'att_value': 'red'},
             {'count': 351, 'evt_datetime': datetime.datetime(2015, 10, 23, 8, 45), 'att_value': 'red'},
             {'count': 381, 'evt_datetime': datetime.datetime(2015, 10, 22, 8, 45), 'att_value': 'red'}]
        
        listi=ListOfDict.intersect(list1, list2,'count')
        print(listi)
        self.assertEquals(2,len(listi))
        listi=ListOfDict.intersect(list1, list2)
        print(listi)
        self.assertEquals(2,len(listi))

ListOfDict

'''
Created on 2020-08-11

@author: wf
'''

class ListOfDict(object):
    '''
    https://stackoverflow.com/questions/33542997/python-intersection-of-2-lists-of-dictionaries/33543164
    '''
    @staticmethod  
    def sortKey(d,key=None):
        ''' get the sort key for the given dict d with the given key
        '''
        if key is None:
            # https://stackoverflow.com/a/60765557/1497139
            return hash(tuple(d.items()))
        else:
            return d[key] 
        
    @staticmethod            
    def intersect(listOfDict1,listOfDict2,key=None):
        '''
        get the  intersection lf the two lists of Dicts by the given key 
        '''
        i1=iter(sorted(listOfDict1, key=lambda k: ListOfDict.sortKey(k, key)))
        i2=iter(sorted(listOfDict2, key=lambda k: ListOfDict.sortKey(k, key)))
        c1=next(i1)
        c2=next(i2)
        lr=[]
        while True:
            try:
                val1=ListOfDict.sortKey(c1,key)
                val2=ListOfDict.sortKey(c2,key)
                if val1<val2:
                    c1=next(i1)
                elif val1>val2:
                    c2=next(i2)
                else:
                    lr.append(c1)
                    c1=next(i1)
                    c2=next(i2)
            except StopIteration:
                break     
        return lr  


    
Wolfgang Fahl
  • 15,016
  • 11
  • 93
  • 186
-4

If order is not important and you don't need to worry about duplicates then you can use set intersection:

a = [1,2,3,4,5]
b = [1,3,5,6]
list(set(a) & set(b))
[1, 3, 5]
Rutul Raval
  • 313
  • 2
  • 10