4

Say i have two lists of dictionaries:

a=[{'name':'A','color':'1'},
   {'name':'B','color':'2'}]
b=[{'name':'A','color':'3'},
   {'name':'c','color':'1'}]

I need something like this:

for i in a:
    if i['name'] is not existent in b['name']:
         do this.
    else:
        if i['color'] is < than the corresponding item in b list:
            do that.

I don't know how to fetch the element form the second list which causes the iteration to go on "else:".

I need to say that the first list is smaller(several hundred items) but the second one has several thousand dictionaries, each with about a hundred items- efficiency is very important.

I did think about making a list of all values for the key['name'] in both lists and compare that but it would mean iterating first time to make these lists and then reiterating over the lists in order to do this or do that. Thanks in advance!

Abijith Mg
  • 2,647
  • 21
  • 35
Mike
  • 466
  • 9
  • 24
  • If the names in `b` is unique. I suggest you change the construct of `b`. b={'A':{'color':'3'},'c':{'color':'1'}}. Then you can use b[i['name']] to get the properties such as {'color':'1'}. so b[i['name']]['color'] is the number of color. It will be more efficient to use this construct. I can give you more examples as answer if you agree to do so. – Zealseeker Mar 28 '17 at 07:35
  • @Zealseeker each element in b is a unique dictionary with ~100keys, each with a value/key, one of the keys being 'name'. Ur saying to transform from list of dictionaries to a nested dictionary. I need to think this through. – Mike Mar 28 '17 at 07:40
  • Instead of making a list of values for both the lists, do it only for list B. And then change your if condition to `if i['name'] is not existent in names_in_b:` You will need to check the performance gain of this method, but with this you will need just one iteration over B and then iteration over A follows. – Ymartin Mar 28 '17 at 07:41
  • You can keep values only present on `a` and `b` by names, split into 2 new array. One of this array will contain only names that are unique. sort the other one and `zip` them. – Kruupös Mar 28 '17 at 07:44
  • @Mike If the name can be the unique index of each item, it is better to create a nested dictionary because it will make it fast to search. Also you can easily create a dictionary that only conserve the indices of the items. For example: `{'A': 0, 'c': 1}` So that it will be quicker. – Zealseeker Mar 28 '17 at 07:44

4 Answers4

3

You absolutely want to do an iteration over b before you start. The only obvious alternative is to iterate over b for every item in a, which is obviously worse.

b_dict = {x['name']: x for x in b}
for item in a:
    if item['name'] in b_dict:
        f(b_dict['name']) 
    else:
        pass  # whatever

You may be interested in the get() method of Python dictionaries, if you wish to avoid using in followed by immediately getting the element.

Arthur Tacca
  • 8,833
  • 2
  • 31
  • 49
0

You can use a hash table or something like this:

a=[{'name':'A','color':'1'},
   {'name':'B','color':'2'}]
b=[{'name':'A','color':'3'},
   {'name':'c','color':'1'}]

for item in a:
    mach = list(filter(lambda x: x['name'] == item['name'], b))
    if mach:
        if int(item['color']) > int(mach[0]['color']):
            do that
    else:
        do this

Dict in Python is a kind of hash table with amortized O(1). Then you can change your code to this:

a=[{'name':'A','color':'1'},
   {'name':'B','color':'2'}]
b=[{'name':'A','color':'3'},
   {'name':'c','color':'1'}]

b_dict = {item['name']:item['color'] for item in b}
for item in a:
    if item['name'] in b_dict and int(item['color']) < int(b_dict[item['name']]):
        print 'do this'
    else:
        print 'do that'
Community
  • 1
  • 1
RaminNietzsche
  • 2,683
  • 1
  • 20
  • 34
  • 1
    This is `n*n`. Since OP has mentioned that the lists are quite large, this solution won't scale! – Ymartin Mar 28 '17 at 07:50
  • Yes but You have two lists and you must search for each item, then your performance ~= O(n^2) – RaminNietzsche Mar 28 '17 at 07:51
  • 1
    you don't have to go through both lists each time – Mayeul sgc Mar 28 '17 at 07:54
  • 1
    @RaminNietzsche That is the reason the whole universe of "optimization" and "scalability" exists – Ymartin Mar 28 '17 at 07:59
  • We can store the second list in hash table but with two lists I think we can't find better way than O(n^2) – RaminNietzsche Mar 28 '17 at 08:06
  • 1
    @RaminNietzsche if the name is unique in each item of the second list, it is hashable. – Zealseeker Mar 28 '17 at 08:15
  • @Zealseeker If the name is not unique then a little more care is needed, but it is still possible to use a hash table. (Assuming we always want the first one, we would use an `OrderedDict` and build the hash table in reverse. If we want to consider all dictionaries with a matching name, then we can store a list for entry in the hash table.) – Arthur Tacca Mar 28 '17 at 08:41
0

First make an index:

a=[{'name':'A','color':'1'},
   {'name':'B','color':'2'}]

b=[{'name':'A','color':'3'},
   {'name':'C','color':'1'}]

dic = {}
for i,item in enumerate(b):
    dic[item['name']] = i
# dic will be {'A':0,'C':1}
for item in a:
    if not item['name'] in dic:
        #do this
    else:
        if b[dic[item['name']]]['color'] > item['color']:
            #do that
Anton Koval'
  • 4,863
  • 5
  • 31
  • 44
Zealseeker
  • 823
  • 1
  • 7
  • 23
  • Why not store references to the dictionaries in `b` directly, rather than their indices? – Arthur Tacca Mar 28 '17 at 07:54
  • @ArthurTacca yes, I saw your code and it's good. But I think (though it may not be true) directly store all the data may waste memory and reduce the performance ? – Zealseeker Mar 28 '17 at 07:56
  • 1
    Nope, that's not how variables work in Python; just references are stored. – Arthur Tacca Mar 28 '17 at 07:58
  • @ArthurTacca Ok, thanks. I did suggest the OP store the data in a dict as I commented to the OP. And this alternative way may be easier to understand :grin: – Zealseeker Mar 28 '17 at 08:02
0

Instead of making a list of values for both the lists, do it only for list B. And then change your if condition to

if i['name'] is not existent in names_in_b:

You will need to check the performance gain of this method, but with this solution, you will need just one iteration over B and then iteration over A follows.

Ymartin
  • 1,321
  • 1
  • 10
  • 24