0

So I have this tricky dictionary of tuples which I want to filter based on the first occurrence of the informative flag in the value elements. If the flag (which is the element occupying the first position of the tuple) is observed in other keys I will only retain only the first key-value pair in which it occurs and subsequent key-value pairs which contain the flag would be skipped.

old_dict = {'abc':[('abc', '1', '5'), ('def', '1', '5'), ('abcd', '2', '5')],
            'def':[('abc', '2', '5'), ('def', '1', '5'), ('abcd', '1', '5')],
            'ghi':[('ghi', '1', '5'), ('jkl', '1', '4'), ('mno', '2', '4')]}

I have struggled with a lot of attempts and this latest attempt does not produce anything meaningful.

flgset = set()
new_dict = {}

for elem, tp in old_dict.items():
       for flg in tp:
              flgset.add(flg[0])

counter = 0
for elem, tp in old_dict.items():
       for (item1, item2, item3) in tp:
              for flg in flgset:
                     if flg == item1:
                            counter = 1
                            new_dict[elem] = [(item1, item2, item3)]
                            break

Expected results should be:

new_dict = {'abc':[('abc', '1', '5'), ('def', '1', '5'), ('abcd', '2', '5')],
            'ghi':[('ghi', '1', '5'), ('jkl', '1', '4'), ('mno', '2', '4')]}

Thanks in advance.

user3014974
  • 113
  • 8
  • 3
    Dicts in Python do not have a defined order, so it does not make sense to speak of the "first" occurance. Depending on what you're trying to do you either need a list or maybe OrderedDict. – Vinay Pai Sep 10 '19 at 15:10
  • 3
    @Vinay Pai , Dicts as of python 3.6+ do have a defined order. They are sorted by insertion order. – SyntaxVoid Sep 10 '19 at 15:49
  • 2
    Yes, dicts _are_ ordered now. However, the source of that dictionary may not be ordered. For instance, if `old_dict` is coming from a JSON object (which is explicitly defined as unordered), you may not get the results you'd expect. – Kirk Strauser Sep 10 '19 at 15:57
  • @SyntaxVoid I wasn't aware of that change, thanks for the info! – Vinay Pai Sep 10 '19 at 21:31

2 Answers2

2

If i get you correctly, the following should do what you want:

flgset = set()
new_dict = {}

for k, tuple_list in old_dict.items():

    # if the key is not in flgset, just keep the k, tuple_list pair
    if k not in flgset:
        new_dict[k] = tuple_list

        # update the elements into flgset
        # item in this case is ('abc', '2', '5'), 
        # since you only want to add the first element, use item[0]
        for item in tuple_list:
            flgset.add(item[0])

Output as such:

new_dict = {'abc': [('abc', '1', '5'), ('def', '1', '5'), ('abcd', '2', '5')],
            'ghi': [('ghi', '1', '5'), ('jkl', '1', '4'), ('mno', '2', '4')]}

flgset = {'abc', 'abcd', 'def', 'ghi', 'jkl', 'mno'}
Toukenize
  • 1,390
  • 1
  • 7
  • 11
1

Others may have more efficient ways to do this, but here's one solution that incorporates your intuitions that you need to loop over old_dict items and use a set:

for key, val in old_dict.items():
    if val[0][0] not in set([v[0][0] for v in new_dict.values()]):
        new_dict.update({key: val})

Here's a brief explanation of what's going on: First, val[0][0] is the "informative flag" from your dictionary entry (i.e. the first item of the first tuple in the entry list). set([v[0][0] for v in new_dict.values()]) will give you the unique values of that flag in your new dictionary. The inner part is a list comprehension to get all the "flags" and then set will give a unique list. The last line just uses the update method to append to it.

REVISED ANSWER

@VinayPai raises two important issues below in the comments. First, this code is inefficient because it reconstructs the test set each time. Here's the more efficient way he suggests:

flag_list = set()
for key, val in old_dict.items():
    if val[0][0] not in flag_list:
        new_dict.update({key: val})
        flag_list.add(val[0][0])

The second issue is that this will produce inconsistent results because dictionaries are not ordered. One possible solution is to use an OrderedDict. But as @SyntaxVoid suggests, this is only necessary if you're using Python3.5 or earlier (here is a great answer discussing the change). If you can create your data in this fashion, it would solve the problem:

from collections import OrderedDict
old_dict = OrderedDict{'abc':[('abc', '1', '5'), ('def', '1', '5'), ('abcd', '2', '5')],
                       'def':[('abc', '2', '5'), ('def', '1', '5'), ('abcd', '1', '5')],
                       'ghi':[('ghi', '1', '5'), ('jkl', '1', '4'), ('mno', '2', '4')]}
Brendan A.
  • 1,268
  • 11
  • 16
  • This won't produce consistent result because dictionaries don't have a defined order. Your code will work if it's an OrderedDict, but it's unnecessarily inefficient. There is no reason to keep creating the set over and over again, you can just create it and add to it every time you update new_dict. – Vinay Pai Sep 10 '19 at 15:31
  • 2
    Dictionaries are ordered after python 3.6 based on when they were inserted into the dictionary. – SyntaxVoid Sep 10 '19 at 15:51
  • 1
    Thanks @VinayPai and SyntaxVoid, I've edited my answer to include both of your suggestions – Brendan A. Sep 10 '19 at 15:56
  • @BrendanA. almost there, flag_list should be set() rather than [], and you need flag_list.add rather than append. The "in" operator is O(n) for a list, O(1) for a set on avergage. https://wiki.python.org/moin/TimeComplexity – Vinay Pai Sep 10 '19 at 21:34
  • Thanks, I didn't realize that! – Brendan A. Sep 10 '19 at 23:32