Python - Filter dictionary by specific filter tuple

Question

EDITED: input and output changed based on comments to make the question clearer.

I have a dictionary with unique keys, but some of them represent different tiers of the same dataset (they have the same name except for the last two characters).

EDIT: Not every dataset is generated with every tier, so there might be datasets which are only available at tier "T1", while others are available for multiple tiers. (END OF EDIT)

I also have a tuple holding the tier levels. Now I want to filter the dictionary to contain only the "best" available tier. The tier is part of the key, but can also be taken from the value of each dictionary entry. Here is a MWE:

my_dict = {
    'LC08_L1TP_200029_20210716_20210721_02_T1': {  # best tier for this dataset --> keep it
        'cc': 30.57,
        'tier': 'T1',
    },
    'LC08_L1TP_200029_20210716_20210721_02_RT': {  # worst tier for this dataset --> remove it
        'cc': 30.57,
        'tier': 'RT',
    },
    'LC08_L1TP_200029_20210630_20210708_02_T2': {  # worst tier for this dataset --> remove it
        'cc': 60.52,
        'tier': 'T2',
    },
    'LC08_L1TP_200029_20210630_20210708_02_RT': {  # best tier for this dataset --> keep it
        'cc': 60.52,
        'tier': 'RT',
    },
    'LC08_L1TP_200029_20210614_20210628_02_T2': {  # only tier for this dataset --> keep it
        'cc': 15.61,
        'tier': 'T2',
    },
}
tiers = ('T1', 'RT', 'T2')  # this is the tier order

In the end, I want a new dictionary that looks like this, holding only the "best" available tier based on tiers:

{
    'LC08_L1TP_200029_20210716_20210721_02_T1': {
        'cc': 30.57,
        'tier': 'T1',
    },
    'LC08_L1TP_200029_20210630_20210708_02_RT': {
        'cc': 60.52,
        'tier': 'RT',
    },
    'LC08_L1TP_200029_20210614_20210628_02_T2': {
        'cc': 15.61,
        'tier': 'T2',
    },
}

I know the key=lambda x functionality for sorting, as described at How do I sort a list of dictionaries by a value of the dictionary?, but just sorting is not what I'm aiming at.

I also thought of something like this, but it obviously does not do what I need, because every key ends with one of the tiers and therefore gets copied:

new_dict = {}
for key in my_dict.keys():
    for tier in tiers:
        if key.endswith(tier):
            new_dict[key] = my_dict[key]
            break

I don't even know where to start, can't get my head around it. — s6hebern, Aug 03 '21 at 06:58
If you haven't tried then at least you had done some research, right? you may provide that as well. — imxitiz, Aug 03 '21 at 07:00
Added the only idea I came up with, besides that, I'm a bit lost. — s6hebern, Aug 03 '21 at 07:19
@s6hebern Why isn't your output showing an entry for `T2`? `T2` is present in `tiers`. — Ram, Aug 03 '21 at 07:21
Because it is not necessarily present in the dictionary. I can add an entry for it, but in many cases for my real-world use case, not every tier is available at all times. — s6hebern, Aug 03 '21 at 07:25
**not necessarily present in the dictionary** ? It is present in your dictionary and tuple as well. What do you mean by that ? — Ram, Aug 03 '21 at 07:26
@s6hebern Please state clearly that you have edited the expected output in the question, because some of the answers were written for your previous expected output. — imxitiz, Aug 03 '21 at 07:45
@s6hebern Are you really trying to get the last occurrence, sorted with that tiers list? Because I found that pattern in your expected output. Am I right? — imxitiz, Aug 03 '21 at 08:54
@Xitiz no, the `tiers` give the order, meaning that `T1` is superior to `RT`, which itself is superior to `T2`. You could also phrase it like `'T2' < 'RT' < 'T1'`. — s6hebern, Aug 03 '21 at 08:58
But you have multiple occurrence of `tiers` in `dict` then how can you say that! And as you said is accepted answer, answered? There is at first `RT` and `T1` and `T2` but as your last comment that should be `T1` `RT` `T2` — imxitiz, Aug 03 '21 at 09:03
Commented on the accepted answer that I needed to make a slight adjustment for my real use case. Shall I edit the answer instead? — s6hebern, Aug 03 '21 at 09:12
I am just saying that your expected output and the output given by the accepted answer are not the same. They are not the same in any case: first, you asked for sorting, but that answer is not really sorting; second, even if it were sorted, the keys are not the ones you expect. I don't mean `RT`, `T1` as in that answer. I mean that it gives `LC08_L1TP_200029_20210630_20210708_02_T2` for `T2`, whereas your expected output has `LC08_L1TP_200029_20210614_20210628_02_T2` for `T2`. — imxitiz, Aug 03 '21 at 09:17

napuzba · Accepted Answer · 2021-08-03T08:13:31.460

You can use itertools.groupby for this task

tiers = {'T1':1, 'RT':2, 'T2':3 }  # this is the tier order

import itertools

data = {}
by_tier = sorted( my_dict.items(), key= lambda kv: kv[1]['tier'] )
for tier,group in itertools.groupby( by_tier , key= lambda kv: kv[1]['tier']):
  max_item = max( group, key=lambda kv: kv[1]['cc'])
  data[tier] = { max_item[0] : max_item[1] }

{'RT': {'LC08_L1TP_200029_20210630_20210708_02_RT': {'cc': 60.52,
                                                     'tier': 'RT'}},
 'T1': {'LC08_L1TP_200029_20210716_20210721_02_T1': {'cc': 30.57,
                                                     'tier': 'T1'}},
 'T2': {'LC08_L1TP_200029_20210630_20210708_02_T2': {'cc': 60.52,
                                                     'tier': 'T2'}}}

First version of the question

tiers = {'T1':1, 'RT':2, 'T2':3 }  # this is the tier order

import itertools

by_tier = sorted( my_dict.items(), key= lambda kv: tiers[kv[1]['tier']] )
for tier,group in itertools.groupby( by_tier , key= lambda kv: kv[1]['tier']):
  print("for tier {0}".format(tier))
  for item in group:
    print("  ==> {0}".format(item))

for tier T1
  ==> ('LC08_L1TP_200029_20210716_20210721_02_T1', {'cc': 30.57, 'tier': 'T1'})
for tier RT
  ==> ('LC08_L1TP_200029_20210716_20210721_02_RT', {'cc': 30.57, 'tier': 'RT'})
  ==> ('LC08_L1TP_200029_20210630_20210708_02_RT', {'cc': 60.52, 'tier': 'RT'})
for tier T2
  ==> ('LC08_L1TP_200029_20210630_20210708_02_T2', {'cc': 60.52, 'tier': 'T2'})

Now you can easily generate the required format.
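As a rough sketch of that last step, the same groupby idea can be pointed at the dataset name instead of the tier, so that each group holds all tiers of one dataset and the best-ranked one is kept. This assumes the tier is always the last two characters of the key, preceded by an underscore; dataset_name and tier_rank are names made up for this illustration.

```python
import itertools

my_dict = {
    'LC08_L1TP_200029_20210716_20210721_02_T1': {'cc': 30.57, 'tier': 'T1'},
    'LC08_L1TP_200029_20210716_20210721_02_RT': {'cc': 30.57, 'tier': 'RT'},
    'LC08_L1TP_200029_20210630_20210708_02_T2': {'cc': 60.52, 'tier': 'T2'},
    'LC08_L1TP_200029_20210630_20210708_02_RT': {'cc': 60.52, 'tier': 'RT'},
    'LC08_L1TP_200029_20210614_20210628_02_T2': {'cc': 15.61, 'tier': 'T2'},
}
tier_rank = {'T1': 1, 'RT': 2, 'T2': 3}  # lower rank = better tier


def dataset_name(key):
    # Assumption: keys end in '_' plus a two-character tier, e.g. '_T1'.
    return key[:-3]


# groupby only merges adjacent items, so sort by the grouping key first.
by_dataset = sorted(my_dict.items(), key=lambda kv: dataset_name(kv[0]))

best = {}
for name, group in itertools.groupby(by_dataset, key=lambda kv: dataset_name(kv[0])):
    # Within one dataset, keep the entry whose tier has the lowest rank.
    key, value = min(group, key=lambda kv: tier_rank[kv[1]['tier']])
    best[key] = value

print(best)
```

Note that the result is ordered by dataset name because of the sort, not by the order of the original dictionary.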

With a slight adjustment to my real-world use case, this solution works well. Thanks and sorry for the confusion at the start. — s6hebern, Aug 03 '21 at 09:00
Since `cc` is not really interesting here, my adjustments are: `max_item = max( group, key=lambda kv: kv[1]['tier'])` `data[max_item[0]] = max_item[1]` — s6hebern, Aug 03 '21 at 09:06

score 0 · Answer 2 · answered Aug 03 '21 at 07:30

You could break down the problem in the following way:

Get the unique names of the datasets:
- Extract the keys from the dictionary: k = list(my_dict.keys())
- Remove the tier: ds = map(lambda x: x[:-2], k)
- Create a list containing only unique names: ds = list(set(ds))

Then go through the dataset names and, for each one, find the best available tier by checking which key (dataset name + tier) is actually in the dictionary. If you try the tiers in the right order, best first, you'll get the right result.

highest_tiers = []
for d in ds:
    for t in tiers:  # tiers is ordered from best to worst
        k_t = d + t  # dataset name + tier
        if k_t in my_dict:
            highest_tiers.append(k_t)
            break  # stop at the first (best) tier that exists
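Put together, these steps might look like the sketch below, which also builds the filtered dictionary at the end. It assumes the tier is exactly the last two characters of every key and that the tiers tuple is ordered from best to worst, as in the question; filter_by_tier_order is a hypothetical helper name.

```python
my_dict = {
    'LC08_L1TP_200029_20210716_20210721_02_T1': {'cc': 30.57, 'tier': 'T1'},
    'LC08_L1TP_200029_20210716_20210721_02_RT': {'cc': 30.57, 'tier': 'RT'},
    'LC08_L1TP_200029_20210630_20210708_02_T2': {'cc': 60.52, 'tier': 'T2'},
    'LC08_L1TP_200029_20210630_20210708_02_RT': {'cc': 60.52, 'tier': 'RT'},
    'LC08_L1TP_200029_20210614_20210628_02_T2': {'cc': 15.61, 'tier': 'T2'},
}
tiers = ('T1', 'RT', 'T2')  # best tier first


def filter_by_tier_order(data, tier_order):
    # Unique dataset names: each key without its two-character tier suffix.
    names = sorted({key[:-2] for key in data})
    filtered = {}
    for name in names:
        # Try the tiers from best to worst and keep the first key that exists.
        for tier in tier_order:
            candidate = name + tier
            if candidate in data:
                filtered[candidate] = data[candidate]
                break
    return filtered


result = filter_by_tier_order(my_dict, tiers)
print(result)
```

Checking membership directly on the dictionary avoids rebuilding a list of keys on every iteration.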

Benoit Drogou · Answer 3 · 2021-08-03T08:40:33.410

Edited to keep only the best entry of each tier

In[1]:
import pandas as pd
def getBestOfEachTier(dictionary):
    tiers_best = {}
    for key, value in dictionary.items():
        if value['tier'] not in tiers_best.keys():
            tiers_best[value['tier']] = value['cc']
        else:
            if tiers_best[value['tier']]< value['cc']:
                tiers_best[value['tier']] = value['cc']
    return tiers_best

def filterDict(dictionary, tiers_best):
    res = {key: val for key, val in dictionary.items() if tiers_best[val['tier']] == val['cc']}
    return res

tiers = getBestOfEachTier(my_dict)
filterDict(my_dict, tiers)


Out[2]:
{'LC08_L1TP_200029_20210716_20210721_02_T1': {'cc': 30.57, 'tier': 'T1'},
 'LC08_L1TP_200029_20210630_20210708_02_T2': {'cc': 60.52, 'tier': 'T2'},
 'LC08_L1TP_200029_20210630_20210708_02_RT': {'cc': 60.52, 'tier': 'RT'}}
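If "best" is instead meant purely as the tier order from the question, with cc playing no role, a similar single-pass idea could be sketched as follows: remember the best key seen so far per dataset rather than the best cc per tier. This is only an illustration; keep_best_tier is a hypothetical name, and the dataset name is assumed to be the key without its last three characters (underscore plus tier).

```python
my_dict = {
    'LC08_L1TP_200029_20210716_20210721_02_T1': {'cc': 30.57, 'tier': 'T1'},
    'LC08_L1TP_200029_20210716_20210721_02_RT': {'cc': 30.57, 'tier': 'RT'},
    'LC08_L1TP_200029_20210630_20210708_02_T2': {'cc': 60.52, 'tier': 'T2'},
    'LC08_L1TP_200029_20210630_20210708_02_RT': {'cc': 60.52, 'tier': 'RT'},
    'LC08_L1TP_200029_20210614_20210628_02_T2': {'cc': 15.61, 'tier': 'T2'},
}
tiers = ('T1', 'RT', 'T2')  # best tier first


def keep_best_tier(data, tier_order):
    # Position in tier_order is the rank: 0 is the best tier.
    rank = {tier: position for position, tier in enumerate(tier_order)}
    best_key = {}  # dataset name -> key of the best entry seen so far
    for key, value in data.items():
        name = key[:-3]  # assumption: keys end in '_' plus a two-character tier
        current = best_key.get(name)
        if current is None or rank[value['tier']] < rank[data[current]['tier']]:
            best_key[name] = key
    return {key: data[key] for key in best_key.values()}


result = keep_best_tier(my_dict, tiers)
print(result)
```

Because dictionaries preserve insertion order, the datasets come out in the order in which they first appear in the input.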

Yes, only the best value shall be kept, all others shall be removed — s6hebern, Aug 03 '21 at 07:35
"Best" does not mean lowest `cc`, the "quality" is indicated purely by the `tiers` with `'T2' < 'RT' < 'T1'`. — s6hebern, Aug 03 '21 at 09:01

Ram · Answer 4 · 2021-08-03T07:46:50.930

As far as I understand, by "best" you mean the entry with the maximum cc value for each tier.

First, sort the dictionary by the cc values in descending order to make the filtering easier.
Then iterate over the tiers tuple and the sorted dictionary, and store the first matching item for each tier in a new dictionary, new_dict.
I originally used a visited set to avoid revisiting the tiers.

EDIT

You don't need a set. A break is enough, based on @Xitiz's comment.

Here is the Code:

my_dict = {
    'LC08_L1TP_200029_20210716_20210721_02_T1': {
        'cc': 30.57,
        'tier': 'T1',
    },
    'LC08_L1TP_200029_20210716_20210721_02_RT': {
        'cc': 30.57,
        'tier': 'RT',
    },
    'LC08_L1TP_200029_20210630_20210708_02_T2': {
        'cc': 60.52,
        'tier': 'T2',
    },
    'LC08_L1TP_200029_20210630_20210708_02_RT': {
        'cc': 60.52,
        'tier': 'RT',
    }
}
tiers = ('T1', 'RT', 'T2')  # this is the tier order

# Sorting the dict based on 'cc' in descending order
my_dict = dict(sorted(my_dict.items(), key=lambda x: -x[1]['cc']))
new_dict = {}

for i in tiers:
    for k,v in my_dict.items():
        if v['tier'] == i:
            new_dict.update({k: v})
            break
            
print(new_dict)

Output:

{
    'LC08_L1TP_200029_20210716_20210721_02_T1': {
        'cc': 30.57,
        'tier': 'T1'
    },
    'LC08_L1TP_200029_20210630_20210708_02_RT': {
        'cc': 60.52,
        'tier': 'RT'
    },
    'LC08_L1TP_200029_20210630_20210708_02_T2': {
        'cc': 60.52,
        'tier': 'T2'
    }
}

edited Aug 03 '21 at 07:46

answered Aug 03 '21 at 07:37


Actually, I believe we can just add a break after the update, i.e. `new_dict.update({k: v}); break`, so that we don't have to create a new set (`visited`). – imxitiz Aug 03 '21 at 07:40
Almost... `RT` is superior to `T2`, therefore `'LC08_L1TP_200029_20210630_20210708_02_T2'` should be removed as well. But no, the `cc` is not related to the tier itself. "Best" is purely defined by the order of `tiers`. – s6hebern Aug 03 '21 at 07:42
@Xitiz Yes! It didn't occur to me. Thanks. – Ram Aug 03 '21 at 07:43
@s6hebern Did you mention anything about this "superiority" in the question ? – Ram Aug 03 '21 at 07:44
@Ram I need the dictionary filtered by the tier levels, that was the question. Not based on the "cc" entry. Should have removed it from the MWE to avoid confusion. – s6hebern Aug 03 '21 at 07:48
@Ram I think OP had done that by saying _"Now I want to filter the dictionary to contain only the "best" available tier."_ – imxitiz Aug 03 '21 at 07:50
