From a list of dictionaries, get all dictionaries where key column has highest severity values

Question

Given a list of dictionaries as the one below:

dictionaries = [
    {'column': 'NRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'TRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'NRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'TRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 4, 'threshold': 0.25}
]

I want a final list of dictionaries with the highest 'severity' value for each 'column' key. The output from the above should be:

output = [{'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25}, 
          {'column': 'TRX_TOTAL', 'severity': 4, 'threshold': 0.25}]

Because for the NRX_TOTAL column the highest 'severity' is 3, and for the TRX_TOTAL column it is 4.

Below is a code snippet that does the job. Any ideas on how it can be improved?

l_measure_thresholds_with_highest_severity_temp = []
l_disctinct_column_value = []

for l_dict in dictionaries:
    l_temp_dict = {'column': '', 'severity': 0, 'threshold': 0}
    x = l_dict['column']
    if x not in l_disctinct_column_value:
        l_disctinct_column_value.append(x)
        l_temp_dict['column'] = x
        l_measure_thresholds_with_highest_severity_temp.append(l_temp_dict)

l_measure_thresholds_with_highest_severity = list()

for i in l_measure_thresholds_with_highest_severity_temp:
    l_temp_dict = {'column': '', 'severity': 0, 'threshold': 0}
    dict_i_col = i['column']
    dict_i_sev = i['severity']
    dict_i_threshold = i['threshold']

    for j in dictionaries:   
        dict_j_col = j['column']
        dict_j_sev = j['severity']
        dict_j_threshold = j['threshold']
        if dict_i_col == dict_j_col:
            if dict_i_sev < dict_j_sev:
                l_highest_severity = dict_j_sev
                l_highest_threshold = dict_j_threshold
            else:
                l_highest_severity = dict_i_sev
                l_highest_threshold = dict_i_threshold
            l_temp_dict['column'] = dict_i_col
            l_temp_dict['severity'] = l_highest_severity
            l_temp_dict['threshold'] = l_highest_threshold
    l_measure_thresholds_with_highest_severity.append(l_temp_dict)
    
print(l_measure_thresholds_with_highest_severity)

Why is it not the correct way in your opinion? What is missing? — Gulzar, Oct 29 '20 at 09:14
@Gulzar, I think there must be a better/more Pythonic way to do this. — Atif, Oct 29 '20 at 09:15

Tomerikoo · Answer 1 · 2020-10-29T10:04:09.387

You can create a new mapping between the 'column' and the dict with the max severity for that column. To achieve that, it is helpful to use a defaultdict for the first comparison:

from collections import defaultdict

severities = defaultdict(lambda: {'severity': 0})

for d in dictionaries:
    column = d['column']
    if d['severity'] > severities[column]['severity']:
        severities[column] = d

print(list(severities.values()))

The defaultdict is used for the first comparison to create a "dummy" dict with severity 0. Then, whenever a dict with the same column with a greater severity is found, it is saved in the severities dict. In the end, we just print the values which are the original dicts from the list.

On your list of dictionaries, the above will give:

[{'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25}, 
 {'column': 'TRX_TOTAL', 'severity': 4, 'threshold': 0.25}]
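One caveat with the dummy default: a column whose severities are all 0 or negative would never replace the placeholder, so the placeholder itself would end up in the result. A minimal sketch that avoids this, assuming severities may be 0 or negative, uses a plain dict with `dict.get` instead. The function name and the sample rows here are illustrative, not part of the original answer:

```python
# Illustrative sample including non-positive severities (assumed data).
records = [
    {'column': 'NRX_TOTAL', 'severity': 0, 'threshold': 0.1},
    {'column': 'TRX_TOTAL', 'severity': -1, 'threshold': 0.1},
    {'column': 'TRX_TOTAL', 'severity': 4, 'threshold': 0.25},
]


def highest_severity_per_column(rows):
    """Return, for each column, the first dict with the highest severity."""
    best = {}  # column name -> dict with the highest severity seen so far
    for row in rows:
        current = best.get(row['column'])
        # The first dict seen for a column is always stored, whatever its severity.
        if current is None or row['severity'] > current['severity']:
            best[row['column']] = row
    return list(best.values())


result = highest_severity_per_column(records)
print(result)
# [{'column': 'NRX_TOTAL', 'severity': 0, 'threshold': 0.1},
#  {'column': 'TRX_TOTAL', 'severity': 4, 'threshold': 0.25}]
```

Like the defaultdict version, this is a single pass over the list and returns the original dicts in order of first appearance of each column.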

score 1 · Answer 2 · answered Oct 29 '20 at 09:31

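This should do the job. Note that it keeps the dictionaries with the overall highest severity, not the highest per column, so it matches the expected output only when every column shares the same maximum: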

a = [
    {'column': 'NRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'TRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'NRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'TRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 3, 'threshold': 0.25}
]

list_ = []
high_num = 0
for item in a:
    if item['severity'] > high_num:
        high_num = item['severity']
for item in a:
    if item['severity'] == high_num:
        list_.append(item)

print(list_)

score 1 · Answer 3 · answered Oct 29 '20 at 09:57

Here is a generic function that you could use:

a = [
    {'column': 'NRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'TRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'NRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'TRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 3, 'threshold': 0.25}
]


def max_group_by(lst, group, value):
    '''
    Return, for each distinct value of the group key, the dict in lst
    with the maximum value under the value key.
    '''
    result = []
    groups = []
    for d in lst:
        g = d.get(group)
        if g and g not in groups:
            groups.append(g)
            glist = [d2 for d2 in lst if d2.get(group) == g]
            maxval = max(glist, key=lambda x: x.get(value))
            result.append(maxval)
    return result

print(max_group_by(a, 'column', 'severity'))
# [{'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25}, 
#  {'column': 'TRX_TOTAL', 'severity': 3, 'threshold': 0.25}]

score 1 · Answer 4 · answered Oct 29 '20 at 11:10

A pandas solution:

> pip install pandas

import pandas as pd

df = pd.DataFrame(dictionaries)

max_per_column = df.groupby(["column"], sort=False)["severity"].transform("max")
ind = max_per_column == df["severity"]
result = df[ind]

This is as simple as your problem definition:

group by column
find max severity per group
get all rows with those max values
profit
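For readers without pandas, the same steps can be sketched in plain Python. This is only an illustration of the steps listed above, not the pandas implementation. The function name and the sample rows (including a tie on severity 3) are assumptions made for the example. Like the pandas version, it keeps every row that ties for the maximum:

```python
# Illustrative rows; the two NRX_TOTAL rows with severity 3 form a tie (assumed data).
rows = [
    {'column': 'NRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.3},
    {'column': 'TRX_TOTAL', 'severity': 4, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 2, 'threshold': 0.15},
]


def rows_with_max_per_group(rows, group_key, value_key):
    """Keep every row whose value equals the maximum of its group."""
    # Steps 1 and 2: group by key and find the maximum value per group.
    max_per_group = {}
    for row in rows:
        group, value = row[group_key], row[value_key]
        if group not in max_per_group or value > max_per_group[group]:
            max_per_group[group] = value
    # Step 3: keep all rows that carry their group's maximum.
    return [row for row in rows
            if row[value_key] == max_per_group[row[group_key]]]


result = rows_with_max_per_group(rows, 'column', 'severity')
print(result)
# [{'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25},
#  {'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.3},
#  {'column': 'TRX_TOTAL', 'severity': 4, 'threshold': 0.25}]
```

Two passes over the list keep the running time linear, and the original row order is preserved.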

score 0 · Answer 5 · answered Oct 29 '20 at 09:15

You can sort the list and take the last element:

b = sorted(a, key=lambda x: x['severity'])[-1]
print(b)
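Taking only the last element returns a single dictionary. If a sort-based approach is still wanted, one possible adjustment (a sketch, not part of the original answer) is to sort by severity and then build a dict keyed by column, so that later, higher-severity entries overwrite earlier ones. The variable names below are illustrative:

```python
a = [
    {'column': 'NRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'TRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'NRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'TRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 4, 'threshold': 0.25},
]

# Sort ascending by severity; sorted() is stable, so ties keep input order.
by_severity = sorted(a, key=lambda x: x['severity'])

# Later entries overwrite earlier ones with the same column, so each
# column ends up mapped to its highest-severity dict.
highest = {x['column']: x for x in by_severity}

result = list(highest.values())
print(result)
# [{'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25},
#  {'column': 'TRX_TOTAL', 'severity': 4, 'threshold': 0.25}]
```

This is still O(n log n) because of the sort, so a single-pass loop remains the cheaper option for large lists.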

Wasif

14,755
3
14
34

2

This doesn't give the expected output – Tomerikoo Oct 29 '20 at 09:17
this is O(n log n) which is not needed – Gulzar Oct 29 '20 at 09:19

Matiiss · Answer 6 · 2020-10-29T09:17:22.237

I guess you can sort like this:

a = [
    {'column': 'NRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'TRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'NRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'TRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 3, 'threshold': 0.25}
]

def key(elem):
    return elem['severity']

a.sort(key=key, reverse=True)
print(a)

Look up what sort() does (it sorts by key).

edited Oct 29 '20 at 09:17

answered Oct 29 '20 at 09:16

this is O(n log n) which is not needed – Gulzar Oct 29 '20 at 09:19
No... the expected output is `[{'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25}, {'column': 'TRX_TOTAL', 'severity': 3, 'threshold': 0.25}]` – Tomerikoo Oct 29 '20 at 09:19

score 0 · Answer 7 · answered Oct 29 '20 at 09:26

You can try:

a = [
    {'column': 'NRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'TRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'NRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'TRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 3, 'threshold': 0.25}
]

output = []
severity = 0

for data in a:
    if data["severity"] > severity:
        output = []
        severity = data["severity"]
        output.append(data)
    elif data["severity"] == severity:
        output.append(data)
print(output)

Output:

[{'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25}, {'column': 'TRX_TOTAL', 'severity': 3, 'threshold': 0.25}]

Henry Tjhia · Answer 8 · 2020-10-29T11:22:48.863

The following is not the best solution (see Tomerikoo's answer for a better alternative):

a = [
    {'column': 'NRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'TRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'NRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'TRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 4, 'threshold': 0.25}
]

NRX_TOTAL = 0
NRX_DICT = None
TRX_TOTAL = 0
TRX_DICT = None

for i in a:
    if 'NRX_TOTAL' in i.values():
        if i['severity'] > NRX_TOTAL:
            NRX_TOTAL = i['severity']
            NRX_DICT = i
    else:
        if i['severity'] > TRX_TOTAL:
            TRX_TOTAL = i['severity']
            TRX_DICT = i

print([NRX_DICT, TRX_DICT])

Alternative:

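# Distinct column names (a set, so their order is not guaranteed)
d = set(map(lambda x: x['column'], a))
l = []

for i in d:
    # Compare the 'column' value directly rather than searching all values
    l.append(max(filter(lambda x: x['column'] == i, a), key=lambda x: x['severity']))

print(l)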

Output:

[{'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25}, 
{'column': 'TRX_TOTAL', 'severity': 4, 'threshold': 0.25}]
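Because the alternative iterates over a set, the order of the resulting list is not guaranteed and may differ from the output shown. A minimal sketch that keeps columns in order of first appearance, assuming that ordering is what is wanted, collects the column names with `dict.fromkeys` instead. The names `columns` and `result` are illustrative:

```python
a = [
    {'column': 'NRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'TRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'NRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'TRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 4, 'threshold': 0.25},
]

# dict.fromkeys keeps only the first occurrence of each column name and,
# unlike a set, preserves insertion order.
columns = dict.fromkeys(x['column'] for x in a)

result = [
    max((x for x in a if x['column'] == column), key=lambda x: x['severity'])
    for column in columns
]

print(result)
# [{'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25},
#  {'column': 'TRX_TOTAL', 'severity': 4, 'threshold': 0.25}]
```

The list is still rescanned once per distinct column, so this stays suitable mainly for short lists.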
