1

Given a list of dictionaries like the one below:

dictionaries = [
    {'column': 'NRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'TRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'NRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'TRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 4, 'threshold': 0.25}
]

I want a final list of dictionaries that keeps, for each distinct 'column' value, only the dictionary with the highest 'severity'.

The output for the list above should be:

output = [{'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25}, 
          {'column': 'TRX_TOTAL', 'severity': 4, 'threshold': 0.25}]

This is because the highest 'severity' for the NRX_TOTAL column is 3, and for the TRX_TOTAL column it is 4.

The code snippet below does the job. Any ideas on how it can be improved?

l_measure_thresholds_with_highest_severity_temp = []
l_disctinct_column_value = []

for l_dict in dictionaries:
    l_temp_dict = {'column': '', 'severity': 0, 'threshold': 0}
    x = l_dict['column']
    if x not in l_disctinct_column_value:
        l_disctinct_column_value.append(x)
        l_temp_dict['column'] = x
        l_measure_thresholds_with_highest_severity_temp.append(l_temp_dict)

l_measure_thresholds_with_highest_severity = list()

for i in l_measure_thresholds_with_highest_severity_temp:
    l_temp_dict = {'column': '', 'severity': 0, 'threshold': 0}
    dict_i_col = i['column']
    dict_i_sev = i['severity']
    dict_i_threshold = i['threshold']

    for j in dictionaries:   
        dict_j_col = j['column']
        dict_j_sev = j['severity']
        dict_j_threshold = j['threshold']
        if dict_i_col == dict_j_col:
            if dict_i_sev < dict_j_sev:
                l_highest_severity = dict_j_sev
                l_highest_threshold = dict_j_threshold
            else:
                l_highest_severity = dict_i_sev
                l_highest_threshold = dict_i_threshold
            l_temp_dict['column'] = dict_i_col
            l_temp_dict['severity'] = l_highest_severity
            l_temp_dict['threshold'] = l_highest_threshold
    l_measure_thresholds_with_highest_severity.append(l_temp_dict)
    
print(l_measure_thresholds_with_highest_severity)
Tomerikoo
Atif

8 Answers

2

You can create a new mapping between the 'column' and the dict with the max severity for that column. To achieve that, it is helpful to use a defaultdict for the first comparison:

from collections import defaultdict

severities = defaultdict(lambda: {'severity': 0})

for d in dictionaries:
    column = d['column']
    if d['severity'] > severities[column]['severity']:
        severities[column] = d

print(list(severities.values()))

The defaultdict is used for the first comparison to create a "dummy" dict with severity 0. Then, whenever a dict with the same column with a greater severity is found, it is saved in the severities dict. In the end, we just print the values which are the original dicts from the list.

On your list of dictionaries, the above will give:

[{'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25}, 
 {'column': 'TRX_TOTAL', 'severity': 4, 'threshold': 0.25}]
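
One caveat: if every severity for a column were zero or negative, the dummy dict with severity 0 would never be replaced and would end up in the result. A minimal sketch of the same single-pass idea with a plain dict and no dummy value (the function name `highest_severity_per_column` is made up for illustration):

```python
def highest_severity_per_column(dicts):
    """Return one dict per 'column': the first one seen with the highest 'severity'."""
    best = {}  # maps column -> dict with the highest severity seen so far
    for d in dicts:
        current = best.get(d['column'])
        # First dict for this column, or a strictly higher severity: keep it.
        if current is None or d['severity'] > current['severity']:
            best[d['column']] = d
    return list(best.values())


dictionaries = [
    {'column': 'NRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'TRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'NRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'TRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 4, 'threshold': 0.25},
]

result = highest_severity_per_column(dictionaries)
print(result)
```

Because the comparison is strict, ties within a column resolve to the dict that appears first in the input.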
Tomerikoo
1

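This should do the job when every column shares the same maximum severity, since it keeps every item whose severity equals the overall maximum: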

a = [
    {'column': 'NRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'TRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'NRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'TRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 3, 'threshold': 0.25}
]

list_ = []
high_num = 0
for item in a:
    if item['severity'] > high_num:
        high_num = item['severity']
for item in a:
    if item['severity'] == high_num:
        list_.append(item)

print(list_)
Matiiss
1

Here is a generic function that you could use:

a = [
    {'column': 'NRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'TRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'NRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'TRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 3, 'threshold': 0.25}
]


def max_group_by(lst, group, value):
    '''
    Return, for each distinct value of the `group` key, the dict in lst
    with the highest value under the `value` key.
    '''
    result = []
    groups = []  # group keys already handled
    for d in lst:
        g = d.get(group)
        if g and g not in groups:
            groups.append(g)
            # all dicts that belong to the same group as d
            glist = [d2 for d2 in lst if d2.get(group) == g]
            maxval = max(glist, key=lambda x: x.get(value))
            result.append(maxval)
    return result

print(max_group_by(a, 'column', 'severity'))
# [{'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25}, 
#  {'column': 'TRX_TOTAL', 'severity': 3, 'threshold': 0.25}]
Marko
1

A pandas solution:

> pip install pandas

import pandas as pd

df = pd.DataFrame(dictionaries)

max_per_column = df.groupby(["column"], sort=False)["severity"].transform("max")
ind = max_per_column == df["severity"]
result = df[ind]

This is as simple as your problem definition:

  1. group by column
  2. find max severity per group
  3. get all rows with those max values
  4. profit
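
For readers who would rather not install pandas, the same steps can be sketched in plain Python. This is only an illustration of the group / max / filter idea, not a translation of the pandas internals, and it repeats the question's list so that it runs on its own:

```python
from collections import defaultdict

dictionaries = [
    {'column': 'NRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'TRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'NRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'TRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 4, 'threshold': 0.25},
]

# Steps 1 and 2: group by column and track the max severity per group.
max_per_column = defaultdict(lambda: float('-inf'))
for d in dictionaries:
    col = d['column']
    max_per_column[col] = max(max_per_column[col], d['severity'])

# Step 3: keep every row whose severity equals its group's max.
result = [d for d in dictionaries if d['severity'] == max_per_column[d['column']]]
print(result)
```

As with the boolean mask above, rows that tie for the maximum within a column are all kept.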
Gulzar
0

You can sort the list by severity and take the last value, which gives the single dictionary with the highest severity overall:

b = sorted(a, key=lambda x: x['severity'])[-1]
print(b)
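
The sorting idea can be extended to give one result per column by combining it with `itertools.groupby`. A sketch, assuming `a` holds the list from the question:

```python
from itertools import groupby
from operator import itemgetter

a = [
    {'column': 'NRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'TRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'NRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'TRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 4, 'threshold': 0.25},
]

# Sort by column first, then by severity, so that each column's
# highest-severity dict is the last item of its group.
ordered = sorted(a, key=itemgetter('column', 'severity'))

# groupby only merges adjacent items, which is why the sort comes first.
result = [list(group)[-1] for _, group in groupby(ordered, key=itemgetter('column'))]
print(result)
```

The result comes out ordered by column name rather than by first appearance in the input.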
Wasif
0

I guess you can sort like this:

a = [
    {'column': 'NRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'TRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'NRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'TRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 3, 'threshold': 0.25}
]

def key(elem):
    return elem['severity']

a.sort(key=key, reverse=True)
print(a)

Look up what sort() does (here it sorts the list in place by the given key, highest severity first).
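
Sorting alone orders the whole list; it does not pick one dict per column. One way to get there from a sort, sketched here using the list from the question, is to sort in ascending order and build a dict keyed by column, so that later (higher-severity) entries overwrite earlier ones:

```python
a = [
    {'column': 'NRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'TRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'NRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'TRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 4, 'threshold': 0.25},
]


def key(elem):
    return elem['severity']


# Ascending sort: for each column, the highest-severity dict is assigned last
# and therefore replaces the lower-severity ones stored under the same key.
by_column = {d['column']: d for d in sorted(a, key=key)}
result = list(by_column.values())
print(result)
```

This uses `sorted()` rather than `a.sort()` so the original list is left untouched.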

Matiiss
0

You can try:

a = [
    {'column': 'NRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'TRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'NRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'TRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 3, 'threshold': 0.25}
]

output = []
severity = 0

for data in a:
    if data["severity"] > severity:
        output = []
        severity = data["severity"]
        output.append(data)
    elif data["severity"] == severity:
        output.append(data)
print(output)

Output:

[{'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25}, {'column': 'TRX_TOTAL', 'severity': 3, 'threshold': 0.25}]
Harsha Biyani
0

The following is not the best solution (see Tomerikoo's answer for a better alternative):

a = [
    {'column': 'NRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'TRX_TOTAL', 'severity': 1, 'threshold': 0.1},
    {'column': 'NRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'TRX_TOTAL', 'severity': 2, 'threshold': 0.15},
    {'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 3, 'threshold': 0.25},
    {'column': 'TRX_TOTAL', 'severity': 4, 'threshold': 0.25}
]

NRX_TOTAL = 0
NRX_DICT = None
TRX_TOTAL = 0
TRX_DICT = None

for i in a:
    if 'NRX_TOTAL' in i.values():
        if i['severity'] > NRX_TOTAL:
            NRX_TOTAL = i['severity']
            NRX_DICT = i
    else:
        if i['severity'] > TRX_TOTAL:
            TRX_TOTAL = i['severity']
            TRX_DICT = i

print([NRX_DICT, TRX_DICT])

Alternative:

d = set(map(lambda x: x['column'], a))
l = []

for i in d:
    l.append(max(filter(lambda x: i in x.values(), a), key=lambda x: x['severity']))

print(l)

Output:

[{'column': 'NRX_TOTAL', 'severity': 3, 'threshold': 0.25}, 
{'column': 'TRX_TOTAL', 'severity': 4, 'threshold': 0.25}]
Henry Tjhia