-1

Given a DataFrame

  a b c d
1 5 5 5 5
2 5 5 5 5
3 5 5 5 5

I would like to add more columns on the DataFrame based on the existing ones but using some logic that can't fit in a lambda. The desired result should look something like this:

   a  a_added  c  c_added  d  d_added
1  5  'good'   5  'good'   5   'bad'
2  5  'bad'    5  'good'   5   'bad'
3  5  'good'   5  'good'   5   'bad'

After seeing this answer, my idea was to use DataFrame.apply() on each row and after that Series.apply() on each value but I don't know exactly how to chain the calls and what exactly to return such that I return a new column name from the Series's apply function. After that I think I need to combine those two DataFrames with DataFrame.join(). I really need to use Series.apply() because I have to compute each value with some custom logic.

EDIT: I have a map of of thresholds where the keys correspond to the column names in my DataFrame and the values are warning/critical thresholds plus an operation that says how the current value should be compared against the threshold:

thresholds = {
    'a': {'warning': 90, 'critical': 98, operation: 'lt'},
    'b': {'warning': 10, 'critical': 15, operation: 'gt'},
    'c': {'warning': 5, 'critical': 9, operation: 'le'}
}

EDIT2: Using the following input with the thresholds above:

  a   b  c
1 89  0  4
2 91  9  10
3 99 17  5

will get as result:

    a  a_r    b  b_r   c   c_r
1  89  good   0  good   4  good
2  91  warn   9  warn  10  crit
3  99  crit  17  good   5  warn

Therefore for each value depending on the column name I have to apply the corresponding threshold from the map.

cosmarc
  • 562
  • 3
  • 19

1 Answers1

1

Use:

print (df)
    a   b   c
1  89  11   4
2  91   9  10
3  99  17   5


thresholds = {
    'a': {'warning': 90, 'critical': 98, 'operation': 'lt'},
    'b': {'warning': 10, 'critical': 15, 'operation': 'gt'},
    'c': {'warning': 5, 'critical': 9, 'operation': 'le'}
}

import operator

ops = {'gt': operator.gt,
       'lt': operator.lt,
       'ge': operator.ge,
       'le': operator.le,
       'eq': operator.eq,
       'ne': operator.ne}

for k, v in thresholds.items():
    op1 = v.pop('operation')
    if op1 in ('lt','le'):
        sorted_v = sorted(v.items(), key=operator.itemgetter(1))
    else:
        sorted_v = sorted(v.items(), key=operator.itemgetter(1), reverse=True)
    for k1, v1 in sorted_v:

        #https://stackoverflow.com/q/46421521/2901002
        m = ops[op1](v1, df[k])
        df.loc[m, f'{k}_added'] = k1

df = df.sort_index(axis=1).fillna('good')
print (df)
    a   a_added   b   b_added   c   c_added
1  89      good  11  critical   4      good
2  91   warning   9   warning  10  critical
3  99  critical  17      good   5   warning
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks for the answer, I've updated my question since it wasn't enough info. – cosmarc May 11 '19 at 09:38
  • Thanks for your answer! The only problem with your approach is that in the resulting DataFrame will only be the result of the last operation applied. For example a threshold `{ 'warning': 30, 'critical': 35, 'operation': lt }` will always show critical for a value between 30 and 35... so it shows the latest threshold applied – cosmarc May 13 '19 at 09:43
  • @CosminM. - Not easy test it, is possible add some sample data with expected output? – jezrael May 13 '19 at 13:32
  • I've added the expected output. – cosmarc May 13 '19 at 13:50
  • @CosminM. - Answer was changed, also for second column was changed first value to `11` – jezrael May 14 '19 at 12:53