Python Pandas - Highlighting maximum value in column

Question

I have a dataframe produced by this code:

hmdf = pd.DataFrame(hm01)
new_hm02 = hmdf[['FinancialYear','Month']]
new_hm01 = hmdf[['FinancialYear','Month','FirstReceivedDate']]

hm05 = new_hm01.pivot_table(index=['FinancialYear','Month'], aggfunc='count')
vals1 = ['April    ', 'May      ', 'June     ', 'July     ', 'August   ', 'September', 'October  ', 'November ', 'December ', 'January  ', 'February ', 'March    ']

df_hm = new_hm01.groupby(['Month', 'FinancialYear']).size().unstack(fill_value=0).rename(columns=lambda x: '{}'.format(x))
df_hml = df_hm.reindex(vals1)

And then I have a function to highlight the maximum value in each column:

def highlight_max(data, color='yellow'):
    '''
    highlight the maximum in a Series or DataFrame
    '''
    attr = 'background-color: {}'.format(color)
    if data.ndim == 1:  # Series from .apply(axis=0) or axis=1
        is_max = data == data.max()
        return [attr if v else '' for v in is_max]
    else:  # from .apply(axis=None)
        is_max = data == data.max().max()
        return pd.DataFrame(np.where(is_max, attr, ''),
                            index=data.index, columns=data.columns)

And then this code: dfPercent.style.apply(highlight_max) produces this:

As you can see, only the first and last column have the correct max value highlighted.

Anyone know what is going wrong?

Thank you

Hey, getting an error of: AttributeError: ("'str' object has no attribute 'ndim'", u'occurred at index 2014/2015')? — Nicholas, Aug 10 '17 at 06:59
It seems to be applying the function on the column names too... — cs95, Aug 10 '17 at 07:00
Ahhh. Maybe I did something wrong converting the dataframe. Thank you, I will take a look :) — Nicholas, Aug 10 '17 at 07:01

score 19 · Accepted Answer · edited Jul 22 '20 at 17:11

19

There is problem you need convert values to floats for correct max, because get max value of strings - 9 is more as 1:

def highlight_max(data, color='yellow'):
    '''
    highlight the maximum in a Series or DataFrame
    '''
    attr = 'background-color: {}'.format(color)
    #remove % and cast to float
    data = data.replace('%','', regex=True).astype(float)
    if data.ndim == 1:  # Series from .apply(axis=0) or axis=1
        is_max = data == data.max()
        return [attr if v else '' for v in is_max]
    else:  # from .apply(axis=None)
        is_max = data == data.max().max()
        return pd.DataFrame(np.where(is_max, attr, ''),
                            index=data.index, columns=data.columns)

Sample:

dfPercent = pd.DataFrame({'2014/2015':['10.3%','9.7%','9.2%'],
                   '2015/2016':['4.8%','100.8%','9.7%']})
print (dfPercent)
  2014/2015 2015/2016
0     10.3%      4.8%
1      9.7%    100.8%
2      9.2%      9.7%

Command:

dfPercent.style.apply(highlight_max)

edited Jul 22 '20 at 17:11

David Beauchemin

231
1
2
12

answered Aug 10 '17 at 06:58

jezrael

822,522
95
1,334
1,252

Hey jezrael. Thank you. I am getting a 'ValueError: ('invalid literal for float(): 9.2%', u'occurred at index 2014/2015')' error? – Nicholas Aug 10 '17 at 07:00
1

Give me some time, I have to test it. – jezrael Aug 10 '17 at 07:01
Thank you! I really appreciate it :) – Nicholas Aug 10 '17 at 07:02
1

I think you need `regex=True` `data = data.replace('%','', regex=True).astype(float))` – jezrael Aug 10 '17 at 07:08
Very sorry Jezrael, where do I put this? – Nicholas Aug 10 '17 at 07:12
1

Check 5. row, I only add `regex=True` to replace – jezrael Aug 10 '17 at 07:12
Jezrael! THANK YOU. You have done it again. Works perfectly, super happy. If you have a second, could you explain what the issue was for my learning? :) – Nicholas Aug 10 '17 at 07:14
1

Main issue is you want max value from strings, not floats. So it comapre lexicografically. So if need max values form numeric, is necessary converting to floats first - (there is a lot of solutions, I use replace, another possible solution is `data.applymap(lambda x: float(x[:-1]))` ) – jezrael Aug 10 '17 at 07:22
1

And in style function is possible it without change original data, because `highlight_max` return another dataframe with `strings`, test it by `print (dfPercent.apply(highlight_max))` – jezrael Aug 10 '17 at 07:23
1

Ahhh, I understand, thank you. Also, I really appreciate the example you added, that will help anyone in the future stumbling across this page :) – Nicholas Aug 10 '17 at 07:35
1

Want to highlight max value in a row (across columns), specify axis=1: `dfPercent.style.apply(highlight_max,axis=1)` – BSalita Jun 29 '20 at 13:04

score 15 · Answer 2 · answered Dec 21 '20 at 04:48

15

If you are using Python 3 this should easily do the trick

dfPercent.style.highlight_max(color = 'yellow', axis = 0)

answered Dec 21 '20 at 04:48

Caio Franco

166
1
2

score 4 · Answer 3 · answered Jun 29 '20 at 14:09

Variation highlighting max value column-wise (axis=1) using two colors. One color highlights duplicate max values. The other color highlights only the last column containing the max value.

def highlight_last_max(data, colormax='antiquewhite', colormaxlast='lightgreen'):
    colormax_attr = f'background-color: {colormax}'
    colormaxlast_attr = f'background-color: {colormaxlast}'
    max_value = data.max()
    is_max = [colormax_attr if v == max_value else '' for v in data]
    is_max[len(data) - list(reversed(data)).index(max_value) -  1] = colormaxlast_attr
    return is_max

df.style.apply(highlight_last_max,axis=1)

score 1 · Answer 4 · answered Nov 25 '21 at 14:13

Easy way to color max, min, or null values in pandas.DataFrame is to uses style of pandas.DataFrame.style, which Contains methods for building a styled HTML representation of the DataFrame. Here is an example:

Color Max Values: your_df.style.highlight_max(color = 'green')
Color Min Values: your_df.style.highlight_min(color = 'red')
Color Null values: your_df.highlight_null(color = 'yellow')
If you want to apply all in the same output:
your_df.style.highlight_max(color='green').highlight_min(color='red').highlight_null(null_color='yellow')

Python Pandas - Highlighting maximum value in column

4 Answers4

Linked