Color an entire DataFrame based on distinct values in a particular column

Question

This question is a continuation of my previous question (Color a Pandas DataFrame column based on distinct values).

Basically, I want to highlight entire rows of a DataFrame based on the distinct values in a particular column (in this case, the "Country" column).

Here is how I am trying to achieve this:

import pandas as pd
import matplotlib.colors

d1 = pd.DataFrame({"Country": ['xx', 'xx', 'xy', 'xz', 'xz'],
                   "year": ['y1', 'y2', 'y1', 'y1', 'y2'],
                   "population": [100, 200, 120, 140, 190]})


colors = dict(zip(d1['Country'].unique(),
                  (f'background-color: {c}' for c in matplotlib.colors.cnames.values())))


def test_check(df):
    for key in colors:
        # print(key)
        if key in df['Country']:
            return colors.get(key)
        else:
            pass
        
    
d2 = d1.style.apply(test_check, axis=1)

But I end up with this error:

ValueError: Function <function test_check at 0x0000024D090E98B0> returned the wrong shape. Result has shape: (5,) Expected shape: (5, 3)

I want my output to be something like this. What is the best way to do this?

niraj · Accepted Answer · 2022-03-09T14:56:07.073

The error says that Styler expected a result of shape (5, 3), i.e. one style for each of the 5 rows and 3 columns, but the function returned only 5 values, one per row. With axis=1, the function has to return a style for every column in the row, not a single string.
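To see the shape difference without Styler, here is a minimal sketch using plain DataFrame.apply. The style string is just a placeholder I picked for illustration, not something from your code:

```python
import pandas as pd

d1 = pd.DataFrame({"Country": ['xx', 'xx', 'xy', 'xz', 'xz'],
                   "year": ['y1', 'y2', 'y1', 'y1', 'y2'],
                   "population": [100, 200, 120, 140, 190]})

# Placeholder style string, assumed only for this illustration
style = 'background-color: #F0F8FF'

# Returning one string per row collapses the result to a Series: one entry per row
scalar_result = d1.apply(lambda row: style, axis=1)

# Returning a Series indexed like the row gives a DataFrame shaped like d1
series_result = d1.apply(lambda row: pd.Series(style, index=row.index), axis=1)

print(scalar_result.shape)  # (5,)
print(series_result.shape)  # (5, 3)
```

The first result has the (5,) shape from the error message; the second has the (5, 3) shape that Styler expects.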

If you follow the approach from the other answer on Stack Overflow, it should work. All you need is to return pd.Series(colors[row['Country']], row.index) from your test_check function. You can try the following:

import pandas as pd
import matplotlib.colors

d1 = pd.DataFrame({"Country": ['xx', 'xx', 'xy', 'xz', 'xz'],
                   "year": ['y1', 'y2', 'y1', 'y1', 'y2'],
                   "population": [100, 200, 120, 140, 190]})

# Map each distinct country to a CSS background-color string
colors = dict(zip(d1['Country'].unique(),
                  (f'background-color: {c}' for c in matplotlib.colors.cnames.values())))


def test_check(row):
    # Repeat the row's color once per column so the result matches the row's shape
    return pd.Series(colors[row['Country']], row.index)


d1.style.apply(test_check, axis=1)

Adding more explanation:

If you try colors[d1.iloc[0]['Country']], i.e. colors[row['Country']] where row is the first row of the DataFrame, it returns the value stored in the dictionary for that country, something like 'background-color: #F0F8FF'. Calling test_check(d1.iloc[0]) returns that same color for all three columns of the row, as shown below for the first row:

Country       background-color: #F0F8FF
year          background-color: #F0F8FF
population    background-color: #F0F8FF

That way, each row gets the same color value for all three columns, and Styler applies it to every cell in the row. You can see the values returned for all rows by running d1.apply(test_check, axis=1).

Even though it works, I do not understand what's happening here. What does colors[row['Country']] return? — Whiteflames, Mar 09 '22 at 14:48
I added some explanation. You can try running it and see that it returns 3 values, i.e. a color for each column (3 here), instead of the single value that was causing the error. — niraj, Mar 09 '22 at 14:57
