
I have a dataframe with positive and negative values for three kinds of variables.

    labels  variable    value
0   -10e5        nat     -38
1     2e5        nat      50
2    10e5        nat      16
3   -10e5        agr     -24
4     2e5        agr      35
5    10e5        agr      26
6   -10e5        art     -11
7     2e5        art      43
8    10e5        art      20

When the values are negative, I want the bars of each group to follow this color sequence:

n_palette = ["#ff0000","#ff0000","#00ff00"]

When the values are positive, I want the colors swapped instead:

p_palette = ["#00ff00","#00ff00","#ff0000"]

I've tried this:

palette = ["#ff0000","#ff0000","#00ff00",
           "#00ff00","#00ff00","#ff0000",
           "#00ff00","#00ff00","#ff0000"]

ax = sns.barplot(x=melted['labels'], y=melted['value'], hue = melted['variable'],
                 linewidth=1,
                 palette=palette)

But I get the following output:

[Figure: output of the code above]

What I'd like is for the first two bars of each group to be green and the last one red when the values are positive.

Rodrigo Vargas

1 Answer


You seem to want the color to depend on a criterion that involves two columns: the 'variable' and the sign of the 'value'. A suitable way to do that is to add a new column which uniquely labels each combination.

Further, seaborn allows the palette to be a dictionary telling exactly which hue label gets which color. Adding barplot(..., order=[...]) would define a fixed order for the x categories; hue_order=[...] does the same for the hue labels.

Here is some example code:

from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
from io import StringIO

data_str = '''    labels  variable    value
0   -10e5        nat     -38
1     2e5        nat      50
2    10e5        nat      16
3   -10e5        agr     -24
4     2e5        agr      35
5    10e5        agr      26
6   -10e5        art     -11
7     2e5        art      43
8    10e5        art      20
'''
melted = pd.read_csv(StringIO(data_str), sep=r'\s+', dtype={'labels': str})  # delim_whitespace= is deprecated in recent pandas
melted['legend'] = np.where(melted['value'] < 0, '-', '+')
melted['legend'] = melted['variable'] + melted['legend']
palette = {'nat-': "#ff0000", 'agr-': "#ff0000", 'art-': "#00ff00",
           'nat+': "#00ff00", 'agr+': "#00ff00", 'art+': "#ff0000"}

ax = sns.barplot(x=melted['labels'], y=melted['value'], hue=melted['legend'],
                 linewidth=1, palette=palette)
ax.axhline(0, color='black')
plt.show()

[Figure: sns.barplot with coloring depending on positiveness]

PS: To remove the legend: ax.legend_.remove(). Or to have a legend with multiple columns: ax.legend(ncol=3).
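For a longer dataframe, the palette dictionary of the first approach does not have to be typed by hand. The following is only a sketch: the helper name build_sign_palette is made up for this illustration, and it assumes the hue labels are built as variable + '-' or variable + '+', as in the code above.

```python
# Hypothetical helper (not part of seaborn): combine a palette for negative
# values and one for positive values into a single dictionary keyed by the
# 'legend' labels used above ('nat-', 'nat+', ...).
def build_sign_palette(variables, neg_colors, pos_colors):
    palette = {}
    for var, neg, pos in zip(variables, neg_colors, pos_colors):
        palette[var + '-'] = neg  # color used when the value is negative
        palette[var + '+'] = pos  # color used when the value is positive
    return palette


variables = ['nat', 'agr', 'art']
n_palette = ["#ff0000", "#ff0000", "#00ff00"]  # palette from the question (negative values)
p_palette = ["#00ff00", "#00ff00", "#ff0000"]  # palette from the question (positive values)

palette = build_sign_palette(variables, n_palette, p_palette)
print(palette)
```

The resulting dictionary can be passed as palette=palette in the same way as the hand-written one.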

A different approach, directly with the original dataframe, is to create two bar plots: one for the negative values and one for the positive. For this to work well, it is necessary that the 'labels' column (the x=) is explicitly made categorical. Also adding pd.Categorical(..., categories=['nat', 'agr', 'art']) for the 'variable' column could fix an order.

This will generate a legend with the labels twice with different colors. Depending on what you want, you can remove it or create a more custom legend. An idea is to add the labels under the positive and on top of the negative bars:

sns.set()
melted = pd.read_csv(StringIO(data_str), sep=r'\s+', dtype={'labels': str})  # delim_whitespace= is deprecated in recent pandas
palette_pos = {'nat': "#00ff00", 'agr': "#00ff00", 'art': "#ff0000"}
palette_neg = {'nat': "#ff0000", 'agr': "#ff0000", 'art': "#00ff00"}
melted['labels'] = pd.Categorical(melted['labels'])
ax = sns.barplot(data=melted[melted['value'] < 0], x='labels', y='value', hue='variable',
                 linewidth=1, palette=palette_neg)
sns.barplot(data=melted[melted['value'] >= 0], x='labels', y='value', hue='variable',
            linewidth=1, palette=palette_pos, ax=ax)
ax.legend_.remove()
ax.axhline(0, color='black')
ax.set_xlabel('')
ax.set_ylabel('')
for bar_container in ax.containers:
    label = bar_container.get_label()
    for p in bar_container:
        x = p.get_x() + p.get_width() / 2
        h = p.get_height()
        if not np.isnan(h):
            ax.text(x, 0, label + '\n\n' if h < 0 else '\n\n' + label, ha='center', va='center')
plt.show()

[Figure: drawing sns.barplot twice]
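One thing to watch with pd.Categorical(melted['labels']) is that the labels are strings, so by default the categories are sorted alphabetically rather than numerically. A possible way to keep the numerical order is to pass the categories explicitly. This is a minimal sketch that assumes every label can be parsed by float(); the variable names are only for illustration.

```python
import pandas as pd

# Same label strings as in the 'labels' column of the example data
labels = pd.Series(['-10e5', '2e5', '10e5'] * 3)

# Default: categories are sorted as strings, so '10e5' comes before '2e5'
default_cat = pd.Categorical(labels)

# Explicit categories, sorted by their numeric value instead
numeric_order = sorted(labels.unique(), key=float)
ordered_cat = pd.Categorical(labels, categories=numeric_order, ordered=True)

print(list(default_cat.categories))
print(list(ordered_cat.categories))
```

Assigning such a categorical to melted['labels'] before plotting should put the x groups in numerical order.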

Still another option involves sns.catplot() which could be clearer when a lot of data is involved:

sns.set()
melted = pd.read_csv(StringIO(data_str), sep=r'\s+', dtype={'labels': str})  # delim_whitespace= is deprecated in recent pandas
melted['legend'] = np.where(melted['value'] < 0, '-', '+')
melted['legend'] = melted['variable'] + melted['legend']
palette = {'nat-': "#ff0000", 'agr-': "#ff0000", 'art-': "#00ff00",
           'nat+': "#00ff00", 'agr+': "#00ff00", 'art+': "#ff0000"}
g = sns.catplot(kind='bar', data=melted, col='labels', y='value', x='legend',
                linewidth=1, palette=palette, sharex=False, sharey=True)
for ax in g.axes.flat:
    ax.axhline(0, color='black')
    ax.set_xlabel('')
    ax.set_ylabel('')
plt.show()

[Figure: catplot with di…]

JohanC
  • The first output is right, but the bars seem to be displaced; I mean they aren't placed regularly in my image. – Rodrigo Vargas Apr 01 '21 at 18:10
  • Indeed, the first approach creates 6 hue categories and the plot makes room to place all 6 of them for each X-label. That's why I also added the second approach, although it goes somewhat deeper into internal matplotlib structures. Did you expect something very different? – JohanC Apr 01 '21 at 18:18
  • Not much different, but I'm struggling to make the barplot regular. In the second option the categorical order doesn't match the numerical one (my actual data frame is a bit longer), but I guess there's a way to set the order back. Anyway, the two options get closer than I could. Thanks – Rodrigo Vargas Apr 01 '21 at 18:22
  • `barplot()` accepts a parameter `order=` where you can give a list to define the order of the x categories; `hue_order=` does the same for the hue levels, e.g. `['nat', 'agr', 'art']`. Similarly, `pd.Categorical(..., categories=['nat', 'agr', 'art'])` will fix an order. Still another approach is to use `sns.catplot(...)` to create one subplot for each x-label. – JohanC Apr 01 '21 at 18:29
  • Now it's done! That's exactly where I wanted to go. – Rodrigo Vargas Apr 01 '21 at 18:32