
I have a dataset with precomputed means and standard deviations. The values depend on three different categorical variables. I would like to create two bar plots, one for each level of the first categorical variable. The other two categorical variables should be separated along the x-axis and by color.

In seaborn terms, I want to create bar plots with seaborn.catplot using a categorical x with a custom order, together with the hue and col arguments, while being able to add my own custom standard deviations.

The following code plots the means in a straightforward way:

import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")

tip_sumstats = (tips.groupby(["day", "sex", "smoker"])
                     .total_bill
                     .agg(["mean", 'sem'])
                     .reset_index())

sns.catplot(
    data=tip_sumstats,
    x="day",
    order=["Sun", "Thur", "Fri", "Sat"],
    y="mean",
    hue="smoker",
    col="sex",
    kind="bar",
    height=4,
)


This answer solves the problem when hue and order are not involved. However, in the above case, using

def errplot(x, y, yerr, **kwargs):
    ax = plt.gca()
    data = kwargs.pop("data")
    data.plot(x=x, y=y, yerr=yerr, kind="bar", ax=ax, **kwargs)

g = sns.FacetGrid(tip_sumstats, col="sex", hue="smoker", height=4)
g.map_dataframe(errplot, "day", "mean", "sem")

does not give the desired result (image not included).

I do not understand how to modify this version such that it respects the categorical order on the x-axis defined by some order argument. Furthermore, I do not understand how to add a dodge=True to it such that the differently colored bars appear next to each other.

This question tries to solve something similar. However, the approach is very technical and not straightforward at all. To me, it seems weird that no straightforward solution exists.

JohanC

1 Answer

Seaborn doesn't support this out-of-the-box, probably because the many options for error bars are complicated to fit with the ways parameters are passed around.

For your specific situation, you could calculate the positions as follows:

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

tips = sns.load_dataset("tips")
tip_sumstats = (tips.groupby(["day", "sex", "smoker"])
                .total_bill
                .agg(["mean", 'sem'])
                .reset_index())


def errplot(x, y, data, order, hue, yerr, palette='deep', color=None):
    # `color` is unused; it only absorbs the keyword that FacetGrid.map_dataframe passes in
    xs = np.arange(len(order))
    hues = data[hue].unique()
    dodge_width = 0.8
    # centers of len(hues) equally wide slots inside the dodge band around each tick
    dodge_vals = np.linspace(-dodge_width / 2, dodge_width / 2, len(hues)*2+1)[1::2]
    colors = sns.color_palette(palette, len(hues))
    for hue_val, dodge_val, bar_color in zip(hues, dodge_vals, colors):
        # look up the mean and its error for each category, in the requested order
        ys = [data[(data[x] == xi) & (data[hue] == hue_val)][y].to_numpy()[0] for xi in order]
        yerrs = [data[(data[x] == xi) & (data[hue] == hue_val)][yerr].to_numpy()[0] for xi in order]
        plt.bar(x=xs + dodge_val, height=ys, yerr=yerrs, width=dodge_width / len(hues), color=bar_color, label=hue_val)
    plt.xticks(xs, order)


g = sns.FacetGrid(tip_sumstats, col="sex", height=4)
g.map_dataframe(errplot, "day", "mean", hue="smoker", yerr="sem", order=["Sun", "Thur", "Fri", "Sat"])
g.fig.legend(*g.axes.flat[-1].get_legend_handles_labels(), title='smoker')
plt.show()
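
The `np.linspace(...)[1::2]` expression in `errplot` may look cryptic at first. As a rough illustration, it picks the centers of equally wide slots inside a band of width `dodge_width` around each tick. The helper below, `dodge_offsets`, is a hypothetical name introduced only for this sketch; it computes the same offsets in plain Python:

```python
def dodge_offsets(n_hues, dodge_width=0.8):
    """Return (offsets, bar_width) for n_hues side-by-side bars per category.

    Hypothetical helper for illustration; it mirrors
    np.linspace(-w / 2, w / 2, 2 * n + 1)[1::2] from errplot.
    """
    bar_width = dodge_width / n_hues
    # The i-th bar is centered half a bar width into its slot,
    # counted from the left edge of the band.
    offsets = [-dodge_width / 2 + (i + 0.5) * bar_width for i in range(n_hues)]
    return offsets, bar_width


offsets, bar_width = dodge_offsets(2)
print([round(o, 3) for o in offsets], round(bar_width, 3))
```

With two hue levels and a band of 0.8, each bar is 0.4 wide and sits 0.2 to the left or right of its tick, which is why the bars end up touching each other without overlapping.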

[Figure: sns.catplot with custom error bars]
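
One caveat: the `.to_numpy()[0]` lookups in `errplot` raise an `IndexError` when a combination of x value and hue level is missing from a facet. The tips data contains every combination, so it works there. For other data, a possible variation is to reindex by the desired order so that missing entries become NaN. The function name `values_in_order` and the toy data below are made up for this sketch:

```python
import pandas as pd


def values_in_order(data, x, hue, hue_val, value, order):
    """Return `value` for one hue level, aligned to `order`; NaN where missing.

    Hypothetical helper; assumes at most one row per (x, hue) combination.
    """
    subset = data[data[hue] == hue_val]
    # Index the values by category label (as plain strings), then align to `order`.
    series = pd.Series(subset[value].to_numpy(),
                       index=subset[x].astype(str).to_numpy())
    return series.reindex(order).to_numpy()


# Toy summary table (invented numbers) with no ("Sat", "No") row.
stats = pd.DataFrame({
    "day": ["Sun", "Sat", "Sun"],
    "smoker": ["Yes", "Yes", "No"],
    "mean": [20.0, 21.5, 19.0],
})
print(values_in_order(stats, "day", "smoker", "No", "mean", ["Sun", "Sat"]))
```

Inside `errplot`, the two list comprehensions for `ys` and `yerrs` could then be replaced by calls to such a helper, leaving it to the caller to decide how a NaN bar should be handled.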

JohanC
  • Is `dodge_width=0.8` the default parameter seaborn uses? Is there an informative website to find out which default parameters seaborn uses in such cases? – Niklas Netter Mar 03 '23 at 13:24
  • The [sns.barplot docs](https://seaborn.pydata.org/generated/seaborn.barplot.html) give a default `width=0.8`. (The [sns.catplot docs](https://seaborn.pydata.org/generated/seaborn.catplot.html) refer to `barplot` for the specific keywords.) Other than that, the source can be found on GitHub. – JohanC Mar 03 '23 at 13:43