Add error bars with customized upper and lower bounds to a bar plot in Python

Question

I want to add the highest density intervals (HDIs) that I computed (columns hdi_both, hdi_one, and lower_upper in the df below) to the bar plot.

However, I cannot figure out how to add error bars/CIs such that each error bar has custom upper and lower bounds that are independent of the y value (here, the corresponding value in proportion_correct).

For example, the HDI for Exp. 1 with guesses_correct both has a lower bound of 0.000000 and an upper bound of 0.130435, and its proportion_correct is 0.000000.

All the options I have seen specify the upper and lower bounds relative to the value on the y axis, which is not what I'm looking for.

Any help will be greatly appreciated.

Thanks,

Ayala

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame({
 'exp': ['Exp. 1', 'Exp. 1', 'Exp. 2', 'Exp. 2', 'Exp. 3', 'Exp. 3', 'Exp. 4', 'Exp. 4', 'Exp. 5', 'Exp. 5',
 'Collapsed', 'Collapsed'],
 'proportion_correct': [0.0, 0.304347826, 0.058823529000000006, 0.31372549, 0.047619048, 0.333333333, 0.12244898, 0.428571429, 0.12244898, 0.367346939, 0.082901554, 0.35751295299999997],
 'guesses_correct': ['both', 'one', 'both', 'one', 'both', 'one', 'both', 'one', 'both', 'one', 'both', 'one'],
 'hdi_both': [0.0, 0.130434783, 0.0, 0.078431373, 0.0, 0.1, 0.0, 0.08, 0.0, 0.081632653, 0.005181347, 0.051813472],
 'hdi_one': [0.130434783, 0.47826087, 0.156862745, 0.41176470600000004, 0.1, 0.5, 0.16, 0.4, 0.163265306, 0.408163265, 0.21761658, 0.341968912],
 'lower_upper': ['lower', 'upper', 'lower', 'upper', 'lower', 'upper', 'lower', 'upper', 'lower', 'upper', 'lower', 'upper']
})

print(df.head())
Out[4]: 
      exp  proportion_correct guesses_correct  hdi_both   hdi_one lower_upper
0  Exp. 1            0.000000            both  0.000000  0.130435       lower
1  Exp. 1            0.304348             one  0.130435  0.478261       upper
2  Exp. 2            0.058824            both  0.000000  0.156863       lower
3  Exp. 2            0.313725             one  0.078431  0.411765       upper
4  Exp. 3            0.047619            both  0.000000  0.100000       lower

# Make bar plot
sns.barplot(x='exp',
            y='proportion_correct',
            hue='guesses_correct',
            data=df)

plt.ylim([0, 0.5])
plt.xlabel('Experiment')
plt.ylabel('Proportion Correct')
plt.legend(title='Correct guesses', loc='upper right')
plt.axhline(y=0.277777, color='dimgray', linestyle='--')
plt.annotate(' chance\n one', (5.5, 0.27))
plt.axhline(y=0.02777, color='dimgray', linestyle='--')
plt.annotate(' chance\n both', (5.5, 0.02))
# Show the plot
plt.show()

This is the bar plot to which I want to add the HDIs.

score 1 · Answer 1 · answered Jan 09 '21 at 13:29

I ended up plotting vertical lines as error bars. Here is my code in case it helps someone.

df = pd.DataFrame({'exp': ['Exp. 1', 'Exp. 1', 'Exp. 2', 'Exp. 2', 'Exp. 3', 'Exp. 3', 'Exp. 4', 'Exp. 4', 'Exp. 5', 'Exp. 5', 'Collapsed', 'Collapsed'],
                   'proportion_correct': [0.0, 0.304347826, 0.058823529000000006, 0.31372549, 0.047619048, 0.333333333, 0.12244898, 0.428571429, 0.12244898, 0.367346939, 0.082901554, 0.35751295299999997],
                   'guesses_correct': ['both', 'one', 'both', 'one', 'both', 'one', 'both', 'one', 'both', 'one', 'both', 'one'], 
                   'hdi_low': [0.0, 0.130434783, 0.0, 0.156862745, 0.0, 0.1, 0.0, 0.16, 0.0, 0.163265306, 0.005181347, 0.21761658],
                   'hdi_high': [0.130434783, 0.47826087, 0.078431373, 0.41176470600000004, 0.1, 0.5, 0.08, 0.4, 0.081632653, 0.408163265, 0.051813472, 0.341968912]
                  })
df.head()
Out[4]: 
      exp  proportion_correct guesses_correct   hdi_low  hdi_high
0  Exp. 1            0.000000            both  0.000000  0.130435
1  Exp. 1            0.304348             one  0.130435  0.478261
2  Exp. 2            0.058824            both  0.000000  0.078431
3  Exp. 2            0.313725             one  0.156863  0.411765
4  Exp. 3            0.047619            both  0.000000  0.100000

The axvlines and axhlines functions used below were taken from "How to draw vertical lines on a given plot in matplotlib". I omit their definitions here for brevity.

    # Make bar plot
    x_col = 'exp'
    y_col = 'proportion_correct'
    hue_col = 'guesses_correct'
    low_col = 'hdi_low'
    high_col = 'hdi_high'
    plot = sns.barplot(x=x_col,
                y=y_col,
                hue=hue_col,
                data=df)
    plt.ylim([0, 0.55])
    plt.yticks([0, 0.1, 0.2, 0.3, 0.4, 0.5], [0, 0.1, 0.2, 0.3, 0.4, 0.5])
    plt.xlabel('Experiment')
    plt.ylabel('Proportion Correct')
    plt.legend(title='Correct guesses', loc='upper right')
    plt.axhline(y=0.277777, color='dimgray', linestyle='--')
    plt.annotate(' chance\n one', (5.65, 0.27))
    plt.axhline(y=0.02777, color='dimgray', linestyle='--')
    plt.annotate(' chance\n both', (5.65, 0.02))
    # (low, high) pair for each bar, in dataframe row order
    lims_x = list(zip(df[low_col].to_list(), df[high_col].to_list()))
    # x position of each bar centre (hue offsets of -0.2 and +0.2 around each category)
    xss = [-0.2, 0.2, 0.8, 1.2, 1.8, 2.2, 2.8, 3.2, 3.8, 4.2, 4.8, 5.2]
    # flatten the bounds: low, high, low, high, ...
    yss = [i for sub in lims_x for i in sub]
    # horizontal extent of each cap, two entries per bar (one per bound)
    lims_y = [(-0.3, -0.1), (-0.3, -0.1), (0.1, 0.3), (0.1, 0.3), (0.7, 0.9), (0.7, 0.9), (1.1, 1.3), (1.1, 1.3),
              (1.7, 1.9), (1.7, 1.9), (2.1, 2.3), (2.1, 2.3), (2.7, 2.9), (2.7, 2.9), (3.1, 3.3),  (3.1, 3.3),
              (3.7, 3.9), (3.7, 3.9), (4.1, 4.3), (4.1, 4.3), (4.7, 4.9), (4.7, 4.9), (5.1, 5.3), (5.1, 5.3)]
    # vertical line spanning each interval
    for xs, lim in zip(xss, lims_x):
        plot = axvlines(xs, lims=lim, color='black')
    # horizontal cap at each bound
    for yx, lim in zip(yss, lims_y):
        plot = axhlines(yx, lims=lim, color='black')
    plt.show()

And this is the plot
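
For readers without the axvlines/axhlines helpers, roughly the same picture can be sketched with Matplotlib's built-in Axes.vlines and Axes.hlines. This is a sketch under assumptions, not the code that produced the plot above: the helper name add_interval_bars is hypothetical, plain ax.bar stands in for sns.barplot, and the bar centres assume the same -0.2/+0.2 offsets as xss above. Because the lines are drawn from the absolute bounds, they do not depend on the bar heights.

```python
import matplotlib.pyplot as plt
import numpy as np

def add_interval_bars(ax, x, low, high, cap=0.1, color='black'):
    """Draw an interval from low to high at each x, with a flat cap at both ends.

    Hypothetical helper for illustration; it is not part of Matplotlib.
    """
    x, low, high = (np.asarray(a, dtype=float) for a in (x, low, high))
    stems = ax.vlines(x, low, high, colors=color)    # vertical line per interval
    ax.hlines(low, x - cap, x + cap, colors=color)   # cap at the lower bound
    ax.hlines(high, x - cap, x + cap, colors=color)  # cap at the upper bound
    return stems

exps = ['Exp. 1', 'Exp. 2', 'Exp. 3', 'Exp. 4', 'Exp. 5', 'Collapsed']
# One entry per bar, in dataframe row order: both, one, both, one, ...
y = np.array([0.0, 0.304347826, 0.058823529, 0.31372549, 0.047619048, 0.333333333,
              0.12244898, 0.428571429, 0.12244898, 0.367346939, 0.082901554, 0.357512953])
low = np.array([0.0, 0.130434783, 0.0, 0.156862745, 0.0, 0.1,
                0.0, 0.16, 0.0, 0.163265306, 0.005181347, 0.21761658])
high = np.array([0.130434783, 0.47826087, 0.078431373, 0.411764706, 0.1, 0.5,
                 0.08, 0.4, 0.081632653, 0.408163265, 0.051813472, 0.341968912])

# Assumed bar centres: category index shifted by -0.2 ('both') or +0.2 ('one')
x = np.repeat(np.arange(len(exps)), 2) + np.tile([-0.2, 0.2], len(exps))

fig, ax = plt.subplots()
ax.bar(x[0::2], y[0::2], width=0.4, label='both')
ax.bar(x[1::2], y[1::2], width=0.4, label='one')
stems = add_interval_bars(ax, x, low, high)
ax.set_xticks(np.arange(len(exps)))
ax.set_xticklabels(exps)
ax.set_xlabel('Experiment')
ax.set_ylabel('Proportion Correct')
ax.legend(title='Correct guesses', loc='upper right')
```

Calling plt.show() afterwards displays the figure.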

score 0 · Answer 2 · answered Jan 01 '21 at 13:11

Although you have calculated the lower and upper bounds of your error bars as absolute values, error bars are generally specified as lower and upper errors around a particular y-value. But it's easy to calculate the "relative" lengths of the error bars by subtracting the y-value from the bounds you calculated.

You can then use plt.errorbar() to plot. Note that to use this function, all error values must be positive.
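
As a minimal sketch of that subtraction (the helper name bounds_to_offsets is hypothetical, and the two example rows are the Exp. 1 'one' and Exp. 4 'both' values as stated elsewhere in this thread), the offsets can be computed with NumPy. A negative offset means the bar height lies outside the interval, in which case taking the absolute value would draw that end of the error bar on the wrong side of the bar top.

```python
import numpy as np

def bounds_to_offsets(y, low, high):
    """Convert absolute interval bounds into a (2, N) array of offsets from y.

    Row 0 is the distance below y, row 1 the distance above y.
    Hypothetical helper for illustration only.
    """
    y, low, high = (np.asarray(a, dtype=float) for a in (y, low, high))
    return np.vstack([y - low, high - y])

# Exp. 1 'one' and Exp. 4 'both'
y = [0.304347826, 0.12244898]
low = [0.130434783, 0.0]
high = [0.47826087, 0.08]

offsets = bounds_to_offsets(y, low, high)
outside = (offsets < 0).any(axis=0)  # True where y is not inside [low, high]
print(offsets)
print(outside)
```

Here the second bar is flagged, since its height (0.12244898) is above its upper bound (0.08).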

Since you are using a hue= split, you have to iterate through the different levels of hue, and take into account the shift of the bars (by default -0.2 and +0.2 for two levels of hue):

import numpy as np

# Make bar plot
x_col = 'exp'
y_col = 'proportion_correct'
hue_col = 'guesses_correct'
low_col = 'hdi_both'
high_col = 'hdi_one'
sns.barplot(x=x_col,
            y=y_col,
            hue=hue_col,
            data=df)

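# groupby sorts the hue levels ('both', 'one'), matching the -0.2/+0.2 bar offsets
for (h,g),pos in zip(df.groupby(hue_col),[-0.2,0.2]):
    # yerr holds distances from y, shape (2, N): row 0 below, row 1 above;
    # this assumes low_col/high_col hold the bounds for the bar on the same row
    err = g[[low_col, high_col]].subtract(g[y_col], axis=0).abs().T.values
    x = np.arange(len(g[x_col].unique()))+pos
    plt.errorbar(x=x, y=g[y_col], yerr=err, fmt='none', capsize=5, ecolor='k')

plt.show()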

answered Jan 01 '21 at 13:11

Diziet Asahi


Hi @Diziet Asahi, I think there is some mistake in calculating the error bars such that they will correspond to the absolute values of the error bars. For example in Exp. 4, the lower and upper bounds for `proportion_correct` `both` `0.12244898` are `0` and `0.08`. However, from the graph you generated it looks like the lower bound is `0` which is correct but the upper bound in the graph is `~0.18` which is not correct. Same goes for other error bars in the graphs. So something in your code doesn't look right to me but I am still trying to understand what. Thanks! – ayalaall Jan 02 '21 at 08:02

I think I misunderstood the format of your dataframe. I was under the impression that the bounds for `Exp. 4` `both` were `[0.000000,0.160000]` (the values on that same row). But you are saying they should be `[0.000000,0.080000]` (the values in the two successive rows in column `hdi_both`)? – Diziet Asahi Jan 02 '21 at 09:39

If you created this dataframe yourself from some raw data, it would make much more sense (in my mind at least) to have each row correspond to one condition (`Exp. N`,`both/one`,`proportion`,`hdi_low`,`hdi_high`), with all the values that are related to that condition (including conf. interval bounds) on the same row. – Diziet Asahi Jan 02 '21 at 09:41
Hi @Diziet Asahi. I tried that but it is still not showing the error bars properly. I tried editing my question to explain why but the edit was not approved. – ayalaall Jan 09 '21 at 11:36
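
The one-row-per-condition layout suggested in the comments can be derived from the question's dataframe with a reshape. The following is a sketch under the reading the asker gives in the comments: within each experiment, the 'lower' and 'upper' rows of hdi_both bound the both bar, and likewise hdi_one for the one bar. The names bounds and tidy are chosen here for illustration, and hdi_low/hdi_high are assumed column names.

```python
import pandas as pd

df = pd.DataFrame({
    'exp': ['Exp. 1', 'Exp. 1', 'Exp. 2', 'Exp. 2', 'Exp. 3', 'Exp. 3',
            'Exp. 4', 'Exp. 4', 'Exp. 5', 'Exp. 5', 'Collapsed', 'Collapsed'],
    'proportion_correct': [0.0, 0.304347826, 0.058823529, 0.31372549, 0.047619048, 0.333333333,
                           0.12244898, 0.428571429, 0.12244898, 0.367346939, 0.082901554, 0.357512953],
    'guesses_correct': ['both', 'one'] * 6,
    'hdi_both': [0.0, 0.130434783, 0.0, 0.078431373, 0.0, 0.1,
                 0.0, 0.08, 0.0, 0.081632653, 0.005181347, 0.051813472],
    'hdi_one': [0.130434783, 0.47826087, 0.156862745, 0.411764706, 0.1, 0.5,
                0.16, 0.4, 0.163265306, 0.408163265, 0.21761658, 0.341968912],
    'lower_upper': ['lower', 'upper'] * 6,
})

# Long format: one row per (experiment, condition, lower/upper) bound
bounds = df[['exp', 'lower_upper', 'hdi_both', 'hdi_one']].melt(
    id_vars=['exp', 'lower_upper'],
    value_vars=['hdi_both', 'hdi_one'],
    var_name='guesses_correct',
    value_name='bound',
)
# 'hdi_both' -> 'both', 'hdi_one' -> 'one'
bounds['guesses_correct'] = bounds['guesses_correct'].str.replace('hdi_', '', regex=False)

# Wide again: one row per (experiment, condition) with both bounds side by side
bounds = (
    bounds.set_index(['exp', 'guesses_correct', 'lower_upper'])['bound']
          .unstack('lower_upper')
          .rename(columns={'lower': 'hdi_low', 'upper': 'hdi_high'})
          .reset_index()
)

# Attach the bounds to each bar's proportion, keeping the original row order
tidy = df[['exp', 'guesses_correct', 'proportion_correct']].merge(
    bounds, on=['exp', 'guesses_correct']
)
print(tidy)
```

With the bounds on the same row as each proportion, the interval for each bar can be drawn directly from hdi_low and hdi_high.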
