
I want to create a bar graph that identifies the number of observations (n) for each bar in my plot.

I have a data frame that looks like this:

Treatment  Condition   %_time
STZ        Stressed    3
Control    Stressed    6
STZ        Unstressed  2
Control    Unstressed  8

I have successfully created a bar plot with the following code and output:

import matplotlib.pyplot as plt
import seaborn as sns

color = sns.color_palette("Paired")
sns.set_style(style='white')
fig, ax = plt.subplots(figsize=(9, 6))
ax = sns.barplot(x='Treatment', y='% _time_open_arm', hue='Condition', data=df,
                 capsize=.1, palette=color, ax=ax)
plt.legend(title='Groups', loc='upper right')
plt.xlabel("Treatment")
plt.ylabel("% Time in Open Arm")
plt.title("Stress in STZ vs Vehicle ", size=14)

[Image: resulting bar plot]

I want to show the n value for each bar inside that bar. Using the answer from this question, I created a bar plot that displays the n value for each group above its corresponding bar:

ax = sns.countplot(x='Treatment', hue='Condition', data=df)
for container in ax.containers:
    ax.bar_label(container)

[Image: count plot with the n value above each bar]

However, I want the n values displayed on my original bar plot, so I tried this:

color = sns.color_palette("Paired")
sns.set_style(style='white')
fig, ax = plt.subplots(figsize=(9, 6))
ax = sns.barplot(x='Treatment', y='% _time_open_arm', hue='Condition', data=df,
                 capsize=.1, palette=color, ax=ax)
plt.legend(title='Groups', loc='upper right')
plt.xlabel("Treatment")
plt.ylabel("% Time in Open Arm")
plt.title("Stress in STZ vs Vehicle ", size=14)

# without labels=, bar_label writes each bar's height (the mean), not the count
for container in ax.containers:
    ax.bar_label(container)

[Image: bar plot with the wrong values as labels]

I understand that my attempt differs from the linked answer because I did not use .countplot. However, whenever I use .countplot, my y axis automatically becomes "count" instead of the column I want to plot (%_time). How can I get the n values from my second plot to appear on my first plot? Additionally, how can I place these values inside each bar instead of on top?

Adriana
@JohanC That centered the values inside the bars, but some of the values overlap the bottom of the error bars, which makes them hard to read. It also did not change the displayed values to the observation counts, as I wanted. – Adriana Jan 04 '23 at 20:04

1 Answer


ax.bar_label(container, label_type='center') puts the labels in the center of the bars.
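A minimal, matplotlib-only sketch of what labels= together with label_type='center' does. The bar heights and counts are made-up values, and the Agg backend is chosen only so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

heights = [5.2, 7.9]  # made-up bar heights (e.g. group means)
counts = [12, 18]     # made-up number of observations behind each bar

fig, ax = plt.subplots()
container = ax.bar(["Control", "STZ"], heights)
# labels= replaces the default height labels; label_type='center' moves them inside the bars
annotations = ax.bar_label(container, labels=[f"n={n}" for n in counts],
                           label_type="center")
label_texts = [a.get_text() for a in annotations]
print(label_texts)  # ['n=12', 'n=18']
```

Any sequence with one entry per bar can be passed as labels=; the values are converted to text, so plain integers work as well.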

To show the counts instead of the bar heights as labels, you need to count the observations explicitly and pass them as the labels= parameter of ax.bar_label(). pd.Categorical can be used to make sure the order of the groups is always the same, so that each count lands on the matching bar.
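One way to do the explicit counting, sketched with a small made-up data frame and using groupby().size() as an alternative to value_counts(): group by the hue column and the x column with observed=False, so every combination appears (zero counts included) in category order, then take one row of counts per hue level:

```python
import pandas as pd

# Made-up long-format data: one row per animal (observation)
df = pd.DataFrame({
    "Treatment": ["STZ", "STZ", "Control", "STZ", "Control", "Control", "STZ"],
    "Condition": ["Stressed", "Unstressed", "Stressed", "Stressed",
                  "Unstressed", "Stressed", "Stressed"],
})
# Categorical dtype fixes the level order used for the groups
df["Treatment"] = pd.Categorical(df["Treatment"])
df["Condition"] = pd.Categorical(df["Condition"])

# Rows: Condition (hue levels); columns: Treatment (x levels).
# observed=False keeps combinations with no rows as 0.
counts = df.groupby(["Condition", "Treatment"], observed=False).size().unstack()

# One list of counts per hue level, in x order: one labels= list per container
labels_per_container = [counts.loc[cond].tolist()
                        for cond in df["Condition"].cat.categories]
print(labels_per_container)  # [[2, 3], [1, 1]]
```

Each inner list could then be passed as ax.bar_label(container, labels=..., label_type='center'), assuming the bar containers come in the same order as the Condition categories, which is the same assumption the zip() in the answer's code makes.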

Here is an example:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

np.random.seed(20230104)
# simulated data: 50 observations with a random Treatment, Condition and % time
df = pd.DataFrame({'Treatment': np.random.choice(['Control', 'STZ'], 50, p=[.39, .61]),
                   'Condition': np.random.choice(['Stressed', 'Unstressed'], 50, p=[.65, .35]),
                   '% _time_open_arm': np.random.randint(2, 19, 50)})
# categorical dtype fixes the order of the groups, so bars and counts line up
df['Treatment'] = pd.Categorical(df['Treatment'])
df['Condition'] = pd.Categorical(df['Condition'])

sns.set_style(style='white')
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(16, 6))

sns.countplot(x='Treatment', hue='Condition', data=df, ax=ax1)
for container in ax1.containers:
    ax1.bar_label(container)

sns.barplot(x='Treatment', y='% _time_open_arm', hue='Condition', data=df,
            capsize=.1, palette="Paired", ax=ax2)
ax2.legend(title='Groups', loc='upper right')
ax2.set_xlabel("Treatment")
ax2.set_ylabel("% Time in Open Arm")
ax2.set_title("Stress in STZ vs Vehicle ", size=14)

# the containers correspond to each of the hue values
# rows per (Condition, Treatment) pair; sort=False orders them by the column values instead of by frequency
df_counts = df.value_counts(["Condition", "Treatment"], sort=False)
for container, cond in zip(ax2.containers, df["Condition"].cat.categories):
    ax2.bar_label(container, labels=df_counts.loc[cond], label_type='center', fontsize=20, color='crimson')

sns.despine()
plt.tight_layout()
plt.show()

[Image: sns.barplot with counts]
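The comment on the question mentions centered labels colliding with the error bars. One possible workaround, not taken from the answer, is to pin each label near the base of its bar with ax.text() and a blended transform (x in data coordinates, y in axes coordinates). The sketch below draws plain matplotlib bars with made-up means and counts instead of using seaborn, and it assumes the y axis starts at 0, as it does by default for positive bars:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up group means and counts: one list per hue level, one entry per x level
means = {"Stressed": [6.0, 3.0], "Unstressed": [8.0, 2.0]}
counts = {"Stressed": [10, 14], "Unstressed": [9, 11]}
x_levels = ["Control", "STZ"]
width = 0.4

fig, ax = plt.subplots()
texts = []
for i, cond in enumerate(means):
    # shift each hue level left/right of the x tick, like a dodged bar plot
    xs = [j + (i - 0.5) * width for j in range(len(x_levels))]
    bars = ax.bar(xs, means[cond], width=width, label=cond)
    for bar, n in zip(bars, counts[cond]):
        # x in data coordinates (bar center), y in axes coordinates (2% above the x axis)
        t = ax.text(bar.get_x() + bar.get_width() / 2, 0.02, f"n={n}",
                    transform=ax.get_xaxis_transform(), ha="center", va="bottom")
        texts.append(t.get_text())

ax.set_xticks(range(len(x_levels)))
ax.set_xticklabels(x_levels)
ax.legend(title="Groups")
print(texts)  # ['n=10', 'n=14', 'n=9', 'n=11']
```

Because the y position is given in axes coordinates, the labels stay at the bottom of the plot regardless of the bar heights, so they are less likely to overlap the lower end of the error bars.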

JohanC