
I have a table like the one below, stored in a pandas DataFrame called 'data'.

       Device1  event_rate %  % event dist  % non-event dist  % total dist
0      Android          3.08         27.30             32.96         32.75
1    Chrome OS          4.05          0.47              0.42          0.43
2  Chromium OS          9.95          0.23              0.08          0.09
3        Linux          2.27          0.05              0.09          0.09
4       Mac OS          6.43          4.39              2.45          2.52
5       Others          2.64          7.41             10.48         10.36
6      Windows          5.70         15.89             10.08         10.30
7          iOS          3.76         44.26             43.44         43.47

I am trying to create a seaborn/matplotlib chart like the one shown below, which was created in Excel.

[Image: desired chart, created in Excel]

Here is my Python code:

import matplotlib.pyplot as plt
import seaborn as sns

feature = 'Device1'
fig, ax1 = plt.subplots(figsize=(10,6))
color = 'tab:blue'
title = 'Event rate by ' + feature
ax1.set_title(title, fontsize=14)
ax1.set_xlabel(feature, fontsize=14)
ax2 = sns.barplot(x=feature, y='% non-event dist', data = data, color=color)
ax2 = sns.barplot(x=feature, y='% event dist', data = data, color='orange')
plt.xticks(rotation=45)
ax1.set_ylabel('% Dist', fontsize=14, color=color)
ax1.tick_params(axis='y')
ax2 = ax1.twinx()
color = 'tab:red'
ax2.set_ylabel('Event Rate %', fontsize=14, color=color)
ax2 = sns.lineplot(x=feature, y='event_rate %', data = data, sort=False, color=color)
ax2.tick_params(axis='y', color=color)
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
handles = handles1 + handles2
labels = labels1 + labels2
plt.legend(handles,labels)
plt.show()

Here is what I get:

[Image: chart produced by the code above]

Issues:

  1. The legend is not showing.
  2. The bar plots overlap each other.
  3. Is there a way to show data labels?

How can I make my seaborn plot look like my Excel chart? Thanks.

Zenvega

2 Answers


Load & Shape DataFrame

  • The most important part of plotting data is to shape the DataFrame correctly for the plotting API.
  • I think it is easier to convert the DataFrame from a wide to a long format using .stack.
  • .iloc[:, :-1] selects all rows and every column except the last one, which leaves '% total dist' out.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# create dataframe
data = {'Device1': ['Android', 'Chrome OS', 'Chromium OS', 'Linux', 'Mac OS', 'Others', 'Windows', 'iOS'],
        'event_rate %': [3.08, 4.05, 9.95, 2.27, 6.43, 2.64, 5.7, 3.76],
        '% event dist': [27.3, 0.47, 0.23, 0.05, 4.39, 7.41, 15.89, 44.26],
        '% non-event dist': [32.96, 0.42, 0.08, 0.09, 2.45, 10.48, 10.08, 43.44],
        '% total dist': [32.75, 0.43, 0.09, 0.09, 2.52, 10.36, 10.3, 43.47]}

df = pd.DataFrame(data)

# display(df.head())
       Device1  event_rate %  % event dist  % non-event dist  % total dist
0      Android          3.08         27.30             32.96         32.75
1    Chrome OS          4.05          0.47              0.42          0.43
2  Chromium OS          9.95          0.23              0.08          0.09
3        Linux          2.27          0.05              0.09          0.09
4       Mac OS          6.43          4.39              2.45          2.52

# convert from a wide to long format
dfl = (df.iloc[:, :-1]          # drop '% total dist'
       .set_index('Device1')
       .stack()
       .reset_index(name='Values')
       .rename({'level_1': 'Type'}, axis=1))

# select the desired data
dist = dfl[dfl.Type.str.contains('dist')]
rate = dfl[dfl.Type.str.contains('rate')]

# display(dist.head())
       Device1              Type  Values
1      Android      % event dist   27.30
2      Android  % non-event dist   32.96
4    Chrome OS      % event dist    0.47
5    Chrome OS  % non-event dist    0.42
7  Chromium OS      % event dist    0.23

# display(rate.head())
        Device1          Type  Values
0       Android  event_rate %    3.08
3     Chrome OS  event_rate %    4.05
6   Chromium OS  event_rate %    9.95
9         Linux  event_rate %    2.27
12       Mac OS  event_rate %    6.43
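The same long-format frames can also be built with DataFrame.melt, which names the new columns directly through var_name/value_name instead of renaming 'level_1', and which leaves out '% total dist' simply by not listing it. A minimal sketch on the first three devices, assuming the column names from the question. The row order differs (melt goes column by column rather than device by device), but the x and hue categories still first appear in the same order, so the seaborn plot should come out the same:

```python
import pandas as pd

# First three devices from the question's table (illustrative subset).
df = pd.DataFrame({
    'Device1': ['Android', 'Chrome OS', 'Chromium OS'],
    'event_rate %': [3.08, 4.05, 9.95],
    '% event dist': [27.3, 0.47, 0.23],
    '% non-event dist': [32.96, 0.42, 0.08],
    '% total dist': [32.75, 0.43, 0.09],
})

# value_vars selects the columns to unpivot; the others (except the id) are dropped.
dist = df.melt(id_vars='Device1',
               value_vars=['% event dist', '% non-event dist'],
               var_name='Type', value_name='Values')
rate = df.melt(id_vars='Device1',
               value_vars=['event_rate %'],
               var_name='Type', value_name='Values')
```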

Plot and Annotate

  • I have placed each legend next to its respective axis.
  • I referenced this SO question for creating the combined legend.
  • Adjust the values in bbox_to_anchor=(0.8, -0.25) to move the combined legend around.
# create the figure and primary axes
fig, ax = plt.subplots(figsize=(11, 7))

# plot and format the bars
sns.barplot(data=dist, x='Device1', y='Values', hue='Type', ax=ax)
ax.set_ylabel('% Dist')
ax.set_xticklabels(ax.get_xticklabels(), rotation=90)
l1 = ax.legend(bbox_to_anchor=(-0.24, 1), loc='upper left')

# create the secondary axes
ax2 = ax.twinx()

# plot and format the line
sns.lineplot(data=rate, x='Device1', y='Values', ax=ax2, color='grey', label='event rate %', marker='o')
ax2.set_ylabel('% Event Rate')
l2 = ax2.legend(bbox_to_anchor=(1.04, 1), loc='upper left')

# combined legend by extracting the components from legend l1 and l2
plt.legend(l1.get_patches() + l2.get_lines(), 
           [text.get_text() for text in l1.get_texts() + l2.get_texts()], 
           bbox_to_anchor=(0.8, -0.25), ncol=3)

# remove l1 from the plot
l1.remove()

# annotate the line
for _, x, _, y in rate.itertuples():
    ax2.text(x, y, y)

Combined Legend

[Image: resulting chart with the combined legend below the plot]

Separate Legend

  • If you want separate legends, remove plt.legend(...) and l1.remove().

[Image: resulting chart with a separate legend next to each axis]
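For comparison, the approach the question attempted (collect handles from both axes, then draw one legend) also works once every artist carries a label; in the question's code the bars and the line were most likely drawn without any label, so there was nothing to collect. This sketch uses plain matplotlib bars rather than seaborn so that each label is set explicitly, and offsets the two bar series by half a bar width so they sit side by side instead of overlapping. The bar width, colors and the choice to put the merged legend on the twin axes are my own assumptions:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'Device1': ['Android', 'Chrome OS', 'Chromium OS', 'Linux',
                'Mac OS', 'Others', 'Windows', 'iOS'],
    'event_rate %': [3.08, 4.05, 9.95, 2.27, 6.43, 2.64, 5.7, 3.76],
    '% event dist': [27.3, 0.47, 0.23, 0.05, 4.39, 7.41, 15.89, 44.26],
    '% non-event dist': [32.96, 0.42, 0.08, 0.09, 2.45, 10.48, 10.08, 43.44],
})

x = np.arange(len(data))  # one slot per device on the x-axis
width = 0.4               # two bars share each slot

fig, ax1 = plt.subplots(figsize=(10, 6))
# Offset the two series by half a bar width so they sit side by side.
ax1.bar(x - width / 2, data['% event dist'], width, label='% event dist')
ax1.bar(x + width / 2, data['% non-event dist'], width, label='% non-event dist')
ax1.set_xticks(x)
ax1.set_xticklabels(data['Device1'], rotation=45)
ax1.set_ylabel('% Dist')

ax2 = ax1.twinx()
ax2.plot(x, data['event_rate %'], color='tab:red', marker='o', label='event rate %')
ax2.set_ylabel('Event Rate %')

# Every artist has a label, so both axes return legend entries.
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
# Draw the merged legend on the twin axes, which is rendered on top of ax1.
legend = ax2.legend(handles1 + handles2, labels1 + labels2, loc='upper left')
```

In a notebook, drop the matplotlib.use('Agg') line and call plt.show() at the end.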

Trenton McKinney

When using sns.barplot(), you can use the hue parameter to draw grouped (side-by-side) bars instead of overlapping ones, but you'll need an extra column identifying which variable each value belongs to (I'm using DataFrame.melt() for this).

For the legend, you can use fig.legend().

To show data labels, use plt.annotate().

import matplotlib.pyplot as plt
import seaborn as sns

feature = 'Device1'
fig, ax1 = plt.subplots(figsize=(10,6))
color = 'tab:blue'
title = 'Event rate by ' + feature
ax1.set_title(title, fontsize=14)
ax1.set_xlabel(feature, fontsize=14)
# ax2 = sns.barplot(x=feature, y='% non-event dist', data = data, color=color)
# ax2 = sns.barplot(x=feature, y='% event dist', data = data, color='orange')
ax1 = sns.barplot(x=feature, y='value', hue='variable', data=data.melt(['Device1'], ['% event dist', '% non-event dist']))
plt.xticks(rotation=45)
ax1.set_ylabel('% Dist', fontsize=14, color=color)
ax1.tick_params(axis='y')
ax2 = ax1.twinx()
color = 'tab:red'
ax2.set_ylabel('Event Rate %', fontsize=14, color=color)
# ax2 = sns.lineplot(x=feature, y='event_rate %', data = data, sort=False, color=color)
ax2 = sns.lineplot(x=feature, y='event_rate %', data = data, sort=False, color=color, label='event_rate %')
ax2.tick_params(axis='y', color=color)
# handles1, labels1 = ax1.get_legend_handles_labels()
# handles2, labels2 = ax2.get_legend_handles_labels()
# handles = handles1[:-1] + handles2
# plt.legend(handles1, labels)
ax1.legend([])
ax2.legend([])
fig.legend(bbox_to_anchor=(0.75, -0.07), ncol=3)

# annotations
for x, y in enumerate(data['event_rate %']):
    plt.annotate(text=y, xy=(x-0.2,y+0.2))

## Use this for annotating if you want the annotations to go above and below the plot based on line direction
# prev_y = data['event_rate %'][0]
# for x, y in enumerate(data['event_rate %']):
#     plt.annotate(text=y, xy=(x-0.2,y+(0.2 if y >= prev_y else -0.2)))
#     prev_y = y

# set a higher ylim to make room for the annotations
ax2.set_ylim((0, 11))

plt.show()

[Image: resulting chart with grouped bars, annotated event-rate line and a shared legend]
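The annotation loop above labels only the event-rate line. For labels on the bars as well, matplotlib 3.4 added Axes.bar_label, which takes a BarContainer and writes one label per bar. Each ax.bar call stores its container in ax.containers; seaborn's barplot draws through ax.bar, so looping over ax.containers should work there too, although this sketch only uses plain matplotlib bars with three sample values from the question:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

# Three devices from the question, used only as sample values.
devices = ['Android', 'Windows', 'iOS']
event_dist = [27.3, 15.89, 44.26]
non_event_dist = [32.96, 10.08, 43.44]

x = np.arange(len(devices))
width = 0.4
fig, ax = plt.subplots()
ax.bar(x - width / 2, event_dist, width, label='% event dist')
ax.bar(x + width / 2, non_event_dist, width, label='% non-event dist')
ax.set_xticks(x)
ax.set_xticklabels(devices)

# One BarContainer per ax.bar call; bar_label adds a formatted text above each bar.
bar_labels = [ax.bar_label(c, fmt='%.1f', padding=2) for c in ax.containers]
```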

If you want the grid lines of both axes to line up, make sure both y-axes have the same number of ticks.

import numpy as np

ax1.set_ylim(0,50)
# ax1.get_yticks() -> [ 0. 10. 20. 30. 40. 50.] # length = 6
ax2.set_ylim((0, 11)) 
# ax2.get_yticks() -> [ 0.  2.  4.  6.  8. 10. 12.] # length = 7
yticks = np.linspace(0,11,len(ax1.get_yticks()))
ax2.set_yticks(yticks) 
# ax2.get_yticks() -> [ 0.   2.2  4.4  6.6  8.8 11. ] # length = 6

[Image: chart with the tick marks of both y-axes aligned]
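Instead of computing the positions with np.linspace and fixing them, one option is matplotlib's LinearLocator, which places a given number of evenly spaced ticks between the current axis limits and recomputes them if the limits change. A sketch (the helper name align_secondary_ticks is hypothetical); ax1's ticks are filtered to its visible range because get_yticks() can report ticks beyond the limits, as the ax2 output above shows with 12 for a limit of 11:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import LinearLocator


def align_secondary_ticks(ax1, ax2):
    """Give ax2 as many evenly spaced y-ticks as ax1 shows inside its limits."""
    lo, hi = ax1.get_ylim()
    visible = [t for t in ax1.get_yticks() if lo <= t <= hi]
    # LinearLocator spreads numticks ticks evenly between ax2's limits,
    # so both axes get grid lines at the same heights.
    ax2.yaxis.set_major_locator(LinearLocator(numticks=len(visible)))


fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.set_ylim(0, 50)
ax1.set_yticks(np.arange(0, 51, 10))  # 6 ticks, as in the example above
ax2.set_ylim(0, 11)
align_secondary_ticks(ax1, ax2)
```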

Alternatively, turn off the grid of the secondary axis, which makes the grid look similar to the Excel chart.

ax2.grid(False)

[Image: chart with the secondary-axis grid turned off]

Gusti Adli