
I am using a grouped (double) bar plot to compare the total reach of each track. Since the total reach of one group is much larger than that of the other, I created another column, total reach - log(x), which is just np.log of the original total reach. The y-axis reflects the log scale, but I want to annotate the bars with the actual values from total reach.

Any idea on how to do that?

import seaborn as sns
import matplotlib.pyplot as plt

#plot a bar graph and assign track name variable to hue
double_plot = sns.barplot(
    x='date',
    y='total reach - log(x)',
    hue='track name',
    data=dflog,
    palette=['blue','red'],
    alpha=1,
    dodge=True,
)
double_plot.set_ylim(0,dflog['total reach - log(x)'].max())

for item in double_plot.get_xticklabels():
    item.set_rotation(45)
    

for p in double_plot.patches:
    double_plot.annotate(format(p.get_height(), '.0f'), 
                   (p.get_x() + p.get_width() / 2., p.get_height()), 
                   ha = 'center', va = 'center', 
                   xytext = (0, 9), 
                   textcoords = 'offset points')
fig = plt.gcf()
fig.set_size_inches(15,7)
plt.title("Total Followers - log(x) - Fleetwood Mac's Dreams")


JohanC
Jackson

1 Answer


You could take the inverse logarithm of the bar height to get the original numbers back. Assuming the natural logarithm (np.log) was used, the inverse is np.exp. (For a logarithm with another base, use base ** height.)
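As a quick check of that round trip, here is a minimal sketch. The reach values are made up for illustration and are not taken from the question's data.

```python
import numpy as np

# Hypothetical reach values, for illustration only
reach = np.array([1_200, 45_000, 3_800_000])

# Natural log and its inverse
log_reach = np.log(reach)
recovered = np.exp(log_reach)

# Any other base works the same way: base ** height
log10_reach = np.log10(reach)
recovered10 = 10 ** log10_reach

# Format the recovered values the same way the annotation loop does
labels = [format(v, '.0f') for v in recovered]
print(labels)
```

Formatting with '.0f' rounds away the tiny floating-point error that the log/exp round trip can introduce.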

Personally, I prefer to name the return value of sns.barplot ax, to emphasize that it is a matplotlib Axes object on which the standard matplotlib functions can be used.

import matplotlib.pyplot as plt
from matplotlib import dates as mdates
import seaborn as sns
import pandas as pd
import numpy as np

dflog = pd.DataFrame()
dflog['date'] = pd.to_datetime(np.repeat(pd.date_range('20200901', freq='D', periods=20), 2))
dflog['total reach - log(x)'] = np.random.uniform(0, 17, 40)
dflog['track name'] = np.tile(['Track 1', 'Track 2'], 20)

ax = sns.barplot(
    x='date',
    y='total reach - log(x)',
    hue='track name',
    data=dflog,
    palette=['blue', 'red'],
    alpha=1,
    dodge=True)
ax.set_ylim(0, dflog['total reach - log(x)'].max() * 1.15)

for item in ax.get_xticklabels():
    item.set_rotation(45)

for p in ax.patches:
    ax.annotate(format(np.exp(p.get_height()), '.0f'),
                (p.get_x() + p.get_width() / 2., p.get_height()),
                ha='center', va='center',
                xytext=(0, 9),
                textcoords='offset points')
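# barplot places the bars at categorical positions (0, 1, 2, ...), so build the tick labels from the data
ax.set_xticklabels(dflog['date'].dt.strftime('%y-%m-%d').unique())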
fig = plt.gcf()
fig.set_size_inches(15, 7)
ax.set_title("Testing")
plt.tight_layout()
plt.show()

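If your matplotlib is 3.4 or newer, the same annotation can probably be written with ax.bar_label, which works on the bar containers rather than on ax.patches. The sketch below assumes made-up data and column values; only the column names follow the question. Looping over ax.containers also sidesteps any non-bar patches that may happen to be on the axes.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data for illustration: original reach plus its natural log
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'date': np.repeat(['09-01', '09-02', '09-03'], 2),
    'track name': np.tile(['Track 1', 'Track 2'], 3),
    'total reach': rng.integers(1_000, 5_000_000, 6),
})
df['total reach - log(x)'] = np.log(df['total reach'])

ax = sns.barplot(x='date', y='total reach - log(x)', hue='track name', data=df)

# One container per hue level; datavalues holds the (log) bar heights
for container in ax.containers:
    labels = [format(np.exp(h), '.0f') for h in container.datavalues]
    ax.bar_label(container, labels=labels, padding=3)

ax.margins(y=0.1)  # leave headroom for the labels above the tallest bar
```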

An alternative approach is to use the original values everywhere (without the log) and set a log scale on the y-axis: ax.set_yscale('log').
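A minimal sketch of that alternative, again with made-up numbers: the bars are drawn from the original total reach column, the axis is switched to a log scale, and the labels need no inverse transform. The use of ax.bar_label here is my assumption (it requires matplotlib 3.4 or newer); the annotate loop from above would work as well.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data for illustration: one track has a much larger reach
df = pd.DataFrame({
    'date': np.repeat(['09-01', '09-02', '09-03'], 2),
    'track name': np.tile(['Track 1', 'Track 2'], 3),
    'total reach': [1_500, 2_400_000, 1_800, 3_100_000, 2_200, 2_900_000],
})

# Plot the original values and let the axis do the log transform
ax = sns.barplot(x='date', y='total reach', hue='track name', data=df)
ax.set_yscale('log')

# The bar heights are already the real values, so label them directly
for container in ax.containers:
    ax.bar_label(container, fmt='%.0f', padding=3)
```

With this approach the y-axis ticks also show real magnitudes (powers of ten) instead of log values.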

JohanC