Annotation changing the plot?

Question

Usually when I plot a distribution, I like to add auxiliary lines to show extra information, such as the mean:

plt.figure(figsize=(15, 5))
h = r1['TAXA_ATUAL_UP'].mean()
plt.axvline(h, color='k', linestyle='dashed', linewidth=2)
print(h) # 692.6621026418171
plt.annotate('{0:.2f}'.format(h), xy=(h+100, 0.02), fontsize=12)

sns.distplot(r1['TAXA_ATUAL_UP'].dropna())
sns.distplot(r1[r1['REMOTO'] == 1]['TAXA_ATUAL_UP'].dropna(), hist=False, label='Y = 1')
sns.distplot(r1[r1['REMOTO'] == 0]['TAXA_ATUAL_UP'].dropna(), hist=False, label='Y = 0')

Recently, using the same code to plot other data, I got a weird result. What I notice is that the h value is large and the plot ends up drastically reduced:

plt.figure(figsize=(15, 5))
h = r1['TAXA_ATUAL_DOWN'].mean()
plt.axvline(h, color='k', linestyle='dashed', linewidth=2)
print(h) # 8777.987291627895
plt.annotate('{0:.2f}'.format(h), xy=(h, 0.02), fontsize=12)

sns.distplot(r1['TAXA_ATUAL_DOWN'].dropna())
sns.distplot(r1[r1['REMOTO'] == 1]['TAXA_ATUAL_DOWN'].dropna(), hist=False, label='Y = 1')

I wonder what causes this and how I should get the annotation to work properly, or fix whatever I'm doing wrong.

score 2 · Accepted Answer · edited Feb 23 '18 at 23:01

Try replacing

plt.annotate('{0:.2f}'.format(h), xy=(h, 0.02), fontsize=12)

with

plt.annotate('{0:.2f}'.format(h), xy=(h+100, 0.00012), fontsize=12)

I believe what is happening is that you are trying to annotate at the same xy coordinates as in your old plot, but the axis scales are drastically different. So when you annotate at xy=(h,0.02), 0.02 is significantly above the maximum of your y axis, and your figure is being re-scaled accordingly.

Looking at your new plot, it looks like it would make more sense to put your text at somewhere like xy=(h+100, 0.00012), or somewhere thereabouts. If that works, you can fine-tune your location according to where you want it (or, more programmatically, put your y coordinate at something like 0.75 * maximum_y_value, where maximum_y_value is the highest point on your y axis).
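A minimal sketch of the programmatic idea, assuming the distribution is drawn first so that the axis limits already reflect it. The data here is synthetic (a stand-in for r1['TAXA_ATUAL_DOWN'], not the real column), and a plain Matplotlib histogram replaces sns.distplot to keep the example self-contained:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for r1['TAXA_ATUAL_DOWN'] (an assumption, not the real data).
rng = np.random.default_rng(0)
values = rng.exponential(scale=8000.0, size=5000)

fig, ax = plt.subplots(figsize=(15, 5))
ax.hist(values, bins=50, density=True)  # draw the distribution first

h = values.mean()
ax.axvline(h, color='k', linestyle='dashed', linewidth=2)

# Read the top of the y axis only after plotting, then place the label below it.
y_top = ax.get_ylim()[1]
label_y = 0.75 * y_top
ax.annotate('{0:.2f}'.format(h), xy=(h, label_y), fontsize=12)
```

Because label_y is derived from the current axis limits, the same code should keep the label inside the plot whether the density peaks near 0.02 or near 0.00012.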

A hacky but effective way to do this would be to use

ax = sns.distplot(r1[r1['REMOTO'] == 1]['TAXA_ATUAL_DOWN'].dropna())
y_max = max(p.get_height() for p in ax.patches)  # p, not h, so the mean stored in h is not overwritten

plt.annotate('{0:.2f}'.format(h), xy=(h, 0.75*y_max), fontsize=12)

This reads the bar heights of the histogram that sns.distplot draws by default (which you disabled with hist=False) and takes their maximum. Note that the call also draws that histogram on the current axes.
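Another option that avoids computing a maximum at all is to give the y coordinate as a fraction of the axes height while keeping x in data units. The sketch below assumes Matplotlib's mixed coordinate spec xycoords=('data', 'axes fraction'). It again uses synthetic data and a plain histogram in place of the original DataFrame and sns.distplot:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the real column (an assumption for illustration only).
rng = np.random.default_rng(1)
values = rng.exponential(scale=8000.0, size=5000)

fig, ax = plt.subplots(figsize=(15, 5))
ax.hist(values, bins=50, density=True)

h = values.mean()
ax.axvline(h, color='k', linestyle='dashed', linewidth=2)

# x is in data units; y is a fraction of the axes height (0 = bottom, 1 = top),
# so the label position does not depend on the scale of the density.
ann = ax.annotate('{0:.2f}'.format(h), xy=(h, 0.75),
                  xycoords=('data', 'axes fraction'), fontsize=12)
```

With this form the label sits at 75% of the axes height regardless of the y scale, so no y value needs tuning per dataset.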

Thank you. Do you know how I can get the maximum_y_value in the units used by the plots? I mean, maximum_y_value = r1['TAXA_ATUAL_DOWN'].max() does not produce the desired output, since I'm plotting a distribution. — pceccon, Feb 22 '18 at 19:32
From [this](https://stackoverflow.com/questions/37374983/get-data-points-from-seaborn-distplot) answer, I believe you would be able to get it like this: `max([h.get_height() for h in sns.distplot(r1[r1['REMOTO'] == 1]['TAXA_ATUAL_DOWN'].dropna()).patches])`, or figure out a nicer mathematical way of doing it. FYI, my way would be an approximation, but it should be accurate enough for what you're doing with it. — sacuL, Feb 22 '18 at 19:43
