
Usually, when I plot a distribution, I like to add auxiliary lines that show extra information, such as the mean:

import matplotlib.pyplot as plt
import seaborn as sns

# r1 is a pandas DataFrame loaded elsewhere
plt.figure(figsize=(15, 5))
h = r1['TAXA_ATUAL_UP'].mean()
plt.axvline(h, color='k', linestyle='dashed', linewidth=2)
print(h) # 692.6621026418171
plt.annotate('{0:.2f}'.format(h), xy=(h+100, 0.02), fontsize=12)

sns.distplot(r1['TAXA_ATUAL_UP'].dropna())
sns.distplot(r1[r1['REMOTO'] == 1]['TAXA_ATUAL_UP'].dropna(), hist=False, label='Y = 1')
sns.distplot(r1[r1['REMOTO'] == 0]['TAXA_ATUAL_UP'].dropna(), hist=False, label='Y = 0')

[Image: resulting plot]

Recently, using the same code to plot other data, I got a strange result. I noticed that the value of h is large here, and the plot ends up drastically shrunk:

plt.figure(figsize=(15, 5))
h = r1['TAXA_ATUAL_DOWN'].mean()
plt.axvline(h, color='k', linestyle='dashed', linewidth=2)
print(h) # 8777.987291627895
plt.annotate('{0:.2f}'.format(h), xy=(h, 0.02), fontsize=12)

sns.distplot(r1['TAXA_ATUAL_DOWN'].dropna())
sns.distplot(r1[r1['REMOTO'] == 1]['TAXA_ATUAL_DOWN'].dropna(), hist=False, label='Y = 1')

[Image: resulting plot]

I wonder what causes this and how I can get the annotation to work properly, or fix whatever I'm doing wrong.

Trenton McKinney
pceccon

1 Answer


Try replacing

plt.annotate('{0:.2f}'.format(h), xy=(h, 0.02), fontsize=12)

with

plt.annotate('{0:.2f}'.format(h), xy=(h+100, 0.00012), fontsize=12)

I believe what is happening is that you are annotating at the same y coordinate as in your old plot, but the axis scales are drastically different. When you annotate at xy=(h, 0.02), 0.02 is far above the maximum of your y axis, and your figure is being rescaled accordingly.
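One way to sidestep the scale mismatch, sketched here under assumptions rather than taken from the code above, is to give the y position as a fraction of the axes height while keeping x in data units. The sketch uses plain matplotlib and synthetic data (`values` is a made-up stand-in for the `TAXA_ATUAL_DOWN` column, and `ax.hist` stands in for `sns.distplot`); the tuple form of `xycoords` sets the coordinate system per axis.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Assumption: synthetic, wide-ranging positive data instead of the real column
rng = np.random.default_rng(0)
values = rng.gamma(shape=2.0, scale=4000.0, size=5000)
h = values.mean()

fig, ax = plt.subplots(figsize=(15, 5))
ax.hist(values, bins=50, density=True, alpha=0.5)
ax.axvline(h, color="k", linestyle="dashed", linewidth=2)

# x is in data units, y is a fraction of the axes height (0 = bottom, 1 = top),
# so the label stays inside the axes whatever the density scale is
label = ax.annotate(
    "{0:.2f}".format(h),
    xy=(h, 0.75),
    xycoords=("data", "axes fraction"),
    fontsize=12,
)
```

With this form the same annotate call can be reused for both columns, since the y value no longer depends on the height of the density curve.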

Looking at your new plot, it would make more sense to put your text somewhere around xy=(h+100, 0.00012). If that works, you can fine-tune the location to where you want it (or, more programmatically, set the y coordinate to something like 0.75 * maximum_y_value, where maximum_y_value is the highest point on your y axis).
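As a rough illustration of the programmatic variant, the sketch below reads the y-axis limits after the data has been drawn and places the label at 0.75 of the top limit. It assumes plain matplotlib and synthetic data (`values` is a hypothetical stand-in for the real column, and `ax.hist` replaces `sns.distplot`), so treat it as a sketch rather than a drop-in fix.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Assumption: synthetic data whose density peaks far below 0.02
rng = np.random.default_rng(0)
values = rng.gamma(shape=2.0, scale=4000.0, size=5000)
h = values.mean()

fig, ax = plt.subplots(figsize=(15, 5))
ax.hist(values, bins=50, density=True, alpha=0.5)  # draw the data first
ax.axvline(h, color="k", linestyle="dashed", linewidth=2)

# Read the limits only after plotting, then place the label inside them
y_bottom, y_top = ax.get_ylim()
label = ax.annotate("{0:.2f}".format(h), xy=(h, 0.75 * y_top), fontsize=12)
```

The order matters here: the limits are only meaningful once the distribution is on the axes, so the annotation has to come after the plotting calls.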

A hacky but effective way to do this would be to use

# The loop variable is p rather than h, so it is not confused with the mean h
y_max = max(p.get_height() for p in sns.distplot(r1[r1['REMOTO'] == 1]['TAXA_ATUAL_DOWN'].dropna()).patches)

plt.annotate('{0:.2f}'.format(h), xy=(h, 0.75*y_max), fontsize=12)

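What this actually does is get the bar heights of the histogram that sns.distplot plots by default (and which you have disabled with hist=False), and find their maximum. Note that this call leaves hist enabled, so it also draws that histogram on the current axes.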
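The same idea can be sketched without calling the plotting function a second time, by reading the bar heights from the axes that already hold the histogram. This is an assumption-laden variant, not the answer's code: it uses `ax.hist` from matplotlib in place of `sns.distplot`, and `values` is synthetic stand-in data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Assumption: synthetic data instead of r1['TAXA_ATUAL_DOWN']
rng = np.random.default_rng(0)
values = rng.gamma(shape=2.0, scale=4000.0, size=5000)
h = values.mean()

fig, ax = plt.subplots(figsize=(15, 5))
counts, bin_edges, bars = ax.hist(values, bins=50, density=True, alpha=0.5)
ax.axvline(h, color="k", linestyle="dashed", linewidth=2)

# The histogram bars are Rectangle patches on the axes; the tallest one
# gives an approximate maximum of the plotted density
y_max = max(p.get_height() for p in ax.patches)
label = ax.annotate("{0:.2f}".format(h), xy=(h, 0.75 * y_max), fontsize=12)
```

As with the original hack, the result is an approximation of the curve's peak, which is usually good enough for positioning a label.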

ImportanceOfBeingErnest
sacuL
  • Thank you. Do you know how I can get the maximum_y_value in the units used by the plots? I mean, maximum_y_value = r1['TAXA_ATUAL_DOWN'].max() does not produce the desired output, since I'm plotting a distribution. – pceccon Feb 22 '18 at 19:32
  • From [this](https://stackoverflow.com/questions/37374983/get-data-points-from-seaborn-distplot) answer, I believe you would be able to get it like this: `max([h.get_height() for h in sns.distplot(r1[r1['REMOTO'] == 1]['TAXA_ATUAL_DOWN'].dropna()).patches])`, or figure out a nicer mathematical way of doing it. FYI, my way would be an approximation, but it should be accurate enough for what you're doing with it. – sacuL Feb 22 '18 at 19:43