
I have created a seaborn scatter plot and added a trendline to it. I have some datapoints that fall very far away from the trendline (see the ones highlighted in yellow) so I'd like to add data labels only to these points, NOT to all the datapoints in the graph.

Does anyone know the best way to do this?

[Image: scatter plot with trendline; the far-off datapoints are highlighted in yellow]

So far I've found answers to "how to add labels to ALL data points" (see this link) but this is not my case.

Scinana

1 Answer


In the accepted answer to the question you reference, labels are added to all data points by looping over the points and calling .text(x, y, string) on the axes. This is matplotlib's Axes.text method (seaborn is implemented on top of matplotlib). To label only some points, call this method just for the points you select.

In your specific case, I don't know exactly what formula you want to use to find your outliers, but to label literally the ones beyond the limits of the yellow rectangle that you've drawn, you could try the following:

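# Label only the points that fall inside the yellow rectangle
for x, y in zip(xarr, yarr):
    if x < 5 and y > 5.5:
        # Shift the label slightly to the right so it does not cover the marker
        ax.text(x + 0.01, y, 'outlier', horizontalalignment='left', size='medium', color='black')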

Here xarr holds your x-values, yarr holds your y-values, and ax is the axes object returned by your call to seaborn.

ahnlabb
  • Thank you for that, but I'm after selected datapoints not all of them. How can you define a logic that would only label the datapoints that you are after? – Scinana Feb 27 '21 at 22:18
  • That is hard to answer in general, it depends. It would help if you add more details to the original post (like showing how your data is represented). – ahnlabb Feb 27 '21 at 22:23
  • Sure thanks, I've added a graph now. Hope this helps :) – Scinana Feb 27 '21 at 22:35
  • I've edited my answer to take your graph into consideration. – ahnlabb Feb 27 '21 at 23:07