
I have a Seaborn scatterplot built from a dataframe. I would like to label each point with other values from the same observation (row) of the df. In the example below, is there a way to add at least one of the column values (A or B) to each point? Even better, is there a way to add two labels (here, the values in both column A and column B)?

Following what I found in my searches, I tried a for loop with functions like the one below, but have not had success with this scatterplot.

Thank you for your help.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

df_so = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
scatter_so = sns.lmplot(x='C', y='D', data=df_so,
                        fit_reg=False, y_jitter=0, scatter_kws={'alpha': 0.2})

fig, ax = plt.subplots()  # stuff like this does not work
Z_D
  • I don't think so because that one looks strictly at adding from a set of 3 clusters – Z_D May 30 '18 at 13:19
  • The answer to the duplicate labels scatter points created from a pandas dataframe and seaborn with values from another column in a dataframe. Isn't that exactly what you want? – DavidG May 30 '18 at 13:24
  • It may not be what OP wants, but it is what this question asks for. Unless it is edited to ask for the actual problem (showing in how far the linked solution doesn't help) I would agree it's a duplicate. – ImportanceOfBeingErnest May 30 '18 at 13:25

1 Answer


Use:

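import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

df_so = pd.DataFrame(np.random.randint(0, 100, size=(20, 4)), columns=list('ABCD'))
scatter_so = sns.lmplot(x='C', y='D', data=df_so,
                        fit_reg=False, y_jitter=0, scatter_kws={'alpha': 0.2})


def label_point(x, y, val, ax):
    # Combine the coordinates and label text into one frame, then draw one label per row
    a = pd.concat({'x': x, 'y': y, 'val': val}, axis=1)
    for i, point in a.iterrows():
        ax.text(point['x'] + .02, point['y'], str(point['val']))


# Label each point with "(A, B)"; plt.gca() is the axes that lmplot just drew on
labels = '(' + df_so['A'].astype(str) + ', ' + df_so['B'].astype(str) + ')'
label_point(df_so['C'], df_so['D'], labels, plt.gca())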
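
A possible variant, offered as a sketch rather than the only way to do it: since lmplot returns a FacetGrid that owns its own figure, the labels can be drawn on the grid's axes directly instead of relying on plt.gca(). This is also likely why a separate fig, ax = plt.subplots() call does not help, because it creates a second, empty figure. The names grid, df and rng, the 4-point offset, the font size and the fixed seed are illustrative choices, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Same shape of data as above, with a fixed seed so the plot is reproducible
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(20, 4)), columns=list("ABCD"))

# lmplot is figure-level: it builds its own figure and returns a FacetGrid
grid = sns.lmplot(x="C", y="D", data=df, fit_reg=False, scatter_kws={"alpha": 0.2})
ax = grid.axes.flat[0]  # the single Axes inside the grid

# One "(A, B)" label per row, shifted 4 points up and to the right of its marker
for row in df.itertuples(index=False):
    ax.annotate(
        f"({row.A}, {row.B})",
        xy=(row.C, row.D),
        xytext=(4, 4),
        textcoords="offset points",
        fontsize=8,
    )
```

Because the offset is given in points rather than data units, the gap between marker and label stays the same whatever the axis range is. To see the result, call grid.savefig(...) with a file name of your choice, or drop the Agg line and call plt.show().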


glicerico
Scott Boston