425

I am trying to make a scatter plot and annotate the data points with different numbers from a list. For example, I want to plot y vs z and annotate each point with the corresponding number from n.

import matplotlib.pyplot as plt

y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(z, y, marker='o')

Any ideas?

Trenton McKinney
Labibah
  • You can also get scatter plot with tooltip labels on hover using the mpld3 library. https://mpld3.github.io/examples/scatter_tooltip.html – Claude COULOMBE May 20 '19 at 02:17

9 Answers

778

I'm not aware of any plotting method that takes arrays or lists of labels, but you could call annotate() while iterating over the values in n.

import matplotlib.pyplot as plt
y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]

fig, ax = plt.subplots()
ax.scatter(z, y)

for i, txt in enumerate(n):
    ax.annotate(txt, (z[i], y[i]))

There are a lot of formatting options for annotate(); see the matplotlib website.

enter image description here
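
As one illustration of those formatting options, here is a minimal sketch that shifts each label a few points away from its marker. The (5, 5) offset, the font size and the colour are arbitrary values chosen for illustration; they are not part of the answer above.

```python
import matplotlib.pyplot as plt

y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]

fig, ax = plt.subplots()
ax.scatter(z, y)

labels = []
for x_val, y_val, txt in zip(z, y, n):
    # xy is the data point; xytext is an offset from it, measured in points
    ann = ax.annotate(
        str(txt),
        xy=(x_val, y_val),
        xytext=(5, 5),
        textcoords="offset points",
        fontsize=9,
        color="darkred",
    )
    labels.append(ann)
```

Call plt.show() or fig.savefig(...) afterwards to see the result. Because the offset is given in points rather than data units, the gap between marker and label stays the same when the axes are rescaled.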

Joop
Rutger Kassies
  • 2
    Works well on top of Seaborn `regplot`s without too much disruption, too. – ijoseph Dec 09 '16 at 01:14
  • 1
    @Rutger I use a pandas dataframe and I somehow get a `KeyError`, so I guess a `dict()` object is expected? Is there any other way to label the data using `enumerate`, `annotate` and a pandas data frame? – Rachel Jan 04 '17 at 18:04
  • @Rachel, You can use `for _, row in df.iterrows():`, and then access the values with `row['text'], row['x-coord']` etc. If you post a separate question I'll have a look at it. – Rutger Kassies Jan 05 '17 at 08:11
  • @RutgerKassies Thanks, Rutger! I posted a question here http://stackoverflow.com/questions/41481153/how-to-label-bubble-chart-scatter-plot-with-column-from-pandas-dataframe I fear that it may be too similar to this very question. But I can't work it out somehow. Thank you for your help! – Rachel Jan 05 '17 at 09:20
  • 2
    For points that happen to be very close, is there any way to offset the annotations and draw lines pointing from the data points to the labels in order to nicely separate the otherwise overlapping labels? – aviator May 06 '20 at 20:43
  • 2
    @aviator, not built-in unfortunately. But see for example this using networkx's layout engine: https://stackoverflow.com/a/34697108/1755432 – Rutger Kassies May 07 '20 at 08:36
  • Is it possible to shift the text relative to the data points? – Ben Jun 03 '22 at 07:25
  • 1
    @Ben, yes, the annotate function has an `xytext=(x,y)` keyword that allows specifying the location of the text label. The default is the same as the point `xy=(x,y)`. For example: `ax.annotate(txt, xy=(z[i], y[i]), xytext=(z[i]+0.1, y[i]+0.1))` That will also allow drawing lines or arrows between the two locations. More info at: https://matplotlib.org/3.5.0/tutorials/text/annotations.html – Rutger Kassies Jun 03 '22 at 07:34
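
For the pandas case raised in the comments above, a minimal sketch might look like the following. The column names x, y and label are hypothetical, and the data are simply the values from the question. One plausible cause of the `KeyError` is positional indexing such as `df['x'][i]` on a frame whose index is not 0, 1, 2, ...; iterating over rows avoids that lookup altogether.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical column names; the values are the ones from the question
df = pd.DataFrame({
    "x": [0.15, 0.3, 0.45, 0.6, 0.75],
    "y": [2.56422, 3.77284, 3.52623, 3.51468, 3.02199],
    "label": [58, 651, 393, 203, 123],
})

fig, ax = plt.subplots()
ax.scatter(df["x"], df["y"])

# itertuples yields one namedtuple per row, so columns are read by attribute
for row in df.itertuples(index=False):
    ax.annotate(str(row.label), (row.x, row.y))
```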
63

In case anyone is trying to apply the above solutions with .scatter() instead of .subplots():

I tried running the following code

import matplotlib.pyplot as plt
y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]

fig, ax = plt.scatter(z, y)

for i, txt in enumerate(n):
    ax.annotate(txt, (z[i], y[i]))

But I ran into an error stating "cannot unpack non-iterable PathCollection object", pointing specifically at the line fig, ax = plt.scatter(z, y)

I eventually solved the error using the following code

import matplotlib.pyplot as plt
plt.scatter(z, y)

for i, txt in enumerate(n):
    plt.annotate(txt, (z[i], y[i]))

I didn't expect there to be a difference between .scatter() and .subplots(). I should have known better.
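
If you still want an Axes object to work with after calling plt.scatter(), one option (a sketch, not the only way) is to keep the returned PathCollection and fetch the current Axes with plt.gca(). The variable names points, ax and fig are just illustrative.

```python
import matplotlib.pyplot as plt
from matplotlib.collections import PathCollection

y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]

points = plt.scatter(z, y)  # a single PathCollection, not a (fig, ax) pair
ax = plt.gca()              # the Axes that pyplot just drew on
fig = ax.figure             # the Figure that owns those Axes

for i, txt in enumerate(n):
    ax.annotate(txt, (z[i], y[i]))
```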

Joop
Heather Claxton
  • I'm using this exact same code in one of my scripts (the second block here), but I'm met with an error message saying "IndexError: index 1 is out of bounds for axis 0 with size 1", which is referring to "txt" in the annotate function. Any idea why this is happening? – Brandon Oct 08 '20 at 07:23
  • 2
    That's because `plt.scatter` does not create a `Figure` and an `Axes` like `plt.subplots()` does; it returns a `PathCollection` containing the scatter points. You are supposed to create the figure and axes beforehand. – Alperino Jul 22 '22 at 11:07
45

In matplotlib versions earlier than 2.0, ax.scatter is not needed to plot text without markers. From version 2.0 on, you need ax.scatter to set the proper axis range and markers for the text.

import matplotlib.pyplot as plt
y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]

fig, ax = plt.subplots()

for i, txt in enumerate(n):
    ax.annotate(txt, (z[i], y[i]))

And in this link you can find an example in 3d.
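
For matplotlib 2.0 and later, a minimal sketch of the text-only variant could look like this. It assumes that fully transparent markers (alpha=0) are an acceptable way to hide the points; the scatter call is there only so that autoscaling picks up the data range.

```python
import matplotlib.pyplot as plt

y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]

fig, ax = plt.subplots()

# alpha=0 hides the markers, but the points still drive the axis limits
ax.scatter(z, y, alpha=0)

for i, txt in enumerate(n):
    ax.annotate(str(txt), (z[i], y[i]))

x_min, x_max = ax.get_xlim()
y_min, y_max = ax.get_ylim()
```

Without the scatter call, the axes keep their default 0 to 1 range, so labels with y values around 2.5 to 3.8 fall outside the visible area.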

Joop
rafaelvalle
  • 2
    This is awesome! Thanks for sharing this solution. Can you also share what the proper code is to set the size of the figure? Implementations such as `plt.figure(figsize=(20,10))` aren't working as expected, in that invoking this code doesn't actually change the size of the image. Looking forward to your assistance. Thanks! – Levine Jan 24 '18 at 21:45
  • fig, ax = plt.subplots(figsize=(20,10)) – rafaelvalle Jan 25 '18 at 01:47
35

You may also use pyplot.text (see here).

import matplotlib.pyplot as plt
import numpy as np

def plot_embeddings(M_reduced, word2Ind, words):
    """
    Plot in a scatterplot the embeddings of the words specified in the list "words".
    Include a label next to each point.
    """
    for word in words:
        x, y = M_reduced[word2Ind[word]]
        plt.scatter(x, y, marker='x', color='red')
        plt.text(x+.03, y+.03, word, fontsize=9)
    plt.show()

M_reduced_plot_test = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1], [0, 0]])
word2Ind_plot_test = {'test1': 0, 'test2': 1, 'test3': 2, 'test4': 3, 'test5': 4}
words = ['test1', 'test2', 'test3', 'test4', 'test5']
plot_embeddings(M_reduced_plot_test, word2Ind_plot_test, words)

enter image description here
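
The same idea applied to the data from the question, as a rough sketch on an explicit Axes: one scatter call for all points, then one text() call per label. The 0.01 horizontal shift is an arbitrary choice for illustration.

```python
import matplotlib.pyplot as plt

y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]

fig, ax = plt.subplots()
ax.scatter(z, y, marker="x", color="red")

for x_val, y_val, txt in zip(z, y, n):
    # text() places a plain string at data coordinates; shift it slightly right
    ax.text(x_val + 0.01, y_val, str(txt), fontsize=9, va="center")
```

Unlike annotate(), text() has no notion of a separate anchor point or arrow, so any offset has to be added to the coordinates by hand.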

Kamal El-Saaid
irudyak
20

I would like to add that you can even use arrows and text boxes to annotate the points. Here is what I mean:

import matplotlib.pyplot as plt


y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]

fig, ax = plt.subplots()
ax.scatter(z, y)

ax.annotate(n[0], (z[0], y[0]), xytext=(z[0]+0.05, y[0]+0.3), 
    arrowprops=dict(facecolor='red', shrink=0.05))

ax.annotate(n[1], (z[1], y[1]), xytext=(z[1]-0.05, y[1]-0.3), 
    arrowprops = dict(  arrowstyle="->",
                        connectionstyle="angle3,angleA=0,angleB=-90"))

ax.annotate(n[2], (z[2], y[2]), xytext=(z[2]-0.05, y[2]-0.3), 
    arrowprops = dict(arrowstyle="wedge,tail_width=0.5", alpha=0.1))

ax.annotate(n[3], (z[3], y[3]), xytext=(z[3]+0.05, y[3]-0.2), 
    arrowprops = dict(arrowstyle="fancy"))

ax.annotate(n[4], (z[4], y[4]), xytext=(z[4]-0.1, y[4]-0.2),
    bbox=dict(boxstyle="round", alpha=0.1), 
    arrowprops = dict(arrowstyle="simple"))

plt.show()

This generates the following graph:

enter image description here

Anwarvic
16

For a limited set of values, matplotlib is fine. But when you have lots of values, the labels start to overlap other data points, and with limited space you can't simply drop them. In that case it's better to use an interactive plot that lets you zoom in and out.

Using plotly

import plotly.express as px

df = px.data.gapminder().query("year==2007 and continent=='Americas'")

fig = px.scatter(df, x="gdpPercap", y="lifeExp", text="country", log_x=True, size_max=100, color="lifeExp")
fig.update_traces(textposition='top center')
fig.update_layout(title_text='Life Expectancy', title_x=0.5)
fig.show()

enter image description here

hamflow
bigbounty
  • what are you using here for inline zooming? It's not `mpld3`, is it? – Saraha Nov 23 '20 at 17:26
  • 3
    imho, an animation at this speed adds nothing, a carefully designed fixed image would be less frustrating. – mins Jan 27 '21 at 10:25
12

Python 3.6+:

import matplotlib.pyplot as plt
coordinates = [('a', 1, 2), ('b', 3, 4), ('c', 5, 6)]
for label, x, y in coordinates:
    plt.annotate(label, (x, y))
William Miller
palash
4

This might be useful when you need to annotate points individually at different times (I mean, not in a single for loop):

ax = plt.gca()
ax.annotate('your_label', (x, y))

where x and y are your target coordinates, of type float or int.
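
A small sketch of how that could be wrapped for repeated use. The helper name label_point and its ax parameter are hypothetical, not part of matplotlib.

```python
import matplotlib.pyplot as plt


def label_point(x, y, text, ax=None):
    """Annotate one point on the given Axes (default: the current Axes)."""
    if ax is None:
        ax = plt.gca()
    return ax.annotate(str(text), (x, y))


fig, ax = plt.subplots()
ax.scatter([0.15, 0.3], [2.56422, 3.77284])

first = label_point(0.15, 2.56422, 58)
# ... other plotting or computation can happen here ...
second = label_point(0.3, 3.77284, 651)
```

Passing ax explicitly is safer when several figures are open, since plt.gca() always refers to whichever Axes is current at the time of the call.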

Uzzal Podder
3

As a one-liner using a list comprehension and numpy:

[ax.annotate(x[0], (x[1], x[2])) for x in np.array([n,z,y]).T]

The setup is the same as in Rutger's answer.
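
One thing to be aware of: np.array([n, z, y]) mixes integers and floats, so numpy converts everything to float and the labels are drawn as 58.0, 651.0 and so on. A sketch of an alternative that keeps the integer labels, using zip instead of numpy:

```python
import matplotlib.pyplot as plt

y = [2.56422, 3.77284, 3.52623, 3.51468, 3.02199]
z = [0.15, 0.3, 0.45, 0.6, 0.75]
n = [58, 651, 393, 203, 123]

fig, ax = plt.subplots()
ax.scatter(z, y)

# zip pairs each label with its coordinates without changing any types
annotations = [
    ax.annotate(str(label), (x_val, y_val))
    for label, x_val, y_val in zip(n, z, y)
]
```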

andor kesselman