
I am trying to label a scatter/bubble chart that I create with matplotlib, using entries from a column in a pandas data frame. I have seen plenty of related examples and questions (see e.g. here and here), so I tried to annotate the plot accordingly. Here is what I do:

import matplotlib.pyplot as plt
import pandas as pd

# example data frame
x = [5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30]
y = [100, 100, 200, 200, 300, 300, 400, 400, 500, 500, 600, 600]
s = [5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30]
users = ['mark', 'mark', 'mark', 'rachel', 'rachel', 'rachel', 'jeff', 'jeff', 'jeff', 'lauren', 'lauren', 'lauren']

df = pd.DataFrame(dict(x=x, y=y, users=users))

# my attempt to plot things
plt.scatter(x, y, s=s, alpha=0.5)
plt.xlabel('x')
plt.ylabel('y')
# annotate() draws one text at one (x, y) point, so passing the whole
# column and both coordinate lists at once does not label each point
plt.annotate(df.users, xy=(x, y))
plt.show()

I use a pandas dataframe and somehow get a KeyError, so I guess a dict() object is expected? Is there any other way to label the data using entries from a pandas data frame?

Rachel

2 Answers


You can use DataFrame.plot.scatter and then, in a loop, select each point's coordinates by position with DataFrame.iat:

ax = df.plot.scatter(x='x', y='y', alpha=0.5)
for i, txt in enumerate(df.users):
    ax.annotate(txt, (df.x.iat[i],df.y.iat[i]))
plt.show()
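For reference, here is a self-contained sketch that combines this loop with the bubble sizes discussed in the comments on this answer, passing the size column's values (not its name) as s. The helper name labelled_bubble_chart and its keyword arguments are hypothetical, chosen only for this example.

```python
import matplotlib.pyplot as plt
import pandas as pd


def labelled_bubble_chart(df, x="x", y="y", label="users", size="s"):
    """Hypothetical helper: bubble chart with one text label per point."""
    # Pass the column's values (a Series), not its name, as the marker sizes.
    ax = df.plot.scatter(x=x, y=y, s=df[size], alpha=0.5)
    # annotate() places a single text at a single (x, y) position,
    # so label the points one at a time, selecting coordinates with iat.
    for i, txt in enumerate(df[label]):
        ax.annotate(txt, (df[x].iat[i], df[y].iat[i]))
    return ax


x = [5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30]
y = [100, 100, 200, 200, 300, 300, 400, 400, 500, 500, 600, 600]
s = [5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30]
users = ['mark', 'mark', 'mark', 'rachel', 'rachel', 'rachel',
         'jeff', 'jeff', 'jeff', 'lauren', 'lauren', 'lauren']

df = pd.DataFrame(dict(x=x, y=y, s=s, users=users))
ax = labelled_bubble_chart(df)
plt.show()
```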


jezrael
  • Thank you! This is great! Is there a way to convert it to a bubble chart? I tried `ax = df.plot.scatter(x='x', y='y', s='s', alpha=0.5)` but I get a `TypeError`. Any ideas? – Rachel Jan 05 '17 at 11:07
  • You only need `ax = df.plot.scatter(x='x', y='y', s=s, alpha=0.5)`: change `s='s'` to `s=s`, because the input is a list, not a column. – jezrael Jan 05 '17 at 11:19
  • Thank you! This saved me quite some time! – Rachel Jan 05 '17 at 11:21
  • Thank you for accepting. By the way, if you add `s` as a column, `df = pd.DataFrame(dict(x=x, y=y, users=users, s=s))`, then `ax = df.plot.scatter(x='x', y='y', s=df.s, alpha=0.5)` works for me. – jezrael Jan 05 '17 at 11:22
  • Yes, I just tried it myself. It works well if all entries are non-NaN, but I have quite a few missing values in the original data set, so working with a list works smoothly. Thank you! – Rachel Jan 05 '17 at 11:24
  • Just thanking you for asking the question and answering it! – swyx May 15 '17 at 17:30
  • @swyx Thank you too. Have a nice day! – jezrael May 15 '17 at 19:10
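If some sizes are missing, one option is to drop those rows before plotting so that every label still sits on a drawn bubble. This is an assumption about how you want missing values treated (filling them with fillna would be another choice). A minimal sketch, with NaN values put into s purely for illustration and a hypothetical helper name:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def labelled_bubble_chart_dropna(df, label="users", size="s"):
    """Hypothetical helper: skip rows whose size is missing (NaN)."""
    # Assumption: a row without a size should get neither a bubble nor a label.
    clean = df.dropna(subset=[size])
    ax = clean.plot.scatter(x="x", y="y", s=clean[size], alpha=0.5)
    # zip keeps each label paired with its own coordinates.
    for txt, xi, yi in zip(clean[label], clean["x"], clean["y"]):
        ax.annotate(txt, (xi, yi))
    return ax


# Illustrative data only: two of the sizes replaced by NaN.
df = pd.DataFrame(dict(
    x=[5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30],
    y=[100, 100, 200, 200, 300, 300, 400, 400, 500, 500, 600, 600],
    s=[5, np.nan, 20, 30, 5, 10, np.nan, 30, 5, 10, 20, 30],
    users=['mark', 'mark', 'mark', 'rachel', 'rachel', 'rachel',
           'jeff', 'jeff', 'jeff', 'lauren', 'lauren', 'lauren'],
))
ax = labelled_bubble_chart_dropna(df)
plt.show()
```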

jezrael's answer is fine, but I will post this just to show what I meant by df.iterrows in the other thread.

In this approach the scatter (or plot) call goes inside the loop as well, so each point can be given its own size.

df = pd.DataFrame(dict(x=x, y=y, s=s, users=users))

fig, ax = plt.subplots(facecolor='w')

# one scatter call per row, so each bubble gets its own size (scaled by 5)
for _, row in df.iterrows():
    ax.scatter(row['x'], row['y'], s=row['s']*5, alpha=.5)
    ax.annotate(row['users'], xy=(row['x'], row['y']))

plt.show()
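As an alternative sketch (not part of the original answer): Axes.scatter also accepts an array for s, so a single call can draw every bubble with its own size; only the labels then need a loop, because annotate() places one text per call. The data below is the question's example data.

```python
import matplotlib.pyplot as plt
import pandas as pd

x = [5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30]
y = [100, 100, 200, 200, 300, 300, 400, 400, 500, 500, 600, 600]
s = [5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30]
users = ['mark', 'mark', 'mark', 'rachel', 'rachel', 'rachel',
         'jeff', 'jeff', 'jeff', 'lauren', 'lauren', 'lauren']
df = pd.DataFrame(dict(x=x, y=y, s=s, users=users))

fig, ax = plt.subplots(facecolor='w')
# A single scatter call: s takes one size per point (scaled by 5 as above).
ax.scatter(df['x'], df['y'], s=df['s'] * 5, alpha=0.5)
# annotate() still labels one point per call, so loop over the rows.
for txt, xi, yi in zip(df['users'], df['x'], df['y']):
    ax.annotate(txt, xy=(xi, yi))
plt.show()
```

One visible difference: calling scatter once per row cycles through the default color cycle, while a single call draws all bubbles in one color unless you pass c=.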


Rutger Kassies
  • Thank you, this is a great answer, too! Plus, it conveniently lets me fiddle with the plot's appearance via `fig, ax = plt.subplots(facecolor='w')`! – Rachel Jan 05 '17 at 11:46