3

This has been troubling me for the past 30 minutes. What I'd like to do is make a scatter plot colored by category. I took a look at the documentation, but I haven't been able to find the answer there. I looked here, but when I ran that in IPython Notebook, I didn't get anything.

Here's my data frame:

time    cpu   wait    category 
8       1     0.5     a 
9       2     0.2     a
2       3     0.1     b
10      4     0.7     c
3       5     0.2     c
5       6     0.8     b

Ideally, I'd like to have a scatter plot that shows CPU on the x axis, wait on the y axis, and each point on the graph is distinguished by category. So for example, if a=red, b=blue, and c=green then point (1, 0.5) and (2, 0.2) should be red, (3, 0.1) and (6, 0.8) should be blue, etc.

How would I do this with pandas or matplotlib? Whichever does the job.

jason adams

3 Answers

4

This is essentially the same answer as @JoeCondron, but a two liner:

cmap = {'a': 'red', 'b': 'blue', 'c': 'yellow'}
df.plot(x='cpu', y='wait', kind='scatter', 
        colors=[cmap.get(c, 'black') for c in df.category])

If no color is mapped for the category, it defaults to black.

EDIT:

The above works for Pandas 0.14.1. For 0.16.2, 'colors' needs to be changed to 'c':

df.plot(x='cpu', y='wait', kind='scatter', 
    c=[cmap.get(c, 'black') for c in df.category])
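
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes a reasonably recent pandas (where the keyword is `c`) with matplotlib installed, rebuilds the data frame from the question, and uses green for category c as the question asked. The name `point_colors` is only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "time": [8, 9, 2, 10, 3, 5],
    "cpu": [1, 2, 3, 4, 5, 6],
    "wait": [0.5, 0.2, 0.1, 0.7, 0.2, 0.8],
    "category": ["a", "a", "b", "c", "c", "b"],
})

# One color per category; unmapped categories fall back to black.
cmap = {"a": "red", "b": "blue", "c": "green"}
point_colors = [cmap.get(c, "black") for c in df["category"]]

# pandas returns the matplotlib Axes it drew on.
ax = df.plot(x="cpu", y="wait", kind="scatter", c=point_colors)
```

In a notebook, the plot should appear once inline plotting is enabled (for example with `%matplotlib inline`); in a script you would call `plt.show()` from `matplotlib.pyplot` afterwards.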
Alexander
2

You could do

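import matplotlib.pyplot as plt

# Map each category to a matplotlib color alias
color_map = {'a': 'r', 'b': 'b', 'c': 'y'}
ax = plt.subplot()
x, y = df.cpu, df.wait
# One color per row, looked up from the category column
colors = df.category.map(color_map)
ax.scatter(x, y, color=colors)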

This will give you red for category a, blue for b, and yellow for c. So you can pass a list of color aliases of the same length as the arrays. You can check out the myriad available colours here: http://matplotlib.org/api/colors_api.html. I don't think the plot method is very useful for scatter plots.

JoeCondron
  • So that next time I don't have to resort to SO, could my question have been resolved through the documentation? Or is this just common knowledge? – jason adams Jul 09 '15 at 21:41
  • I wouldn't say common knowledge. The matplotlib docs aren't great but they have lots of examples. However, you have to download the code to read it. I learned by trial and error I guess. Mastering (not saying I have) the API for matplotlib is difficult. By the way, you can pass an array of colours shorter than the number of points and it will just cycle through them. Also, the scatter method has a parameter ```s``` which controls the size of the dots. This can be a single number or an array of numbers, and it cycles through them in the same way as the colours – JoeCondron Jul 09 '15 at 21:46
  • I'm getting: AttributeError: Unknown property colors, am I missing a library? This is what I have now: import pandas as pd import numpy as np import matplotlib import matplotlib.pyplot as plt import tables as tb %matplotlib inline – jason adams Jul 09 '15 at 21:47
  • pardon me, it's color singular – JoeCondron Jul 09 '15 at 21:50
  • any idea how to show the plot? I'm doing ax.show() and I'm not getting anything. This line ax.scatter(x,y, color=colors) just gives me , I'm using iPython Notebook – jason adams Jul 09 '15 at 22:04
  • If you execute a cell with ```%pylab inline``` it will display all your plots from then on. Be warned, there is a side effect; it does ```import *``` from numpy and matplotlib so it might overwrite some variables but it warns you of this. If you do it at the start of your session it shouldn't be a problem. – JoeCondron Jul 10 '15 at 07:30
2

I'd create a column with your colors based on category, then do the following, where ax is a matplotlib ax and df is your dataframe:

ax.scatter(df['cpu'], df['wait'], marker = '.', c = df['colors'], s = 100)
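
As a rough sketch of the "create a column with your colors" step, one way is to map the category column through a dictionary. The `color_map` dictionary and the black fallback are assumptions for illustration, not part of the answer above; the data frame is rebuilt from the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample data copied from the question.
df = pd.DataFrame({
    "time": [8, 9, 2, 10, 3, 5],
    "cpu": [1, 2, 3, 4, 5, 6],
    "wait": [0.5, 0.2, 0.1, 0.7, 0.2, 0.8],
    "category": ["a", "a", "b", "c", "c", "b"],
})

# Assumed mapping; categories missing from it become black.
color_map = {"a": "red", "b": "blue", "c": "green"}
df["colors"] = df["category"].map(color_map).fillna("black")

# Draw the scatter on an explicit Axes, one color per row.
fig, ax = plt.subplots()
ax.scatter(df["cpu"], df["wait"], marker=".", c=df["colors"].tolist(), s=100)
ax.set_xlabel("cpu")
ax.set_ylabel("wait")
```

Keeping the colors in their own column means the mapping is computed once and stays visible when you inspect the data frame.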
alex314159