How to specify scatter plot point color in matplotlib pyplot?

Question

I have a dataset with a million points like:

1.0,9.5,-0.3
2.3,4.8,0.7
8.1,3.6,0.0
3.9,1.4,-0.1
4.7,5.3,0.0

and PyPlot code like

import pandas
import matplotlib.pyplot as plt

headers =  ['A','B','C']
df = pandas.read_csv('my_data.csv',names=headers)
df['x'] = df['A']
df['y'] = df['B']
# df['color'] = df['C']
plt.xlim(min(df['x'])/2, max(df['x'])*2)
plt.ylim(min(df['y'])/2, max(df['y'])*2)
plt.xlabel("A")
plt.ylabel("B")
plt.plot(df['x'], df['y'], 'o', ms = 0.2) 
plt.show()

I can plot the points according to the first and second columns, but all points have the same color. How can I tell PyPlot to color the points based on the value in the third column?

Also, I'm not clear why you're adding df columns; you could just use plt.scatter(df['A'], df['B'], c = df['C']) — mauve, Apr 26 '19 at 13:26
@mauve how do I convert 'C', which is a double in the range -2.0 .. 3.0, into 256 nicely spaced colors? Should I write `c = double_to_color(df['C'])`? What should go in that function? A switch with 256 cases `if value less than, but more than`? — Stepan, Apr 26 '19 at 13:34
If you do c = df['C'], it will automatically scale the color range to the range of your values. — mauve, Apr 26 '19 at 13:37
`plt.plot` can only ever have a single color. Did you mean to use `plt.scatter` instead? — ImportanceOfBeingErnest, Apr 26 '19 at 13:45
See the answer given here: https://stackoverflow.com/questions/50527658/scatter-plot-of-2-variables-with-colorbar-based-on-third-variable — screenpaver, Apr 26 '19 at 14:06
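Stepan's hand-written `double_to_color` is essentially what `c=` does internally: matplotlib linearly normalizes the values to [0, 1] and looks each one up in a colormap, so no 256-case switch is needed. A minimal sketch of doing that mapping explicitly, assuming the -2.0 .. 3.0 range from the comment and the `viridis` colormap (the helper name and both choices are hypothetical, for illustration only):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib import colors

# Hypothetical helper: converts float values to RGBA rows without any if/else chain.
def double_to_color(values, vmin=-2.0, vmax=3.0, cmap_name="viridis"):
    norm = colors.Normalize(vmin=vmin, vmax=vmax)  # linear map [vmin, vmax] -> [0, 1]
    cmap = plt.get_cmap(cmap_name)                 # colormap used as a lookup table
    return cmap(norm(np.asarray(values, dtype=float)))  # array of shape (n, 4)

rgba = double_to_color([-2.0, 0.5, 3.0])
```

The resulting array can be passed directly as `c=` to `scatter()`, but passing the raw values (`c=df['C']`, optionally with `vmin=-2.0, vmax=3.0`) gives the same colors and keeps `plt.colorbar()` working.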

score 2 · Accepted Answer · answered Apr 26 '19 at 13:52

You need to use plt.scatter() instead of plt.plot(). There's also no need to rename the DataFrame columns: the first argument is the x values and the second is the y values. c = z makes the colors be determined by the z values, and cmap determines which colormap is used (the available options are listed in matplotlib's colormap reference). plt.colorbar() adds a colorbar showing how the z values map to colors.

import pandas as pd
import matplotlib.pyplot as plt
import random

# 1000 random integer points; z will drive the colors
x = [random.randint(0, 100) for _ in range(1000)]
y = [random.randint(0, 100) for _ in range(1000)]
z = [random.randint(0, 100) for _ in range(1000)]

df = pd.DataFrame({'A': x, 'B': y, 'C': z})

# c maps each point's 'C' value through the 'rainbow' colormap
plt.scatter(df['A'], df['B'], c=df['C'], cmap='rainbow')
plt.colorbar()  # reference bar for the C -> color mapping
plt.show()

@Stepan: You can also use `x = np.random.randint(0,100,1000)` from Numpy — Sheldore, Apr 26 '19 at 23:35
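Following Sheldore's NumPy suggestion, here is a sketch that also makes ImportanceOfBeingErnest's comment visible: `plot()` returns a single `Line2D` whose markers share one color, while `scatter()` returns a `PathCollection` with one face color per point. The seed, marker sizes and side-by-side layout are illustrative choices, not part of the answer:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

np.random.seed(0)  # reproducible random data
x = np.random.randint(0, 100, 1000)
y = np.random.randint(0, 100, 1000)
z = np.random.randint(0, 100, 1000)

fig, (ax1, ax2) = plt.subplots(1, 2)

# plot() draws one Line2D: every marker gets the same color.
lines = ax1.plot(x, y, 'o', ms=2)

# scatter() draws one PathCollection: c=z gives each point its own mapped color.
sc = ax2.scatter(x, y, c=z, cmap='rainbow', s=4)
fig.colorbar(sc, ax=ax2)

fig.canvas.draw()  # colormapped face colors are filled in at draw time
```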

score 1 · Answer 2 · edited May 18 '19 at 11:08


In your case changing

plt.plot(df['x'], df['y'], 'o', ms = 0.2)

to

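plt.scatter(df['x'], df['y'], c = df['color'], s = 0.2)

should work, assuming you uncomment the df['color'] line so that df['color'] exists and has the same length as the x and y variables. Note that scatter() accepts neither the 'o' format string nor ms: its third positional parameter is s, the marker size in points squared, and its default marker is already 'o'.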
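To get scatter markers the same size as `plot(..., 'o', ms=0.2)`, square the value: matplotlib's `scatter` documentation gives the default `s` as `rcParams['lines.markersize'] ** 2`, i.e. `s` is the square of plot's `ms`. A small check of that relationship, using three made-up points:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

ms = 0.2  # marker diameter in points, as used with plt.plot(..., 'o', ms=0.2)

fig, ax = plt.subplots()
line, = ax.plot([1, 2, 3], [4, 5, 6], 'o', ms=ms)
# Same marker size for scatter: s is an area in points**2, so square ms.
sc = ax.scatter([1, 2, 3], [4, 5, 6], c=[-0.3, 0.7, 0.0], s=ms ** 2)
```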

As was pointed out in the comments, there is no (apparent) need to create new df columns.

So you could use this:

import pandas
import matplotlib.pyplot as plt

headers = ['A', 'B', 'C']
df = pandas.read_csv('my_data.csv', names=headers)

plt.xlim(min(df['A'])/2, max(df['A'])*2)
plt.ylim(min(df['B'])/2, max(df['B'])*2)
plt.xlabel("A")
plt.ylabel("B")
# scatter takes s (area in points**2) instead of ms; no 'o' format string needed
plt.scatter(df['A'], df['B'], c=df['C'], s=0.2)
plt.show()
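To try the script without a CSV file, here is a sketch that feeds the five sample rows from the question through `io.StringIO` in place of `my_data.csv` (an assumption for illustration), uses the Axes API instead of the `plt.*` state machine, and adds a labelled colorbar:

```python
import io
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

# The question's sample rows, standing in for my_data.csv.
csv_text = """1.0,9.5,-0.3
2.3,4.8,0.7
8.1,3.6,0.0
3.9,1.4,-0.1
4.7,5.3,0.0
"""
df = pd.read_csv(io.StringIO(csv_text), names=['A', 'B', 'C'])

fig, ax = plt.subplots()
# Column C is normalized to its own min/max and mapped through viridis.
sc = ax.scatter(df['A'], df['B'], c=df['C'], cmap='viridis', s=20)
fig.colorbar(sc, ax=ax, label='C')
ax.set_xlabel('A')
ax.set_ylabel('B')
```

Because no `vmin`/`vmax` is given, the color scale spans exactly the smallest and largest values in column C.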

Edit:

If you really want to make sure that each point has a unique color, then you need to make sure that the c input also only contains unique values.

c = list(range(len(df['C'])))  # one distinct value per point
plt.scatter(df['A'], df['B'], c=c, s=0.2)
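One caveat on this edit: built-in colormaps such as viridis are 256-entry lookup tables by default, so a single colormap can produce at most 256 distinct colors however many points there are; with the question's million points, many points will necessarily share a color. A minimal sketch that checks uniqueness for five made-up points, assuming viridis:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

n = 5
x = np.arange(n)
y = np.arange(n) ** 2
c = list(range(n))  # one distinct value per point

fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=c, cmap='viridis', s=20)
fig.canvas.draw()  # colormapped face colors are filled in at draw time

facecolors = sc.get_facecolors()
n_unique = len(np.unique(facecolors, axis=0))  # distinct RGBA rows actually drawn
lut_size = sc.get_cmap().N                     # number of entries in the colormap
```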

edited May 18 '19 at 11:08 by marc_s

answered Apr 26 '19 at 13:33 by Mitchell van Zuylen

My `'C'` column contains doubles. How do I convert them to colors? – Stepan Apr 26 '19 at 13:35
What do you mean by "doubles"? – mauve Apr 26 '19 at 13:38
The variable type: values like `0.123` or `-4.567`, not `True` and not `"blue"`. – Stepan Apr 26 '19 at 13:53
Again, that is handled by c = z; it's a built-in feature of matplotlib. – mauve Apr 26 '19 at 14:01
@Stepan added in an edit – Mitchell van Zuylen Apr 26 '19 at 15:14
