I have a dataset with a million points like:

1.0,9.5,-0.3
2.3,4.8,0.7
8.1,3.6,0.0
3.9,1.4,-0.1
4.7,5.3,0.0

and PyPlot code like

import pandas
import matplotlib.pyplot as plt

headers =  ['A','B','C']
df = pandas.read_csv('my_data.csv',names=headers)
df['x'] = df['A']
df['y'] = df['B']
# df['color'] = df['C']
plt.xlim(min(df['x'])/2, max(df['x'])*2)
plt.ylim(min(df['y'])/2, max(df['y'])*2)
plt.xlabel("A")
plt.ylabel("B")
plt.plot(df['x'], df['y'], 'o', ms = 0.2) 
plt.show()

I can plot the points using the first and second columns, but all points have the same color. How do I tell PyPlot to color the points based on the value in the third column?

Stepan

2 Answers

You need to use plt.scatter() instead of plt.plot(). There's also no need to rename the DataFrame columns: the first argument is the x values and the second is the y values. Passing c = z makes each point's color depend on its z value (here, column C), and cmap selects the colormap that turns those values into colors. The available colormaps are listed in the Matplotlib documentation. plt.colorbar() adds a colorbar as a reference for the colors plotted for z.

import pandas as pd
import matplotlib.pyplot as plt
import random


# Random sample data standing in for the three CSV columns
x = [random.randint(0,100) for _ in range(1000)]
y = [random.randint(0,100) for _ in range(1000)]
z = [random.randint(0,100) for _ in range(1000)]

df = pd.DataFrame({'A': x, 'B':y, 'C':z})

plt.scatter(df['A'], df['B'], c = df['C'], cmap = 'rainbow')
plt.colorbar()    
plt.show()
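
As a rough illustration of the same approach applied to the data in the question, here is a minimal sketch that uses the five sample rows in place of my_data.csv. The in-memory CSV, the Agg backend, and the colorbar label are assumptions made so the snippet runs on its own; they are not part of the original code.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The five sample rows from the question, standing in for my_data.csv
sample_csv = io.StringIO(
    "1.0,9.5,-0.3\n"
    "2.3,4.8,0.7\n"
    "8.1,3.6,0.0\n"
    "3.9,1.4,-0.1\n"
    "4.7,5.3,0.0\n"
)
df = pd.read_csv(sample_csv, names=["A", "B", "C"])

fig, ax = plt.subplots()
# c maps each point's C value through the colormap given by cmap
points = ax.scatter(df["A"], df["B"], c=df["C"], cmap="rainbow")
fig.colorbar(points, ax=ax, label="C")
ax.set_xlabel("A")
ax.set_ylabel("B")
# In an interactive session, drop the Agg line above and call plt.show() here.
```

With the real file, the StringIO object would simply be replaced by the path to my_data.csv.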
mauve

In your case changing

plt.plot(df['x'], df['y'], 'o', ms = 0.2) 

to

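plt.scatter(df['x'], df['y'], c = df['color'], s = 0.2)

should work, assuming that df['color'] is the same length as the x and y variables (and that the df['color'] = df['C'] line is uncommented). Note that plt.scatter() does not accept the 'o' format string or the ms keyword; the marker size is passed as s, measured in points squared.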
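
Since plt.plot() sizes markers with ms (in points) while plt.scatter() uses s (in points squared), a roughly comparable marker size can be obtained by squaring the old value. The sketch below assumes that relationship; scatter_size is a hypothetical helper name, and the lists are just the sample rows from the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt


def scatter_size(ms):
    """Hypothetical helper: convert a plot() markersize in points
    to a scatter() s value in points squared."""
    return ms ** 2


# Sample rows from the question (columns A, B, C)
x = [1.0, 2.3, 8.1, 3.9, 4.7]
y = [9.5, 4.8, 3.6, 1.4, 5.3]
color = [-0.3, 0.7, 0.0, -0.1, 0.0]

fig, ax = plt.subplots()
# Roughly the scatter() counterpart of plt.plot(x, y, 'o', ms = 0.2)
points = ax.scatter(x, y, c=color, s=scatter_size(0.2))
```

For a million points a very small s like this keeps the plot from turning into a solid blob, which is presumably why the original code used ms = 0.2.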

As was pointed out in the comments, there is no (apparent) need to create new df columns.

So you could use this:

import pandas
import matplotlib.pyplot as plt

headers =  ['A','B','C']
df = pandas.read_csv('my_data.csv',names=headers)

plt.xlim(min(df['A'])/2, max(df['A'])*2)
plt.ylim(min(df['B'])/2, max(df['B'])*2)
plt.xlabel("A")
plt.ylabel("B")
plt.scatter(df['A'], df['B'], c = df['C'], s = 0.2)
plt.show()

Edit:

If you really want to make sure that each point has a unique color, then you need to make sure that the c input also only contains unique values.

c = list(range(len(df['C'])))
plt.scatter(df['A'], df['B'], c = c, s = 0.2)
marc_s
Mitchell van Zuylen