
I am trying to create a scatterplot in which each (X, Y) point belongs to a particular category, and I would like each category to be drawn in a different color.

I was able to achieve that with the code below. However, the colorbar on the plot would make more sense if it showed the category names rather than numerical values.

Any pointers would be greatly appreciated.

I know seaborn could probably make this easier, but I am specifically looking for a matplotlib-based solution.

import numpy 
import pandas
import matplotlib.pyplot as plt

numpy.random.seed(0)

N = 50
_categories= ['A', 'B', 'C', 'D']

df = pandas.DataFrame({
    'VarX': numpy.random.uniform(low=130, high=200, size=N),
    'VarY': numpy.random.uniform(low=30, high=100, size=N),
    'Category': numpy.random.choice(_categories, size=N)
})

# Map each category name to an integer code: 'A' -> 0, 'B' -> 1, ...
colorMap = {name: k for k, name in enumerate(_categories)}

plt.figure(figsize=(15,5))
plt.scatter(df.VarX, df.VarY, c= df.Category.map(colorMap), cmap='viridis')
plt.colorbar()
plt.show()

This code produces the following output:

[Image: output plot]

Arun Palanisamy
VolGuy

2 Answers


The answer from here (pasted below) might be what you're looking for. The key is to group the data with something like groups = df.groupby('label') and then plot each group/category of the df separately, so that each one gets its own color and its own legend entry.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
np.random.seed(1974)

# Generate Data
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(['a', 'b', 'c'], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))

groups = df.groupby('label')

# Plot
fig, ax = plt.subplots()
ax.margins(0.05) # Optional, just adds 5% padding to the autoscaling
for name, group in groups:
    ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
ax.legend()

plt.show()
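
Applied to the data from the question, the same idea might look like the sketch below. It is only an illustration: it swaps ax.plot for ax.scatter, uses a legend in place of the colorbar, and regenerates the data with numpy's default_rng, so the points will not match the question's plot exactly. The names rng and categories are introduced here and are not part of the original code.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Regenerate data shaped like the question's DataFrame (values will differ).
rng = np.random.default_rng(0)
N = 50
categories = ['A', 'B', 'C', 'D']
df = pd.DataFrame({
    'VarX': rng.uniform(low=130, high=200, size=N),
    'VarY': rng.uniform(low=30, high=100, size=N),
    'Category': rng.choice(categories, size=N),
})

fig, ax = plt.subplots(figsize=(15, 5))
# One scatter call per category: matplotlib picks the next color in its
# default cycle each time, and label=name feeds the legend.
for name, group in df.groupby('Category'):
    ax.scatter(group.VarX, group.VarY, label=name)
ax.legend(title='Category')

# plt.show()  # uncomment when running interactively
```

Because groupby sorts its keys by default, the legend entries come out in alphabetical order.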
thehand0

First of all, I presume you want a "discrete" colormap. One way to get one is:

n_cat = len(_categories)
cmap = plt.get_cmap('viridis', n_cat)

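Calling plt.get_cmap with a second argument is a convenient way to obtain a ListedColormap, i.e. a list of colors, one for each of your categories, sampled from the default colormap "viridis". Next, pass that colormap to the scatter plot, add the colorbar, and then set the ticks and tick labels accordingly. The mapped values run from 0 to n_cat-1 and the colorbar splits that range into n_cat equal bands, so the tick locations below are the centers of those bands: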

plt.scatter(df.VarX, df.VarY, c= df.Category.map(colorMap), cmap=cmap)
cbar = plt.colorbar()
tick_locs = (numpy.arange(n_cat) + 0.5)*(n_cat-1)/n_cat
cbar.set_ticks(tick_locs)
cbar.set_ticklabels(_categories)
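
For reference, here is a self-contained sketch that combines the question's data with a discrete colorbar. Note that it uses a slightly different way of placing the ticks than the snippet above: it fixes the color limits with vmin=-0.5 and vmax=n_cat-0.5, so each integer code sits in the middle of its own color band and the ticks can simply go at 0, 1, 2, ... This variant is an assumption of mine rather than part of the original snippet; it should also keep the labels aligned if some category happens to be missing from the data. The names categories and codes are introduced here for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(0)
N = 50
categories = ['A', 'B', 'C', 'D']
df = pd.DataFrame({
    'VarX': np.random.uniform(low=130, high=200, size=N),
    'VarY': np.random.uniform(low=30, high=100, size=N),
    'Category': np.random.choice(categories, size=N),
})

# Integer code per category: 'A' -> 0, 'B' -> 1, ...
codes = {name: k for k, name in enumerate(categories)}
n_cat = len(categories)
cmap = plt.get_cmap('viridis', n_cat)  # viridis resampled to n_cat colors

fig, ax = plt.subplots(figsize=(15, 5))
# Color limits of [-0.5, n_cat - 0.5] give every integer code its own
# unit-wide band, centered on the code itself.
sc = ax.scatter(df.VarX, df.VarY, c=df.Category.map(codes), cmap=cmap,
                vmin=-0.5, vmax=n_cat - 0.5)
cbar = fig.colorbar(sc, ax=ax)
cbar.set_ticks(np.arange(n_cat))   # one tick at the center of each band
cbar.set_ticklabels(categories)    # show names instead of numbers

# plt.show()  # uncomment when running interactively
```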

Note: this answer is heavily inspired by this answer

Asmus