This bothered me so much that I wrote `get_cmap_string`, which returns a function that works exactly like `cmap` but also accepts strings:
```python
data = ["bob", "joe", "pete", "andrew", "pete"]
cmap = get_cmap_string(palette='viridis', domain=data)
cmap("joe")
# (0.20803, 0.718701, 0.472873, 1.0)
cmap("joe", alpha=0.5)
# (0.20803, 0.718701, 0.472873, 0.5)
```
1. Implementation
The basic idea, as all the other answers mention, is that we need a hash table: a unique mapping from our strings to integers. In Python this is just a dictionary.
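As a quick illustration of that mapping, here is a minimal pure-Python sketch. The helper name `build_index` is hypothetical; it uses `sorted(set(...))` instead of `np.unique`, but both drop duplicates and produce the same sorted order.

```python
# Hypothetical helper: build a unique string -> integer mapping ("hash table").
def build_index(domain):
    # sorted(set(...)) drops duplicates and fixes a deterministic order,
    # the same order np.unique would produce for these strings
    unique = sorted(set(domain))
    return {key: i for i, key in enumerate(unique)}

names = ["bob", "joe", "pete", "andrew", "pete"]
index = build_index(names)
print(index)  # {'andrew': 0, 'bob': 1, 'joe': 2, 'pete': 3}
```

Because the duplicate `"pete"` collapses to a single key, every occurrence of the same name gets the same integer, and therefore the same color.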
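The reason `hash(str)` won't work is that, even though matplotlib's `cmap` accepts any integer, two different strings can end up with the same color. A colormap created with `lut=10` has only 10 entries, and integers outside `[0, 10)` are clipped to its "over" or "under" color instead of raising an error. For example, if `hash(str1)` is 18 and `hash(str2)` is 28, and we initialize `cmap` as `get_cmap(name=palette, lut=10)`, then `cmap(hash(str1))` will be the same as `cmap(hash(str2))`.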
Code
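```python
import numpy as np
import matplotlib

def get_cmap_string(palette, domain):
    # Sorted unique values, so duplicates in `domain` share one color
    domain_unique = np.unique(domain)
    # Hash table: each unique string -> its own integer index
    hash_table = {key: i_str for i_str, key in enumerate(domain_unique)}
    # Colormap with exactly one entry per unique string
    # (matplotlib.cm.get_cmap was removed in Matplotlib 3.9)
    mpl_cmap = matplotlib.colormaps[palette].resampled(len(domain_unique))

    def cmap_out(X, **kwargs):
        # Look up the string's index, then forward kwargs such as alpha
        return mpl_cmap(hash_table[X], **kwargs)

    return cmap_out
```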
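Matplotlib's own `cmap` also accepts arrays, while `cmap_out` above handles one string at a time. If you need the array behavior too, a possible variant is sketched below; the name `get_cmap_strings` is hypothetical and it assumes Matplotlib 3.6 or newer.

```python
import numpy as np
import matplotlib

def get_cmap_strings(palette, domain):
    """Hypothetical variant of get_cmap_string that also accepts lists of strings."""
    domain_unique = np.unique(domain)
    hash_table = {key: i for i, key in enumerate(domain_unique)}
    mpl_cmap = matplotlib.colormaps[palette].resampled(len(domain_unique))

    def cmap_out(X, **kwargs):
        if isinstance(X, str):
            # Single string: return one RGBA tuple, like cmap(int)
            return mpl_cmap(hash_table[X], **kwargs)
        # Sequence of strings: map each to its index, return an (n, 4) array
        idx = np.array([hash_table[x] for x in X])
        return mpl_cmap(idx, **kwargs)

    return cmap_out

names = ["bob", "joe", "pete", "andrew", "pete"]
cmap = get_cmap_strings("viridis", names)
colors = cmap(names, alpha=0.5)
print(colors.shape)  # (5, 4)
```

Unknown strings raise a `KeyError` in both versions, which is arguably better than silently reusing a color.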
2. Usage
The example is the same as in the other answers, but note that the name `pete` now appears twice.
```python
import numpy as np
import matplotlib.pyplot as plt

# data
names = ["bob", "joe", "pete", "andrew", "pete"]
# color map for the data
cmap = get_cmap_string(palette='viridis', domain=names)
# example usage: one curve per name; both "pete" curves get the same color
x = np.linspace(0, np.pi*2, 100)
for i_name, name in enumerate(names):
    # i_name + 1 avoids dividing by zero for the first curve
    plt.plot(x, np.sin(x)/(i_name + 1), label=name, c=cmap(name))
plt.legend()
plt.show()
```

You can see that the legend entries are duplicated. Solving this is another challenge; see here.
Alternatively, use a custom legend, as explained here.
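One common way to collapse the duplicated entries (not necessarily the approach in the linked answers) is to fetch the handles and labels and deduplicate them with a dict before calling `legend`. This self-contained sketch assumes Matplotlib 3.6 or newer and builds the name-to-color mapping inline instead of calling `get_cmap_string`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

names = ["bob", "joe", "pete", "andrew", "pete"]
# One color per unique name (same idea as get_cmap_string)
unique = sorted(set(names))
viridis = matplotlib.colormaps["viridis"].resampled(len(unique))
colors = {name: viridis(i) for i, name in enumerate(unique)}

fig, ax = plt.subplots()
x = np.linspace(0, np.pi * 2, 100)
for i_name, name in enumerate(names):
    ax.plot(x, np.sin(x) / (i_name + 1), label=name, color=colors[name])

# Collapse duplicate labels: the dict keeps one handle per label
# (the last one seen), in the order each label first appeared
handles, labels = ax.get_legend_handles_labels()
by_label = dict(zip(labels, handles))
ax.legend(list(by_label.values()), list(by_label.keys()))
```

Keeping the last handle is harmless here because both `pete` lines share the same color.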
3. Alternatives
As far as the discussion among the matplotlib devs goes, they recommend using Seaborn.
See the discussion here and an example of its usage here.