I want to map the labels of some data to colors for plotting with matplotlib.

I have a list of names: ["bob", "joe", "andrew", "pete"]

Is there a built-in way to map these strings to color values in matplotlib? I thought about randomly generating hex values, but I could end up with colors that are too similar or hard to see.

I've tried a couple of different ways of building a name-to-color dict from the cmap answer below:

this:

# names is a list of distinct names
cmap = plt.get_cmap('cool')
colors = cmap(np.linspace(0, 1, len(names)))
clr = {names[i]: colors[i] for i in range(len(names))}
ax.scatter(x, y, z, c=clr)
ford prefect

4 Answers

Choose a color map, such as viridis:

cmap = plt.get_cmap('viridis')

The colormap, cmap, is a function which can take an array of values from 0 to 1 and map them to RGBA colors. np.linspace(0, 1, len(names)) produces an array of equally spaced numbers from 0 to 1 of length len(names). Thus,

colors = cmap(np.linspace(0, 1, len(names)))

selects equally-spaced colors from the viridis color map.

Note that this does not use the value of the string; it only uses the string's position in the list to select a color. Note also that these are not random colors; this is just an easy way to generate distinct colors for an arbitrary list of strings.
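Since the color depends only on position, the same name can get a different color if the names arrive in a different order. One way to pin that down, sketched below as an assumption rather than part of this answer, is to sort the distinct names before sampling the colormap; name_colors is a hypothetical helper name.

```python
import numpy as np
import matplotlib.pyplot as plt

def name_colors(names, cmap_name='viridis'):
    """Hypothetical helper: map each distinct name to an RGBA tuple."""
    unique = sorted(set(names))  # stable order, independent of input order
    cmap = plt.get_cmap(cmap_name)
    # Equally spaced samples from the colormap, one per distinct name
    colors = cmap(np.linspace(0, 1, len(unique)))
    return {name: tuple(color) for name, color in zip(unique, colors)}

a = name_colors(["bob", "joe", "andrew", "pete"])
b = name_colors(["pete", "andrew", "joe", "bob"])
print(a == b)  # True: the order of the input list no longer matters
```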


So:

import numpy as np
import matplotlib.pyplot as plt

cmap = plt.get_cmap('viridis')
names = ["bob", "joe", "andrew", "pete"]
colors = cmap(np.linspace(0, 1, len(names)))
print(colors)
# [[ 0.267004  0.004874  0.329415  1.      ]
#  [ 0.190631  0.407061  0.556089  1.      ]
#  [ 0.20803   0.718701  0.472873  1.      ]
#  [ 0.993248  0.906157  0.143936  1.      ]]

x = np.linspace(0, np.pi*2, 100)
for i, (name, color) in enumerate(zip(names, colors), 1):
    plt.plot(x, np.sin(x)/i, label=name, c=color)
plt.legend()
plt.show()


The problem with

clr = {names[i]: colors[i] for i in range(len(names))}
ax.scatter(x, y, z, c=clr)

is that the c parameter of ax.scatter expects a single color, a sequence of RGB(A) values with one entry per point (the same length as x), or a sequence of numbers to be mapped through a colormap. clr is a dict, not a sequence. So if colors has the same length as x, you could use

ax.scatter(x, y, z, c=colors)
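If you do want to keep a name-to-color dict like clr, it has to be expanded into one color per point before it is passed as c. A sketch, assuming a hypothetical labels list that holds one name per point (names may repeat) and random example coordinates:

```python
import numpy as np
import matplotlib.pyplot as plt

names = ["bob", "joe", "andrew", "pete"]
cmap = plt.get_cmap('cool')
colors = cmap(np.linspace(0, 1, len(names)))
clr = {name: color for name, color in zip(names, colors)}

# Hypothetical data: one label per point, labels may repeat
labels = ["bob", "joe", "bob", "pete", "andrew", "joe"]
x, y, z = np.random.rand(3, len(labels))

# Look up each point's color so c is an (n_points, 4) array, not a dict
point_colors = np.array([clr[label] for label in labels])

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(x, y, z, c=point_colors)
```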
unutbu
  • I am trying to pass these as the c argument of a 3D scatter. I tried zip(names, colors) (names is the list of names and colors is the cmap output) and got an error from matplotlib. Is there a way around this? – ford prefect Aug 07 '15 at 17:29
  • Could you post the code and the full traceback error message? – unutbu Aug 07 '15 at 17:33
  • I added what I tried to do by turning it into a dictionary – ford prefect Aug 07 '15 at 17:45
  • The `c` parameter of `ax.scatter` expects a sequence (such as a list or array) of RGB(A) values, it does not accept a dict. So if you want a different color for each value of `x,y,z`, then use `c=colors` instead of `c=clr`. – unutbu Aug 07 '15 at 17:47

I use the hash function to get numbers between 0 and 1. You can use this even when you don't know all the labels in advance:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
labels = ["a", "a", "b", "b", "a"]
y = [1, 2, 3, 4, 5]

# hash each label into one of 256 buckets, then scale to [0, 1)
colors = [float(hash(s) % 256) / 256 for s in labels]

plt.scatter(x, y, c=colors, cmap="jet")
plt.show()
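One caveat: for strings, Python's built-in hash() is randomized per process (controlled by PYTHONHASHSEED), so the same label can get a different color every time the script runs. If repeatable colors matter, a digest from hashlib can stand in for hash(). The sketch below uses md5 purely as a stable, non-security hash, and stable_fraction is a hypothetical helper name. Like the original, it can still put two different labels in the same bucket.

```python
import hashlib
import matplotlib.pyplot as plt

def stable_fraction(label, buckets=256):
    """Hypothetical helper: map a string to [0, 1), identically on every run."""
    digest = hashlib.md5(label.encode("utf-8")).digest()
    # Use the first 4 bytes of the digest as an integer, then bucket it
    return (int.from_bytes(digest[:4], "big") % buckets) / buckets

x = [1, 2, 3, 4, 5]
labels = ["a", "a", "b", "b", "a"]
y = [1, 2, 3, 4, 5]

colors = [stable_fraction(s) for s in labels]

plt.scatter(x, y, c=colors, cmap="jet")
```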
pomber
  • This answer was a great inspiration for me to solve the problem. But one problem with it is that `hash(s) % 256` could be the same for two different strings in `labels`. – Ufos Nov 25 '19 at 13:42

This bothered me so much that I wrote get_cmap_string, which returns a function that works like a cmap but also accepts strings.

data = ["bob", "joe", "pete", "andrew", "pete"]
cmap = get_cmap_string(palette='viridis', domain=data)
cmap("joe")
# (0.20803, 0.718701, 0.472873, 1.0)
cmap("joe", alpha=0.5)
# (0.20803, 0.718701, 0.472873, 0.5)

1. Implementation

The basic idea, as mentioned in the other answers, is that we need a hash table: a unique mapping from our strings to integers. In Python this is just a dictionary.

The reason hash(str) won't work is that two different strings can end up with the same color. matplotlib's cmap accepts integers, but it treats them as indices into its lookup table, and every integer at or beyond the end of the table gets the same "over" color (by default, the last color). For example, if hash(str1) is 18 and hash(str2) is 28, and we initialize cmap with get_cmap(name=palette, lut=10), then cmap(hash(str1)) will be the same as cmap(hash(str2)).
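The clipping is easy to check. In this sketch, which assumes the default (unset) over color, integer inputs index a 10-entry lookup table, and anything at or beyond the end of the table gets the same color:

```python
import matplotlib.pyplot as plt

# viridis resampled to a 10-entry lookup table
cmap = plt.get_cmap('viridis', lut=10)

print(cmap(18) == cmap(28))  # True: both are past the end of the table
print(cmap(18) == cmap(9))   # True: the default "over" color is the last entry
print(cmap(8) == cmap(18))   # False: 8 is still inside the table
```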

Code

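import numpy as np
import matplotlib.pyplot as plt

def get_cmap_string(palette, domain):
    # Unique, sorted values of the domain; each gets its own integer index
    domain_unique = np.unique(domain)
    hash_table = {key: i_str for i_str, key in enumerate(domain_unique)}
    # matplotlib.cm.get_cmap was removed in matplotlib 3.9;
    # pyplot.get_cmap still accepts lut to resample the palette
    mpl_cmap = plt.get_cmap(palette, lut=len(domain_unique))

    def cmap_out(X, **kwargs):
        # Integer inputs index the colormap's lookup table directly
        return mpl_cmap(hash_table[X], **kwargs)

    return cmap_out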
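Unlike a matplotlib colormap, cmap_out only accepts a single string. A possible extension, written here as a hypothetical get_cmap_strings rather than part of the original answer, converts a list of strings to integer indices and colors them in one call, returning one RGBA row per string:

```python
import numpy as np
import matplotlib.pyplot as plt

def get_cmap_strings(palette, domain):
    """Hypothetical variant of get_cmap_string that also accepts lists."""
    domain_unique = np.unique(domain)
    hash_table = {key: i for i, key in enumerate(domain_unique)}
    mpl_cmap = plt.get_cmap(palette, lut=len(domain_unique))

    def cmap_out(X, **kwargs):
        if isinstance(X, str):
            return mpl_cmap(hash_table[X], **kwargs)
        # Translate every string to its index, then color them all at once
        indices = np.array([hash_table[x] for x in X])
        return mpl_cmap(indices, **kwargs)

    return cmap_out

names = ["bob", "joe", "pete", "andrew", "pete"]
cmap = get_cmap_strings('viridis', names)
rgba = cmap(names)  # one RGBA row per name, shape (5, 4)
```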

2. Usage

The same example as in the other answers, but note that the name pete now appears twice.

import numpy as np
import matplotlib.pyplot as plt

# data
names = ["bob", "joe", "pete", "andrew", "pete"]

# color map for the data
cmap = get_cmap_string(palette='viridis', domain=names)

# example usage (count from 1 to avoid dividing by zero)
x = np.linspace(0, np.pi*2, 100)
for i_name, name in enumerate(names, 1):
    plt.plot(x, np.sin(x)/i_name, label=name, c=cmap(name))
plt.legend()
plt.show()

You can see that the entries in the legend are duplicated (pete appears twice). Fixing this is a separate challenge; see here. Alternatively, use a custom legend as explained here.
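One commonly used way to drop the duplicates, offered here as a sketch rather than the linked solution, is to collect the handles and labels from the axes and keep one entry per label in a dict before calling legend(). To stay self-contained, the sketch builds its own name-to-color dict (color_of, a hypothetical name) instead of calling get_cmap_string:

```python
import numpy as np
import matplotlib.pyplot as plt

names = ["bob", "joe", "pete", "andrew", "pete"]
unique = sorted(set(names))
cmap = plt.get_cmap('viridis', lut=len(unique))
color_of = {name: cmap(i) for i, name in enumerate(unique)}

x = np.linspace(0, np.pi*2, 100)
fig, ax = plt.subplots()
for i_name, name in enumerate(names, 1):
    ax.plot(x, np.sin(x)/i_name, label=name, c=color_of[name])

# One legend entry per label: a repeated label overwrites the handle,
# but keeps the position where the label first appeared
handles, labels = ax.get_legend_handles_labels()
by_label = dict(zip(labels, handles))
legend = ax.legend(list(by_label.values()), list(by_label.keys()))
```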

3. Alternatives

As far as the discussion among matplotlib developers goes, they recommend using Seaborn. See the discussion here and an example usage here.

Ufos

Here's another option:

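import numpy as np
import matplotlib.pyplot as plt

names = ["bob", "joe", "andrew", "pete"]
x, y, z = np.random.rand(3, len(names))  # example data: one point per name

colmap = {name: n for n, name in enumerate(sorted(set(names)))} # <-- 'set' gets the unique names; 'sorted' keeps the order stable between runs

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(
    x, y, z,
    c=[colmap[name] for name in names],
    cmap='tab10' # <-- not necessary, but a qualitative colormap keeps the colors distinct
)
plt.show()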

It just turns the names into integers; matplotlib then normalizes those integers to the 0 to 1 range and looks each one up in the colormap. A qualitative colormap such as tab10 keeps the colors clearly distinct, as long as there are no more than 10 unique names.
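Because matplotlib rescales the integers to span the whole colormap, four names land on tab10 entries 0, 3, 6 and 9 rather than on its first four colors. If consecutive, predictable colors are preferred, one option (an assumption, not part of the answer above) is to call the qualitative colormap with integer indices directly. This only stays collision-free for up to 10 unique names, which the sketch checks explicitly:

```python
import numpy as np
import matplotlib.pyplot as plt

names = ["bob", "joe", "andrew", "pete"]
tab10 = plt.get_cmap('tab10')

unique = sorted(set(names))
if len(unique) > tab10.N:
    raise ValueError("more names than tab10 has colors")

# Integer inputs index the colormap directly: 0 -> first color, 1 -> second, ...
colmap = {name: tab10(n) for n, name in enumerate(unique)}

x, y, z = np.random.rand(3, len(names))
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(x, y, z, c=[colmap[name] for name in names])
```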

Matt