
I have a pandas DataFrame that looks like this:

       0  1  2  3  4  5
    0  A  A  A  A  B  B
    1  B  B  B  C  C  D
    2  D  D  E  E  F  F

I want to plot this using pyplot.imshow(), with each letter drawn in a fixed color given by this mapping:

    color_dict = {
        "A": "#DA291E",
        "B": "#83DF39",
        "C": "#E8132d",
        "D": "#008933",
        "E": "#006CB3",
        "F": "#52BFEC"
    }

If I were plotting a bar chart or a scatter plot, I could just pass the argument color=a_list_of_colors, but this doesn't work with imshow().

Instead, I need to pass a cmap, but as far as I understand, it isn't possible to create a cmap in which a specific color is mapped to a specific value.

This means I need to create a colormap like this:

    from matplotlib.colors import ListedColormap

    # One color per cell, in the same order as the cells
    _colors = ["#DA291E", "#DA291E", "#DA291E", "#DA291E",
               "#83DF39", "#83DF39", "#83DF39", "#83DF39", "#83DF39"]  # ...and so on
    cmap = ListedColormap(_colors, name="custom_cmap")

But is there a better way to go about this?


I thought I could implement the above method, but for some reason it doesn't work and I can't figure out why.

I begin by creating a color_list from a long (flattened) Series version of the df above, and then convert that list to a colormap:

    color_list = list(series.map(color_dict))
    custom_cmap = ListedColormap(color_list, name="custom_cmap")

The long series basically looks like this:

    A
    A
    A
    A
    B
    B
    B
    B
    B
    C
    # ...and so on

The fifth element in my df is B, and when I print custom_cmap.colors[4] I get #83DF39, which corresponds to the string value B in my df. So the mapping is correct.
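A minimal sketch of the flatten-and-map step, using the 3x6 frame and `color_dict` from the top of the question. How `series` was actually built isn't shown, so the row-major flattening with `to_numpy().ravel()` is an assumption; the sketch reproduces the check on the fifth element.

```python
import pandas as pd

# The 3x6 example frame from the question
df = pd.DataFrame([list("AAAABB"), list("BBBCCD"), list("DDEEFF")])

color_dict = {
    "A": "#DA291E",
    "B": "#83DF39",
    "C": "#E8132d",
    "D": "#008933",
    "E": "#006CB3",
    "F": "#52BFEC",
}

# Assumption: flatten row by row (row-major), i.e. reading the df left to right, top to bottom
series = pd.Series(df.to_numpy().ravel())

# One color per cell, in the same order as the flattened cells
color_list = list(series.map(color_dict))

print(len(color_list))  # 18
print(color_list[4])    # #83DF39
```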

The problem occurs when I call plt.imshow() with cmap=custom_cmap: the plot doesn't follow the cmap, and some values get the wrong color.

My first thought was that I had messed up the order, meaning that the color_list didn't follow the order of the df, but it does.

The df above contains 18 values, and so does the color_list. The last value in the df is an F, which means that the last color in the color_list should be #52BFEC, which it is.
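The following sketch suggests why the per-cell list goes wrong, assuming imshow's default linear normalization over the data range and the numbering A=0.0 ... F=5.0 used in `float_dict` below. A `ListedColormap` picks an entry from the cell's normalized *value*, never from the cell's position. With 18 entries and data spanning 0..5, the B, C and D cells pick up a neighboring category's color, while A, E and F happen to land on correct entries. The `centered_norm` variant at the end is one possible fix and is not taken from the post.

```python
import numpy as np
from matplotlib.colors import ListedColormap, Normalize, to_hex

# The question's colors, keyed by the numbers A=0.0 ... F=5.0
color_dict = {0.0: "#DA291E", 1.0: "#83DF39", 2.0: "#E8132D",
              3.0: "#008933", 4.0: "#006CB3", 5.0: "#52BFEC"}

# The question's 3x6 frame after converting letters to numbers
values = np.array([[0, 0, 0, 0, 1, 1],
                   [1, 1, 1, 2, 2, 3],
                   [3, 3, 4, 4, 5, 5]], dtype=float)

# Question's approach: one color per *cell*, so the colormap has 18 entries
per_cell_cmap = ListedColormap([color_dict[v] for v in values.ravel()])

# imshow's default: scale the data range 0..5 linearly onto 0..1
auto_norm = Normalize(vmin=values.min(), vmax=values.max())

# A ListedColormap then picks entry int(x * N) for a normalized x, so the
# chosen color depends only on the cell's value, not on its position
per_cell = {v: to_hex(per_cell_cmap(auto_norm(v))) for v in color_dict}

# Possible fix: one color per *category*, with limits chosen so that
# value v lands in the middle of bin v
per_value_cmap = ListedColormap([color_dict[v] for v in sorted(color_dict)])
centered_norm = Normalize(vmin=-0.5, vmax=len(color_dict) - 0.5)
per_value = {v: to_hex(per_value_cmap(centered_norm(v))) for v in color_dict}

for v in sorted(color_dict):
    print(v, "wanted", color_dict[v].lower(),
          "per-cell", per_cell[v], "per-value", per_value[v])
```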


Adding more code.

    import matplotlib.pyplot as plt
    from matplotlib.colors import ListedColormap

    # Begin by converting the strings to numbers, since plt.imshow() needs numeric data
    float_dict = {
        'A': 0.0,
        'B': 1.0,
        'C': 2.0,
        'D': 3.0,
        'E': 4.0,
        'F': 5.0,
        'G': 6.0,
        'H': 7.0,
        'I': 8.0
    }

    converted_series = series.map(float_dict).copy()

    # Map each float to a specific color
    color_dict = {
        0.0: '#DA291E',
        1.0: '#E7112d',
        2.0: '#83CD39',
        3.0: '#009934',
        4.0: '#007AB3',
        5.0: '#54BDEC',
        6.0: '#000066',
        7.0: '#DDDD11',
        8.0: '#572B84',
    }

    # Create a cmap from a color list (one color per cell)
    color_list = list(converted_series.map(color_dict))
    custom_cmap = ListedColormap(color_list, name="custom_cmap")

    # Widen the series into a df (series_to_wide_df is a helper not shown here)
    df = series_to_wide_df(converted_series, n_columns=8)

    # Plot it
    plt.imshow(df, cmap=custom_cmap, interpolation='none')

The result of the above is shown in the image below.

[Image: output of the plt.imshow() call above]

  • Note that the data in this image is not the same as the data in the df in the original post.

I tested a different color_dict:

    color_dict = {
        0.0: '#FF0000',
        1.0: '#FF0000',
        2.0: '#FF0000',
        3.0: '#FF0000',
        4.0: '#FF0000',
        5.0: '#000000',
        6.0: '#000000',
        7.0: '#000000',
        8.0: '#000000'
    }

But the colors still don't map correctly. With these colors, the cells with values 1.0, 2.0, 6.0 and 7.0, and some of the 8.0 cells, are drawn in red.

user3471881
  • Possible duplicate of [python imshow, set certain value to defined color](https://stackoverflow.com/questions/37719304/python-imshow-set-certain-value-to-defined-color) – Alexander McFarlane Sep 11 '18 at 20:52
  • No, this seems to be exactly the way you would tackle the problem. Do you need further help with implementation, or is this clear and you just wanted to ask for possible alternatives? – ImportanceOfBeingErnest Sep 11 '18 at 22:33
  • @AlexanderMcFarlane - I don't think this is a duplicate because that question deals with setting one value to a specific color, this deals with mapping multiple values to specific colors using a custom colormap. – user3471881 Sep 12 '18 at 10:33
  • @ImportanceOfBeingErnest - the implementation is clear, I just wanted to ask for possible alternatives. What is customary - answer the question myself or just remove it? – user3471881 Sep 12 '18 at 10:33
  • An alternative would be to convert the colors to rgb tuples and construct a 3 channel image from those. I do think the colormap approach is better. Also, I'm not convinced the link above is really a duplicate. So if you want you may answer this question, as a complete code for this approach may be useful for others to see. – ImportanceOfBeingErnest Sep 12 '18 at 12:05
  • Added some more code because I encountered unforeseen issues implementing this after all. Any help is appreciated, @ImportanceOfBeingErnest. – user3471881 Sep 13 '18 at 10:09
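A rough sketch of the RGB alternative mentioned in the comments above: look up each cell's color directly, convert it to an (r, g, b) triple with `matplotlib.colors.to_rgb`, and hand `imshow` a 3-channel array, which it draws as-is without any colormap or normalization. The frame and `color_dict` are the ones from the question; the rest is an illustration, not code from the thread.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgb

df = pd.DataFrame([list("AAAABB"), list("BBBCCD"), list("DDEEFF")])
color_dict = {"A": "#DA291E", "B": "#83DF39", "C": "#E8132d",
              "D": "#008933", "E": "#006CB3", "F": "#52BFEC"}

# Look up each cell's color and convert it to an (r, g, b) tuple of floats in [0, 1]
rgb = np.array([[to_rgb(color_dict[letter]) for letter in row]
                for row in df.to_numpy()])

print(rgb.shape)  # (3, 6, 3): rows x columns x channels

# A 3-channel float array is drawn directly: no cmap, no vmin/vmax involved
fig, ax = plt.subplots()
ax.imshow(rgb, interpolation="none")
plt.show()
```

The trade-off is that the image no longer carries the category values, so things like a colorbar have to be built by hand.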

1 Answer


It's rather hard to see where the proposed code goes wrong without it being runnable by itself.

The following creates a dictionary mapping letters to numbers and applies it to the dataframe. It then creates a colormap with as many colors as there are (possible) values in the dataframe. Plotting with imshow then works fine when the colormap is normalized between zero and the number of elements in the colormap. (This normalization is mainly needed when not all possible values actually occur in the specific dataframe being plotted, e.g. when the letters A and H are missing.)

    import numpy as np; np.random.seed(42)
    import pandas as pd
    import matplotlib.pyplot as plt
    from matplotlib.colors import ListedColormap


    df = pd.DataFrame(np.random.choice(list("ABCDEFGH"), size=(8, 8)))
    print(df)

    # mapping from letters to numbers
    letter2num = dict(zip(list("ABCDEFGH"), np.arange(8)))
    df2 = pd.DataFrame(np.array([letter2num[i] for i in df.values.flat]).reshape(df.shape))


    # produce a colormap with one color per possible value in df
    colors = ["pink", "red", "violet", "blue",
              "turquoise", "limegreen", "gold", "brown"]  # use hex colors here, if desired.
    cmap = ListedColormap(colors)

    fig, ax = plt.subplots()
    # pin the normalization so that value v always picks colors[v]
    ax.imshow(df2.values, vmin=0, vmax=len(cmap.colors), cmap=cmap)


    # label each cell with its original letter
    for i in range(len(df2)):
        for j in range(len(df2.columns)):
            ax.text(j, i, df.values[i, j], ha="center", va="center")
    plt.show()

[Image: the 8x8 grid plotted by the code above, each cell labeled with its letter]
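To check the normalization remark above, here is a small sketch with hypothetical data that contains only the letters B..G (1..6). It compares imshow's default autoscaling, the answer's `vmin=0, vmax=len(colors)`, and a `matplotlib.colors.BoundaryNorm` with one bin per integer value. The `BoundaryNorm` variant is an alternative swapped in here for illustration, not part of the answer.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm, ListedColormap, Normalize, to_hex

colors = ["pink", "red", "violet", "blue",
          "turquoise", "limegreen", "gold", "brown"]
cmap = ListedColormap(colors)
n = len(colors)

# Hypothetical data using only B..G (1..6): A (0) and H (7) never occur
data = np.array([[1, 2, 3],
                 [4, 5, 6]])

norms = {
    # imshow's default: autoscale to the data range 1..6
    "autoscaled": Normalize(vmin=data.min(), vmax=data.max()),
    # the answer's choice: v -> v / n, which picks entry v
    "pinned": Normalize(vmin=0, vmax=n),
    # alternative: bins [-0.5, 0.5), [0.5, 1.5), ... map value v to index v
    "boundary": BoundaryNorm(np.arange(-0.5, n), n),
}

expected = [to_hex(colors[v]) for v in data.ravel()]
drawn = {name: [to_hex(cmap(norm(v))) for v in data.ravel()]
         for name, norm in norms.items()}

for name, hexes in drawn.items():
    print(name, "matches" if hexes == expected else "shifted")

# Plotting with an explicit norm object (no separate vmin/vmax)
fig, ax = plt.subplots()
ax.imshow(data, cmap=cmap, norm=norms["boundary"])
plt.show()
```

With autoscaling, the range 1..6 is stretched over all eight colors, so the cells shift to the wrong entries; both pinned variants keep value v on `colors[v]`. Recent Matplotlib versions do not accept `vmin`/`vmax` together with a `norm` object, so pass one or the other.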

ImportanceOfBeingErnest
  • Ahhhhh. So I can't use `ListedColormap()` to create a "sequence" of `colors` like I thought. This solved everything. With your data above I was basically doing: `color_list = list(np.array([number2color[i] for i in df2.values.flat]))` where `number2color` is a `dict` that assigns a color to each `key` (a number in this case). I then used `ListedColormap(color_list)`. – user3471881 Sep 14 '18 at 08:30