I'm looking for a fast way to map scalars to hex colors in Python:

import numpy as np
import pandas as pd
import matplotlib
import matplotlib.cm as cm
import matplotlib.colors as mcol

np.random.seed(0) 
df = pd.DataFrame(np.random.rand(20000,1))
df.head()

    0
0   0.548814
1   0.715189
2   0.602763
3   0.544883
4   0.423655

I have only 20 colors, so I wonder whether matplotlib is the best solution or whether a simple lookup table would be better.

colors = ["#084594", "#0F529E", "#1760A8", "#1F6EB3", "#2979B9", "#3484BE", "#3E8EC4",
                "#4A97C9", "#57A0CE", "#64A9D3", "#73B2D7", "#83BBDB", "#93C4DE", "#A2CBE2",
                "#AED1E6", "#BBD6EB", "#C9DCEF", "#DBE8F4", "#EDF3F9", "#FFFFFF"]
values = df[0].values

@profile
def apply_method(): # 6.9 sec
    cm1 = mcol.ListedColormap(colors)
    norm = matplotlib.colors.Normalize(vmin=np.min(values), vmax=np.max(values), clip=True)
    mapper = cm.ScalarMappable(norm=norm, cmap=cm1)

    return df[0].apply(lambda row: mcol.to_hex(mapper.to_rgba(row)))

%time apply_method()

From the profiler I see that to_rgba() is the most expensive method (6.5 sec for only 20,000 values).

So I'm looking for a way to bypass the to_rgba() method. Is there a way to get the color ranges from cm.ScalarMappable and then do a lookup to the right hex color?

ImportanceOfBeingErnest
Claudiu Creanga
  • What is annoying is that you asked 2 questions and left a comment under a third one for this problem, all without linking between them. It's a hell of a lot of work to understand the problem if you don't synchronize your efforts. – ImportanceOfBeingErnest Mar 07 '18 at 22:47
  • @ImportanceOfBeingErnest I was wondering if I should ask another question or update the previous one, but my question moved a bit, so updating it would have put the answer already provided there out of scope. Thanks anyway. I wasn't sure how to do the normalisation, but this works fine: `v = ((v-v.min())/(v.max()-v.min())*(len(colors)-1))` – Claudiu Creanga Mar 08 '18 at 08:47

1 Answer

The expensive part of the code from the question is not to_rgba() as such but the apply call (df[0].apply), because it invokes the function on each element individually instead of once on the whole array.
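
To illustrate the point about per-element calls, here is a minimal sketch that passes the whole array to to_rgba() in a single call and only then converts to hex. The function name map_to_hex_vectorized and the constant COLORS are hypothetical names chosen for this example; the hex conversion still loops in Python, so this is not the fastest option, only a way to take to_rgba() out of the per-row loop.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as mcol

# Same 20 colors as in the question.
COLORS = ["#084594", "#0F529E", "#1760A8", "#1F6EB3", "#2979B9", "#3484BE", "#3E8EC4",
          "#4A97C9", "#57A0CE", "#64A9D3", "#73B2D7", "#83BBDB", "#93C4DE", "#A2CBE2",
          "#AED1E6", "#BBD6EB", "#C9DCEF", "#DBE8F4", "#EDF3F9", "#FFFFFF"]

def map_to_hex_vectorized(values, colors=COLORS):
    """Hypothetical helper: one to_rgba() call for the whole array."""
    values = np.asarray(values, dtype=float)
    cmap = mcol.ListedColormap(colors)
    norm = mcol.Normalize(vmin=values.min(), vmax=values.max(), clip=True)
    mapper = cm.ScalarMappable(norm=norm, cmap=cmap)
    # to_rgba accepts an array and returns an (n, 4) array of RGBA floats.
    rgba = mapper.to_rgba(values)
    # to_hex converts one color at a time, so this part remains a Python loop.
    return [mcol.to_hex(c) for c in rgba]
```

Note that to_hex returns lowercase strings, so the results differ in case from the entries of the original list.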

A comparison of different methods using matplotlib colormaps is given in my answer to this question: How do I map df column values to hex color in one go?

The upshot is that using a lookup table (LUT) is indeed much faster (by a factor of 400 in the case investigated there).

However, note that in the case of this question there is no need to use matplotlib at all. Since you already have a list of possible colors in hex format, converting the hex colors to a colormap and then back to hex colors is an unnecessary round trip.

Instead, using the list of colors directly as a lookup table (LUT) is much faster. Taking a dataframe with 10000 entries (to keep it comparable to the other answer's timings), the code from this question takes 2.7 seconds.

The following code takes 380 µs. This is a factor of 7000 improvement.
Compared to the best method using matplotlib from the linked question's answer of 7.7 ms, it is still a factor of 20 better.

import numpy as np; np.random.seed(0)
import pandas as pd

def create_df(n=10000):
    return pd.DataFrame(np.random.rand(n,1), columns=['some_value'])

def apply(df):
    colors = ["#084594", "#0F529E", "#1760A8", "#1F6EB3", "#2979B9", "#3484BE", "#3E8EC4",
              "#4A97C9", "#57A0CE", "#64A9D3", "#73B2D7", "#83BBDB", "#93C4DE", "#A2CBE2",
              "#AED1E6", "#BBD6EB", "#C9DCEF", "#DBE8F4", "#EDF3F9", "#FFFFFF"]
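    colors = np.array(colors)  # NumPy array so it can be indexed with an integer array
    v = df['some_value'].values
    # Scale to [0, len(colors)-1] and truncate to integer indices into the LUT
    v = ((v-v.min())/(v.max()-v.min())*(len(colors)-1)).astype(np.int16)
    # Look up all colors in one vectorized indexing operation
    return pd.Series(colors[v])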

df = create_df()
%timeit apply(df)

# 376 µs
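
One detail of the scaling above: because the values are multiplied by len(colors)-1 and then truncated, only the maximum value itself maps to the last color. If equal-width bins are preferred (which, as far as I can tell, is closer to how a ListedColormap divides the normalized range), a possible variant is sketched below. The function name map_to_hex_equal_bins and the handling of constant input are assumptions made for this example, not part of the code above.

```python
import numpy as np

COLORS = np.array(["#084594", "#0F529E", "#1760A8", "#1F6EB3", "#2979B9", "#3484BE", "#3E8EC4",
                   "#4A97C9", "#57A0CE", "#64A9D3", "#73B2D7", "#83BBDB", "#93C4DE", "#A2CBE2",
                   "#AED1E6", "#BBD6EB", "#C9DCEF", "#DBE8F4", "#EDF3F9", "#FFFFFF"])

def map_to_hex_equal_bins(values, colors=COLORS):
    """Hypothetical variant: split the value range into len(colors) equal bins."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    if span == 0:
        # Assumption: constant input maps everything to the first color.
        return colors[np.zeros(v.shape, dtype=np.intp)]
    scaled = (v - v.min()) / span  # normalized to [0, 1]
    # Multiply by the number of colors, truncate, and clip the maximum
    # (which would otherwise land on index len(colors)) into the last bin.
    idx = np.minimum((scaled * len(colors)).astype(np.intp), len(colors) - 1)
    return colors[idx]
```

With this variant, a value at 96% of the range falls into the last bin, whereas the scaling by len(colors)-1 puts it at index 18. Which behaviour is wanted depends on how the colors are meant to be read.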
ImportanceOfBeingErnest