I have a column ("color_values") in my df with some numbers from 1 to 10, and I want to transform those numbers into hex colors with matplotlib.cm (cm) and matplotlib.colors (mcol).
Here I build my palette:
color_list = ["#084594", ...] # my colors
cm1 = mcol.ListedColormap(color_list)
cnorm = mcol.Normalize(vmin=df["color_values"].min(), vmax=df["color_values"].max())
cpick = cm.ScalarMappable(norm=cnorm, cmap=cm1)
cpick.set_array(np.array([]))
And this is the part that needs to be faster because I have millions of rows:
df["color_hex"] = df.apply(
    lambda row: mcol.to_hex(cpick.to_rgba(row["color_values"])), axis=1
)
I'm inserting another column (color_hex) that transforms the value from color_values into hex colors, but it does so by looping through every cell.
I looked at numpy.vectorize, but its docs say: "The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop."
I also looked at numpy.where, but that seems a better fit when you have a condition to satisfy, which is not my case.
So I was wondering: which other numpy operations are suited to this?
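For reference, here is a minimal sketch of one direction that might avoid the per-row loop. It relies on two things: `to_rgba` also accepts a whole array, and the column only holds a handful of distinct values, so each distinct value can be converted once and the result mapped back onto the column. The palette and the DataFrame below are hypothetical stand-ins (the real `color_list` is elided above), and this is an assumption about what could work rather than a benchmarked solution.

```python
import numpy as np
import pandas as pd
import matplotlib.colors as mcol
from matplotlib import cm

# Hypothetical stand-in palette and data; the real color_list is not shown above.
color_list = ["#084594", "#4292c6", "#9ecae1", "#fdae6b", "#d94801"]
rng = np.random.default_rng(0)
df = pd.DataFrame({"color_values": rng.integers(1, 11, size=1000)})

# Same palette setup as in the question.
cm1 = mcol.ListedColormap(color_list)
cnorm = mcol.Normalize(vmin=df["color_values"].min(), vmax=df["color_values"].max())
cpick = cm.ScalarMappable(norm=cnorm, cmap=cm1)

# Convert each distinct value once: to_rgba takes the whole array in one call.
unique_values = df["color_values"].unique()
rgba = cpick.to_rgba(unique_values)  # array of shape (n_unique, 4)

# Build a small value -> hex lookup, then map it over the full column.
hex_by_value = {v: mcol.to_hex(c) for v, c in zip(unique_values, rgba)}
df["color_hex"] = df["color_values"].map(hex_by_value)
```

The Python-level loop here runs only over the distinct values (at most 10 in my case), while the pass over the millions of rows is left to `Series.map`.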