I have a column ("color_values") in my df with some numbers from 1 to 10 and I want to transform those numbers into hex colors with matplotlib.cm (cm) and matplotlib.colors (mcol).

Here I build my palette:

import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as mcol

color_list = ["#084594", ...] # my colors
cm1 = mcol.ListedColormap(color_list)
cnorm = mcol.Normalize(vmin=df["color_values"].min(), vmax=df["color_values"].max())
cpick = cm.ScalarMappable(norm=cnorm, cmap=cm1)
cpick.set_array(np.array([]))

And this is the part that needs to be faster because I have millions of rows:

df["color_hex"] = df.apply(
    lambda row: mcol.to_hex(cpick.to_rgba(row["color_values"])), axis=1
)

I'm inserting another column (color_hex) that transforms the value from color_values into hex colors, but it does so by looping through every cell.

I looked at numpy.vectorize, but its docs say: "The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop."

I also looked at numpy.where, but that seems better suited to cases where there is a condition to satisfy, which is not my case.

So I was wondering which other numpy operations are suited to this?

jpp
Claudiu Creanga
    The actual problem is solved in [this question](https://stackoverflow.com/questions/49156484/fast-way-to-map-scalars-to-colors-in-python). If the unnecessary use of matplotlib is nonetheless desired, check the `apply2` case from [this answer](https://stackoverflow.com/a/47398328/4124317), which uses `numpy.apply_along_axis`. – ImportanceOfBeingErnest Mar 07 '18 at 22:36

1 Answer

There are two approaches that may improve performance, though without the data it is difficult to confirm that they do.

1. Use pd.Series.apply instead of pd.DataFrame.apply

df['color_hex'] = df['color_values'].apply(lambda x: mcol.to_hex(cpick.to_rgba(x)))

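This reduces the amount of structured data that needs to be passed through a loop: pd.DataFrame.apply with axis=1 builds a Series for every row, whereas pd.Series.apply passes only the scalar values of one column.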

2. Use a list comprehension

df['color_hex'] = [mcol.to_hex(cpick.to_rgba(x)) for x in df['color_values']]

This works because a list with one element per row can be assigned directly as a dataframe column.
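Both approaches above still call to_hex and to_rgba once per row. Since the question says the column holds only numbers from 1 to 10, one further option (not part of the original answer, and untested on the real data) is to convert each distinct value once and look the result up for every row. The sketch below assumes a hypothetical four-color palette and randomly generated values, because the real color_list and dataframe are not shown:

```python
import numpy as np
import pandas as pd
import matplotlib.cm as cm
import matplotlib.colors as mcol

# Hypothetical palette and data, for illustration only.
color_list = ["#084594", "#4292c6", "#9ecae1", "#deebf7"]
rng = np.random.default_rng(0)
df = pd.DataFrame({"color_values": rng.integers(1, 11, size=100_000)})

# Same setup as in the question.
cm1 = mcol.ListedColormap(color_list)
cnorm = mcol.Normalize(vmin=df["color_values"].min(), vmax=df["color_values"].max())
cpick = cm.ScalarMappable(norm=cnorm, cmap=cm1)

# Convert each distinct value once, then look the result up for every row.
lookup = {v: mcol.to_hex(cpick.to_rgba(v)) for v in df["color_values"].unique()}
df["color_hex"] = df["color_values"].map(lookup)
```

This only pays off when the number of distinct values is small compared with the number of rows. If the column held millions of distinct floats, the dictionary would be as expensive to build as the original loop.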

jpp