TL;DR: For < 100 rows, use `apply()` as Paul demonstrates; for > 100 rows, use my method 2 below.
Method 1: Use vector operations to convert the R, G, B integers to hex numbers digit-by-digit
Instead of iterating or using `apply`, we can use pandas' vectorized methods. We can leverage the fact that an R/G/B value is in the range 0-255 and will therefore only ever be two hex digits:
    import pandas as pd

    # Single-digit lookup table: 0 -> "0", ..., 15 -> "f"
    dec_to_hex_digits = pd.DataFrame([f"{x:x}" for x in range(16)], columns=["hex"], dtype="string")

    def int_to_hex(c: pd.Series) -> pd.Series:
        d0 = c % 16            # Get value of 16**0 place digit
        d1 = (c - d0) // 16    # Get value of 16**1 place digit
        # Convert the integers to hex digits
        d0_h = dec_to_hex_digits.loc[d0, "hex"]
        d1_h = dec_to_hex_digits.loc[d1, "hex"]
        # Reindex to original indices
        d0_h.index = c.index
        d1_h.index = c.index
        # Concatenate digits and return
        return d1_h.str.cat(d0_h)
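To make the place-value arithmetic concrete, here is a minimal sketch of the digit decomposition for the sample R values used in the test below:

```python
import pandas as pd

# Digit decomposition underlying int_to_hex: e.g. 152 = 9*16 + 8 -> "98"
c = pd.Series([152, 186, 86])
d0 = c % 16           # 16**0 place digits: [8, 10, 6]
d1 = (c - d0) // 16   # 16**1 place digits: [9, 11, 5]
print(d1.tolist(), d0.tolist())  # [9, 11, 5] [8, 10, 6]
```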
Then, we can just concatenate the red, green, and blue hex values:
    def rgb_to_hex_vec(red: pd.Series, green: pd.Series, blue: pd.Series) -> pd.Series:
        return int_to_hex(red) + int_to_hex(green) + int_to_hex(blue)
Testing:
    df = pd.DataFrame({'R': [152, 186, 86], 'G': [112, 191, 121], 'B': [85, 222, 180]})
    df["rgb"] = "#" + rgb_to_hex_vec(df["R"], df["G"], df["B"])
gives:
         R    G    B      rgb
    0  152  112   85  #987055
    1  186  191  222  #babfde
    2   86  121  180  #5679b4
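As a sanity check, the same triplets run through plain Python string formatting (the `'#%02x%02x%02x'` pattern from the original `rgb_to_hex`) give identical strings:

```python
# Cross-check the vectorized output against built-in hex formatting
triplets = [(152, 112, 85), (186, 191, 222), (86, 121, 180)]
hex_codes = ['#%02x%02x%02x' % t for t in triplets]
print(hex_codes)  # ['#987055', '#babfde', '#5679b4']
```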
Method 2: Use a lookup table containing numbers from 0 to 255
Since we only have 256 numbers, a lookup table isn't that enormous. It's easy enough to create just like the single-digit hex lookup table from method 1 above.
    # Note the format string pads the hex numbers to 2 digits
    dec_to_hex_nums = pd.DataFrame({"hex": [f"{x:02x}" for x in range(256)]}, dtype=str)

    def rgb_to_hex_lookup(red: pd.Series, green: pd.Series, blue: pd.Series) -> pd.Series:
        # Look everything up
        rr = dec_to_hex_nums.loc[red, "hex"]
        gg = dec_to_hex_nums.loc[green, "hex"]
        bb = dec_to_hex_nums.loc[blue, "hex"]
        # Reindex to original indices
        rr.index = red.index
        gg.index = green.index
        bb.index = blue.index
        # Concatenate and return
        return rr + gg + bb
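Run against the same test dataframe as method 1, the lookup version produces identical results (a self-contained sketch, with the table and function repeated so it runs on its own):

```python
import pandas as pd

# Lookup table mapping each byte value 0-255 to its two-digit hex string
dec_to_hex_nums = pd.DataFrame({"hex": [f"{x:02x}" for x in range(256)]}, dtype=str)

def rgb_to_hex_lookup(red: pd.Series, green: pd.Series, blue: pd.Series) -> pd.Series:
    rr = dec_to_hex_nums.loc[red, "hex"]
    gg = dec_to_hex_nums.loc[green, "hex"]
    bb = dec_to_hex_nums.loc[blue, "hex"]
    rr.index = red.index
    gg.index = green.index
    bb.index = blue.index
    return rr + gg + bb

df = pd.DataFrame({'R': [152, 186, 86], 'G': [112, 191, 121], 'B': [85, 222, 180]})
result = "#" + rgb_to_hex_lookup(df["R"], df["G"], df["B"])
print(result.tolist())  # ['#987055', '#babfde', '#5679b4']
```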
Benchmarks
Timing these approaches and comparing against the `.apply` method, I get the following results:

For smaller dataframes (< 100 rows), `.apply` is significantly (~5x) faster than the first vectorized method, and ~2x faster than the lookup approach. This is unsurprising: the overhead of creating all those extra Series objects adds up, so with a small number of elements you might as well use `.apply`, which operates one row at a time. `.apply` remains faster than the vectorized approach up to ~1000 rows.

As the size of the dataframe increases (> 1k rows), the vectorized approach is consistently ~4x faster than `.apply`, and the lookup approach is more than an order of magnitude faster.
Conclusion: Since there are only 256 possible values for each primary color, for dataframes of significant size (> 100 rows) it is _significantly_ faster to use a lookup table to convert each decimal value to hex and concatenate the results. For smaller dataframes, just `.apply` your `rgb_to_hex` function to each row.
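That small-dataframe path might look like this (a minimal sketch using the `rgb_to_hex` helper from the question):

```python
import pandas as pd

def rgb_to_hex(red, green, blue):
    """Return color as #rrggbb for the given color values."""
    return '#%02x%02x%02x' % (red, green, blue)

df = pd.DataFrame({'R': [152, 186, 86], 'G': [112, 191, 121], 'B': [85, 222, 180]})
# Apply the scalar converter one row at a time
df["rgb"] = df.apply(lambda row: rgb_to_hex(row["R"], row["G"], row["B"]), axis=1)
print(df["rgb"].tolist())  # ['#987055', '#babfde', '#5679b4']
```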
Appendix: Timing and plotting code
    import timeit

    import numpy as np
    import pandas as pd
    from matplotlib import pyplot as plt

    #%% Define extra functions
    def rgb_to_hex(red, green, blue):
        """Return color as #rrggbb for the given color values."""
        return '#%02x%02x%02x' % (red, green, blue)

    def func_vec(df):
        return "#" + rgb_to_hex_vec(df["R"], df["G"], df["B"])

    def func_apply(df):
        return df.apply(lambda r: rgb_to_hex(*r), axis=1)

    def func_lookup(df):
        return "#" + rgb_to_hex_lookup(df["R"], df["G"], df["B"])
    #%% Set up timing control
    funcs = [func_vec, func_apply, func_lookup]
    N = 10
    sizes = [1, 10, 100, 500, 1000, 5000, 10_000, 50_000, 100_000, 500_000, 1_000_000]
    times = np.zeros((len(sizes), len(funcs)))

    #%% Run funcs and time
    for i, s in enumerate(sizes):
        df = pd.DataFrame({c: np.random.randint(0, 256, (s,)) for c in "RGB"})
        for j, f in enumerate(funcs):
            times[i, j] = timeit.timeit("f(df)", globals=globals(), number=N) / N
            print(f"{i}\t{j}\t{times[i, j]}")
    #%% Plot
    fig, ax = plt.subplots()
    for j, f in enumerate(funcs):
        ax.plot(sizes, times[:, j], label=f.__name__)
    ax.set_xlabel("Dataframe size")
    ax.set_ylabel("Time per run (s)")
    ax.set_xscale("log")
    ax.set_yscale("log")
    ax.grid()
    ax.legend()
    fig.tight_layout()
My `times` array is:
    array([[6.10225860e-03, 7.82125700e-04, 1.85535800e-03],
           [6.61873300e-03, 8.52286000e-04, 1.84020970e-03],
           [6.78867620e-03, 1.54506480e-03, 1.90147520e-03],
           [8.48029180e-03, 5.30038540e-03, 2.08237630e-03],
           [7.84849770e-03, 9.26545150e-03, 2.16718480e-03],
           [1.59062129e-02, 4.11019214e-02, 3.62331420e-03],
           [2.34676019e-02, 7.93717382e-02, 6.79325230e-03],
           [9.42278390e-02, 3.89326700e-01, 2.62593851e-02],
           [1.81760045e-01, 7.82542864e-01, 5.41926950e-02],
           [9.42876875e-01, 3.99834750e+00, 2.60429247e-01],
           [2.01807942e+00, 8.62860848e+00, 5.02780442e-01]])