2

I've attempting to convert a long list of RGB values in a dataframe into Hex to allow some chart building, I've managed to locate the right code to do the conversion, it is just applying it that is killing me.

df = pd.DataFrame({'R':[152,186,86], 'G':[112,191,121], 'B':[85,222,180] })

def rgb_to_hex(red, green, blue):
    """Return color as #rrggbb for the given color values."""
    return '#%02x%02x%02x' % (red, green, blue)

With this code being the one bugging out:

df['hex'] = rgb_to_hex(df['R'],df['G'],df['B'])

with the below error:

TypeError: %x format: an integer is required, not Series

Any thoughts?

benvc
  • 14,448
  • 4
  • 33
  • 54
Jake Bourne
  • 723
  • 3
  • 10
  • 27
  • `'#%02x%02x%02x' % (red, green, blue)` expects scalars, no sequences. You need to either `apply` this function to each row in the dataframe, or write your own loop. Alternatively, matplotlib `colors` API probably has a function that will operate on sequences – Paul H Dec 20 '18 at 21:05
  • This should likely be a duplicate of [Create new column based on values from other columns / apply a function of multiple columns, row-wise in Pandas](https://stackoverflow.com/questions/26886653). – Karl Knechtel Jan 07 '23 at 20:57

4 Answers4

5

The % operator can't work with sequences the way you'd want it to. Instead, you should use the .apply method of the dataframe to pass each row individually to your function:

df['hex'] = df.apply(lambda r: rgb_to_hex(*r), axis=1)

     R    G    B      hex
0  152  112   85  #987055
1  186  191  222  #babfde
2   86  121  180  #5679b4

Rather than assigning the column in-place, I recommend using the .assign method to return a different dataframe, just to keep things "pure" in the functional programming sense:

df2 = df.assign(hex=df.apply(lambda r: rgb_to_hex(*r), axis=1))
Paul H
  • 65,268
  • 20
  • 159
  • 136
0

EDIT: See the answer from @PaulH for a better approach.

You need to perform your conversion for each of the rows in your DataFrame (or access a specific row) in order to generate the hex values you are looking for. For example:

import pandas as pd

df = pd.DataFrame({'R':[152,186,86], 'G':[112,191,121], 'B':[85,222,180]})

for i, r, g, b in df.itertuples():
    print('#{:02x}{:02x}{:02x}'.format(r, g, b))

# OUTPUT
# #987055
# #babfde
# #5679b4
benvc
  • 14,448
  • 4
  • 33
  • 54
  • 1
    you typically don't need or want to iterate over a dataframe. Its `.apply` and `.map` methods would be more appropriate. – Paul H Dec 20 '18 at 21:16
0

List comprehension will work:

df = pd.DataFrame({'R':[152,186,86], 'G':[112,191,121], 'B':[85,222,180] })

['#%02x%02x%02x' % (df['R'][i], df['G'][i], df['B'][i]) for i in range(len(df))]

Out

['#987055', '#babfde', '#5679b4']

If you want the row index:

[(df.index[i], '#%02x%02x%02x' % (df['R'][i], df['G'][i], df['B'][i])) for i in range(len(df))]

#out
[(0, '#987055'), (1, '#babfde'), (2, '#5679b4')]
It_is_Chris
  • 13,504
  • 2
  • 23
  • 41
  • 1
    you typically don't need or want to iterate over a dataframe. Its `.apply` and `.map` methods would be more appropriate. – Paul H Dec 20 '18 at 21:16
0

TL;DR: For < 100 rows, use apply() as Paul demonstrates; For > 100 rows, use my method 2 below.


Method 1: Use vector operations to convert the R, G, B integers to hex numbers digit-by-digit

Instead of iterating or apply, we can use Pandas' vectorized methods.

We can leverage the fact a R/G/B value is in the range 0-255 and will only be two hex digits

dec_to_hex_digits = pd.DataFrame([f"{x:x}" for x in range(16)], columns=["hex"], dtype="string")

def int_to_hex(c: pd.Series) -> pd.Series:
    d0 = c % 16              # Get value of 16**0 place digit
    d1 = (c - d0) // 16      # Get value of 16**1 place digit

    # Convert the integers to hex digits
    d0_h = dec_to_hex_digits.loc[d0, "hex"]
    d1_h = dec_to_hex_digits.loc[d1, "hex"]
    
    # Reindex to original indices
    d0_h.index = c.index
    d1_h.index = c.index

    # Concatenate digits and return
    return d1_h.str.cat(d0_h) 

Then, we can just concatenate the red, green, and blue hex values:

def rgb_to_hex_vec(red: pd.Series, green: pd.Series, blue: pd.Series) -> pd.Series:
    return int_to_hex(red) + int_to_hex(green) + int_to_hex(blue)

Testing:

df = pd.DataFrame({'R':[152,186,86], 'G':[112,191,121], 'B':[85,222,180] })
df["rgb"] = "#" + rgb_to_hex(df["R"], df["G"], df["B"])

gives:

     R    G    B      rgb
0  152  112   85  #987055
1  186  191  222  #babfde
2   86  121  180  #5679b4

Method 2: Use a lookup table containing numbers from 0 to 255

Since we only have 256 numbers, a lookup table isn't that enormous. It's easy enough to create just like the single-digit hex lookup table from method 1 above.

# Note the format string pads the hex numbers to 2 digits
dec_to_hex_nums = pd.DataFrame({"hex": [f"{x:02x}" for x in range(256)]}, dtype=str)

def rgb_to_hex_lookup(red: pd.Series, green: pd.Series, blue: pd.Series) -> pd.Series:
    # Look everything up
    rr = dec_to_hex_nums.loc[red, "hex"]
    gg = dec_to_hex_nums.loc[green, "hex"]
    bb = dec_to_hex_nums.loc[blue, "hex"]
    # Reindex
    rr.index = red.index
    gg.index = green.index
    bb.index = blue.index
    # Concatenate and return
    return rr + gg + bb

Benchmarks

Timing these approaches and comparing against the .apply method, I get the following results:

enter image description here

For smaller dataframes (< 100 rows), .apply is significantly (~5x) faster than the first vectorized method, and ~2x faster than the lookup approach. This is unsurprising -- the overhead in creating all those extra Series objects adds up, so if you have a small number of elements you might as well use apply, which operates one row at a time. .apply remains faster than the vectorized approach up to ~1000 rows.

As the size of the dataframe increases (> 1k rows), the vectorized approach is consistently ~4x faster than apply, and the lookup approach is more than an order of magnitude faster.

Conclusion: Since we only have 256 possible values for each primary color, it is significantly_ faster to just use a lookup table to convert the decimal value to hex, and concatenate them, for dataframes of significant size (> 100 rows). For smaller dataframes, .apply your rgb_to_hex function to each row.


Appendix: Timing and plotting code

import timeit
import numpy as np
from matplotlib import pyplot as plt

#%% Define extra functions
def rgb_to_hex(red, green, blue):
    """Return color as #rrggbb for the given color values."""
    return '#%02x%02x%02x' % (red, green, blue)

def func_vec(df):
    return "#" + rgb_to_hex_vec(df["R"], df["G"], df["B"])

def func_apply(df):
    return df.apply(lambda r: rgb_to_hex(*r), axis=1)

def func_lookup(df):
    return "#" + rgb_to_hex_lookup(df["R"], df["G"], df["B"])

#%% Set up timing control
funcs = [func_vec, func_apply, func_lookup]

N = 10
sizes = [1, 10, 100, 500, 1000, 5000, 10_000, 50_000, 100_000, 500_000, 1_000_000]
times = np.zeros((len(sizes), len(funcs)))

#%% Run funcs and time
for i, s in enumerate(sizes):
    df = pd.DataFrame({c: np.random.randint(0, 256, (s,)) for c in "RGB"})
    for j, f in enumerate(funcs):
        times[i, j] = timeit.timeit("f(df)", globals=globals(), number=N) / N
        print(f"{i}\t{j}\t{times[i, j]}")

#%% Plot
fig, ax = plt.subplots()
for j, f in enumerate(funcs):
    ax.plot(sizes, times[:, j], label=f.__name__)

ax.set_xlabel("Dataframe size")
ax.set_ylabel("Time per run (s)")
ax.set_xscale("log")
ax.set_yscale("log")
ax.grid()
ax.legend()
fig.tight_layout()

My times array is:

array([[6.10225860e-03, 7.82125700e-04, 1.85535800e-03],
       [6.61873300e-03, 8.52286000e-04, 1.84020970e-03],
       [6.78867620e-03, 1.54506480e-03, 1.90147520e-03],
       [8.48029180e-03, 5.30038540e-03, 2.08237630e-03],
       [7.84849770e-03, 9.26545150e-03, 2.16718480e-03],
       [1.59062129e-02, 4.11019214e-02, 3.62331420e-03],
       [2.34676019e-02, 7.93717382e-02, 6.79325230e-03],
       [9.42278390e-02, 3.89326700e-01, 2.62593851e-02],
       [1.81760045e-01, 7.82542864e-01, 5.41926950e-02],
       [9.42876875e-01, 3.99834750e+00, 2.60429247e-01],
       [2.01807942e+00, 8.62860848e+00, 5.02780442e-01]])
Pranav Hosangadi
  • 23,755
  • 7
  • 44
  • 70