Use NumPy to apply a fixed palette to an image?

Question

I have a NumPy image in RGB bytes, let's say it's this 2x3 image:

img = np.array([[[  0, 255,   0], [255, 255, 255]],
                [[255,   0, 255], [  0, 255, 255]],
                [[255,   0, 255], [  0,   0,   0]]])

I also have a palette that covers every color used in the image. Let's say it's this palette:

palette = np.array([[255,   0, 255],
                    [  0, 255,   0],
                    [  0, 255, 255],
                    [  0,   0,   0],
                    [255, 255, 255]])

Is there some combination of indexing the image against the palette (or vice versa) that will give me a paletted image equivalent to this?

img_p = np.array([[1, 4],
                  [0, 2],
                  [0, 3]])

For comparison, I know the reverse is pretty simple. palette[img_p] will give a result equivalent to img. I'm trying to figure out if there's a similar approach in the opposite direction that will let NumPy do all the heavy lifting.

I know I can just iterate over all the image pixels individually and build my own paletted image. I'm hoping there's a more elegant option.

Okay, so I implemented the various solutions below and ran them over a moderate test set: 20 images, each one 2000x2000 pixels, with a 32-element palette of three-byte colors. Pixels were given random palette indexes. All algorithms were run over the same images.

Timing results:

mostly empty lookup array - 0.89 seconds
np.searchsorted approach - 3.20 seconds
Pandas lookup, single integer - 38.7 seconds
Using == and then aggregating the boolean results - 66.4 seconds
inverting the palette into a dict and using np.apply_along_axis() - Probably ~500 seconds, based on a smaller test set
Pandas lookup with a MultiIndex - Probably ~3000 seconds, based on a smaller test set

Given that the lookup array has a significant memory penalty (and a prohibitive one if there's an alpha channel), I'm going to go with the np.searchsorted approach. The lookup array is significantly faster if you want to spend the RAM on it.

`palette, img_p = np.unique(img.reshape(-1,3), axis=0, return_inverse=True)` then `img_p.reshape(*img.shape[:2])`, IIUC, but with a different ordering/ *palette*. — Michael Szczesny, Apr 28 '22 at 21:04
I expanded the example to a 3×2 image and tweaked the example palette to hopefully avoid overfitting of solutions. — asciiphil, Apr 28 '22 at 21:17
Unfortunately, I need to end up with the indices from the original palette. @MichaelSzczesny's suggestion looks like it would work if I just needed *a* palette. — asciiphil, Apr 28 '22 at 21:20
`(img.reshape(-1,3) == palette[:,None]).all(2).argmax(0).reshape(*img.shape[:2])` — Michael Szczesny, Apr 28 '22 at 21:29
@MichaelSzczesny: that's it, nicely done! (I verified against my slow version). — Pierre D, Apr 28 '22 at 21:33
@PierreD - Can you run a benchmark against your solution? I suspect python can beat numpy in this case. I don't have `isqrt` in python 3.7.5 for the larger sample data. — Michael Szczesny, Apr 28 '22 at 21:37
@MichaelSzczesny, hold on, good intuition, but I missed the part that it does so many comparisons. At `w, h = 100, 100`, the timings cross over and this becomes slower than the dict lookup. Note that you can simply use `int(np.sqrt(w * h))` instead of `isqrt()`. The latter is more precise for large values, but here we don't care. — Pierre D, Apr 28 '22 at 21:40
I think we could still do better than `searchsorted`, and without quite as much of a memory penalty as `large_array` by using a [perfect hash](https://en.wikipedia.org/wiki/Perfect_hash_function). I don't have time now to look into this, but it would be inspired by [Hash, displace, and compress](http://cmph.sourceforge.net/papers/esa09.pdf), although simpler and less "minimal". I may try something over the weekend. — Pierre D, Apr 29 '22 at 12:57
also, it is interesting that you are using the reverse lookup on multiple images for the same palette. Most approaches here can be sped up by exploiting taking this into account. — Pierre D, Apr 29 '22 at 13:01
Indeed. In my testing, I set up any fixed data (like `M`, `p1d`, and `ix` in the `searchsorted` approach) outside the loop that checked the images. — asciiphil, Apr 29 '22 at 13:06
If you `np.dot()` your image and your palette with `[1, 256, 65536]` it is very fast and everything becomes a single 24-bit number, rather than a massive, sparse 3-d lookup table https://stackoverflow.com/a/66624499/2836621 — Mark Setchell, Apr 30 '22 at 02:58
@MarkSetchell That's basically what the `searchsorted` approach does. Even if you convert the three- (or four-) value colors into single integers, you still need a way to map those integers to palette indices. If you use a 0..2^24 1D array of uint8 values, that's exactly the same size as a 3×0..2^8 3D array of uint8 values. — asciiphil, May 01 '22 at 18:15

Pierre D · Accepted Answer · 2022-04-28T21:59:51.730

Edit Here is a faster way that uses np.searchsorted.

def rev_lookup_by_sort(img, palette):
    M = (1 + palette.max())**np.arange(3)
    p1d, ix = np.unique(palette @ M, return_index=True)
    return ix[np.searchsorted(p1d, img @ M)]

Correctness (by equivalence to rev_lookup_by_dict() in the original answer below):

np.array_equal(
    rev_lookup_by_sort(img, palette),
    rev_lookup_by_dict(img, palette),
)

Speedup (for a 1000 x 1000 image and a 1000 colors palette):

orig = %timeit -o rev_lookup_by_dict(img, palette)
# 2.47 s ± 10.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

v2 = %timeit -o rev_lookup_by_sort(img, palette)
# 71.8 ms ± 93.7 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> orig.average / v2.average
34.46

So that answer using np.searchsorted is 30x faster at that size.

Original answer

An initial shot gives a slowish version (hopefully we can do better). It uses a dict, where keys are colors as tuples.

def rev_lookup_by_dict(img, palette):
    d = {tuple(v): k for k, v in enumerate(palette)}
    def func(pix):
        return d.get(tuple(pix), -1)
    return np.apply_along_axis(func, -1, img)

img_p = rev_lookup_by_dict(img, palette)

Notice that "color not found" is expressed as -1 in img_p.

On your (modified) data:

>>> img_p
array([[1, 4],
       [0, 2],
       [0, 3]])

Larger example:

# setup
from math import isqrt

w, h = 1000, 1000
s = isqrt(w * h)
palette = np.random.randint(0, 256, (s, 3))
img = palette[np.random.randint(0, s, (w, h))]

Test:

img_p = rev_lookup_by_dict(img, palette)

>>> np.array_equal(palette[img_p], img)
True

Timing:

%timeit rev_lookup_by_dict(img, palette)
# 2.48 s ± 16.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

That's quite awful, but hopefully we can do better.

That is an interesting use of `np.unique`. I think if I were doing the same, I would have done it in two passes, with `np.argsort` then `np.sort`. That's going in my toolbox. — asciiphil, Apr 29 '22 at 03:24

Michael Szczesny · Answer 2 · 2022-04-28T22:10:21.527

1

Faster than a dictionary, but with a 64 MB lookup array.

d = np.zeros((256,256,256), np.int32)  # 64 MB!
d[tuple(palette.T)] = np.arange(len(palette))
img_p = d[tuple(img.reshape(-1,3).T)].reshape(*img.shape[:2])
# %%timeit 10 loops, best of 5: 25.8 ms per loop (1000 x 1000)

np.testing.assert_equal(img, palette[img_p])

edited Apr 28 '22 at 22:10

answered Apr 28 '22 at 22:03

Michael Szczesny

4,911
5
15
32

Ah, yes, I too thought of this initially, but I discarded the thought as it would scale badly if adding an alpha component. Still good to see in the repertoire of solutions... – Pierre D Apr 28 '22 at 22:12
1

BTW, I get 72 ms using `searchsorted`. And on my machine, this solution takes 11.5 ms! – Pierre D Apr 28 '22 at 22:13
You can probably save a little space with the dtype of `d`. If the palette is 256 elements or fewer (and 256 is the max for PNG), you can drop it to "only" 16 MB with `np.uint8`. As @PierreD notes, an alpha channel balloons this approach to 4 GB. – asciiphil Apr 29 '22 at 01:34

asciiphil · Answer 3 · 2022-04-29T02:40:59.843

If you can use Pandas in addition to NumPy, you can use a Pandas MultiIndex as a sort of sparse array:

inverse_palette = pd.Series(np.arange(len(palette)),
                            index=pd.MultiIndex.from_arrays(palette.T)).sort_index()
img_p = np.apply_along_axis(lambda px: inverse_palette[tuple(px)], 2, img)

That's really slow, though. You can do a bit better by converting the colors into integers first:

def collapse_bytes(array):
    result = np.zeros(array.shape[:-1], np.uint32)
    for i in range(array.shape[-1]):
        result = result * 256 + array[...,i]
    return result

inverse_palette = pd.Series(np.arange(len(palette)),
                            index=collapse_bytes(palette)).sort_index()
img_p = inverse_palette[collapse_bytes(img).flat].to_numpy()\
                                                 .reshape(img.shape[:-1])

Use NumPy to apply a fixed palette to an image?

3 Answers3

Linked

Related