
I'm working on implementing a terminal renderer, and after quantizing the original image down to 256 colors, I need to find the closest representation for each pixel of the image.

I was planning to do this by comparing squared distances, but the naive approach would require storing 16-bit integers in a vector.

So, the meat of the question is: can I achieve an approximate result using only 8-bit vectors by calculating `((a - b)**2 & 0xff00) >> 8`? I know of a way to calculate the absolute difference of bytes, as highlighted here (https://stackoverflow.com/a/46970204/12954733); all I'm missing is a `_mm256_mulhi_epu8` instruction. Sadly, it doesn't exist in either AVX2 or AVX-512; only the 16-bit and wider versions exist.
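
For reference, one standard formulation of the byte absolute difference, sketched here (the linked answer's exact code may differ; `absdiff_u8` is just an illustrative name):

#include <immintrin.h>

/* |a - b| per byte: each unsigned saturating subtraction clamps its
   negative results to zero, so OR-ing the two directions gives |a - b|. */
static inline __m256i absdiff_u8(__m256i a, __m256i b)
{
    return _mm256_or_si256(_mm256_subs_epu8(a, b),
                           _mm256_subs_epu8(b, a));
}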

This is a plain C version of what I'm trying to achieve:

#include <limits.h>
#include <stdint.h>

struct col {
    uint8_t r, g, b;
};

struct col cols[256]; /* the 256-entry palette */

/* Return the index of the palette entry closest to x,
 * measured by squared Euclidean distance in RGB. */
int find_i(struct col x) {
    int mini = 0;
    int mind = INT_MAX;
    for (int i = 0; i < 256; i++) {
        int dr = x.r - cols[i].r;
        int dg = x.g - cols[i].g;
        int db = x.b - cols[i].b;
        int32_t d = dr*dr + dg*dg + db*db;
        if (d < mind) {
            mind = d;
            mini = i;
        }
    }
    return mini;
}
Cloud11665
  • *I need to find the closest representation for each pixel of the image*... It is unclear what you are trying to achieve. Can you first implement your function with portable C code and only worry about optimizing using SIMD once you have a working version? – chqrlie Mar 05 '23 at 16:42
  • Since each color component (r, g, b) is an unsigned byte, the worst-case squared difference between two colors is 255*255, which needs 2 bytes to store. The per-channel squared differences are then summed, and the entry with the lowest total is chosen as the best representation – Cloud11665 Mar 05 '23 at 16:49
  • This RGB Euclidean distance does not produce the best visual results. It would be better to first convert the pixel color to HSV representation, but it is even more costly. – chqrlie Mar 05 '23 at 17:58
  • I don't think I will be able to afford that. – Cloud11665 Mar 05 '23 at 18:01
  • Efficient palette quantization methods use a normalised palette, typically 6x6x6=216 colors with linear components to which the pixel value can be projected by dividing the components into 6 ranges using a lookup table. Grey pixels with identical `r`, `g` and `b` values could be projected to a separate grey-scale subpalette with 24 steps. You can use a 256 color palette with 16 standard colors, 216 colors for the rgb cube and 24 grey scales (or a little more if you take advantage of the redundancy between the 3 parts). – chqrlie Mar 05 '23 at 18:01
  • For smoother output, you should diffuse the quantization errors to the adjacent pixels or use a [dithering pattern](https://en.wikipedia.org/wiki/Dither) to obtain intermediary colors as a visual effect. – chqrlie Mar 05 '23 at 18:06
  • I know of dithering, but sadly can't use it. Rendering images to a terminal by converting them to ANSI escapes is expensive in and of itself, and there are methods to emit runs of consecutive, same-colored "glyphs" in the same time as a single one. – Cloud11665 Mar 05 '23 at 18:16
  • You can download and compile [QuickEmacs](https://github.com/qemacs/qemacs). The terminal version displays various image formats to the terminal by subscaling and mapping pixels to colored characters. You can also try the `mandelbrot-test` function on `C-h m` to navigate the fractal zoom in ascii mode for fun. – chqrlie Mar 05 '23 at 18:36
  • Have you looked at existing ASCII color libraries like libcaca (http://caca.zoy.org/wiki/libcaca) or AAlib (https://aa-project.sourceforge.net/aalib/)? caca mentions dithering as a feature, and is fast enough (on modern CPUs) for video. So is MPV's `tct` (true color text) with `--vo-tct-256=yes` to actually only use 256 colors, not true color. MPV's `sixel` video output also has multiple dither options. Oh, that's a library: https://github.com/saitoha/libsixel so presumably you can look at its dithering code. – Peter Cordes Mar 05 '23 at 20:56
  • Any CPU with AVX2 available should be plenty fast unless you need to keep CPU time very low or work with very high resolutions. You might widen elements to 16-bit so you can use `pmaddwd` to multiply-accumulate between adjacent 16-bit elements, if you can arrange your data so that's good (see the sketch after these comments). Since it was originally 8-bit data you know it won't have overflowed the low 16 bits of the 32-bit element, in case you want to use saturating 16-bit adds like `vpaddusw` so you can pack back down to denser 16-bit elements. (The good candidates won't saturate.) – Peter Cordes Mar 05 '23 at 21:00
  • @PeterCordes, it actually is for the rewrite of mpv's tct. – Cloud11665 Mar 06 '23 at 00:58
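
A minimal sketch of the `pmaddwd` idea from Peter Cordes's comment, assuming the 8-bit absolute differences have already been computed and laid out as r,g,b,0 quadruples, one per pixel (that layout is an assumption, not part of the comment):

#include <immintrin.h>

/* Widen 4 pixels' worth of byte differences to 16-bit, then let
   vpmaddwd square and pair-sum them: each 64-bit span ends up with
   dr*dr+dg*dg in its even 32-bit lane and db*db+0 in its odd lane. */
static inline __m256i squared_dist4(__m128i absdiff /* r,g,b,0 x4 bytes */)
{
    __m256i w  = _mm256_cvtepu8_epi16(absdiff);   /* 16 x 16-bit lanes */
    __m256i sq = _mm256_madd_epi16(w, w);         /* 8 x 32-bit partial sums */
    __m256i hi = _mm256_srli_epi64(sq, 32);       /* move odd lanes down */
    return _mm256_add_epi32(sq, hi);              /* distances in even lanes */
}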

2 Answers


Your goal is to draw an image on a terminal without full color capabilities.

This RGB Euclidean distance does not produce the best visual results. It would be better to first convert the pixel color to HSV representation, but it is even more costly.

If you can select the color palette (for example when producing a GIF file), efficient palette quantization methods use a normalised palette, typically 6x6x6=216 colors with linear components, to which the pixel value can be projected by dividing the components into 6 ranges using a lookup table. Grey pixels with identical r, g and b values could be projected to a separate grey-scale sub-palette with 24 steps. You can use a 256 color palette with 16 standard colors, 216 colors for the rgb cube and 24 grey scales (or a little more if you take advantage of the redundancy between the 3 parts).
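
A minimal sketch of that projection, assuming the standard xterm 256-color layout (16 base colors, the 6x6x6 cube at indices 16..231, greys at 232..255) and its cube levels 0, 95, 135, 175, 215, 255; a custom palette would use its own levels:

#include <stdint.h>
#include <stdlib.h>

static uint8_t q6[256]; /* 8-bit component -> nearest of the 6 cube levels */

static void init_q6(void)
{
    static const int lvl[6] = { 0, 95, 135, 175, 215, 255 };
    for (int v = 0; v < 256; v++) {
        int best = 0;
        for (int i = 1; i < 6; i++)
            if (abs(v - lvl[i]) < abs(v - lvl[best]))
                best = i;
        q6[v] = (uint8_t)best;
    }
}

/* Project a pixel onto the 6x6x6 cube with three table lookups. */
static int cube_index(struct col c)
{
    return 16 + 36 * q6[c.r] + 6 * q6[c.g] + q6[c.b];
}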

For smoother output, you should diffuse the quantization errors to the adjacent pixels, e.g. with Floyd-Steinberg, or use a more regular dithering pattern to obtain intermediary colors as a visual effect.
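
A sketch of Floyd-Steinberg diffusion on a single channel, assuming a row-major int buffer (int rather than uint8_t so the propagated error cannot wrap) and a hypothetical quantize() that returns the nearest representable value:

int quantize(int v); /* hypothetical palette projection */

void fs_dither(int *img, int w, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int old = img[y*w + x];
            int q = quantize(old);
            img[y*w + x] = q;
            int err = old - q;
            /* push the error onto the not-yet-visited neighbours
               with the classic 7/16, 3/16, 5/16, 1/16 weights */
            if (x + 1 < w)     img[y*w + (x+1)]     += err * 7 / 16;
            if (y + 1 < h) {
                if (x > 0)     img[(y+1)*w + (x-1)] += err * 3 / 16;
                               img[(y+1)*w + x]     += err * 5 / 16;
                if (x + 1 < w) img[(y+1)*w + (x+1)] += err * 1 / 16;
            }
        }
    }
}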

chqrlie
  • Good point about using an alternate color space. I'm more familiar with YUV/YCbCr color space [since that's used more in video]. Since the human eye is most sensitive to changes in luminance, I'd compare Y first, then U/V. For HSV, is the V part an analog of the luminance? And, calculating YCbCr from RGB seems to be less computation than HSV/HSL. – Craig Estey Mar 05 '23 at 19:33

Images have a lot of pixels, but because you only have 256 output colors, precision is not that important. Here's a faster, approximate method.

Allocate a byte array of size 32^3 and treat the index of each array element as an RGB555 color.

On startup, convert every element's RGB555 index into your RGB888 structure, then use your current code to find the closest matching color from your palette. 32^3 = 32768 entries is not too many, and that code only runs once at startup. If the palette is known at compile time, even better: compute the table offline and include the data in the executable.
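
A possible init loop (a sketch, reusing find_i() from the question and the s_lookup table declared below; the bit layout must match getPaletteIndex()):

static void init_lookup(void)
{
    for (size_t idx = 0; idx < 32 * 32 * 32; idx++) {
        struct col c;
        c.r = (uint8_t)(((idx >> 10) & 31) << 3);
        c.g = (uint8_t)(((idx >> 5)  & 31) << 3);
        c.b = (uint8_t)(( idx        & 31) << 3);
        /* replicate the top bits into the low 3 bits so 31 expands to 255 */
        c.r |= c.r >> 5;  c.g |= c.g >> 5;  c.b |= c.b >> 5;
        s_lookup[idx] = (uint8_t)find_i(c);
    }
}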

That pre-computed 3D array allows you to find closest matching color with a few fast instructions and a single memory load:

static uint8_t s_lookup[ 32 * 32 * 32 ];

uint8_t getPaletteIndex( struct col c )
{
    /* keep the top 5 bits of each component and pack them into a
       15-bit RGB555 index: r in bits 14..10, g in 9..5, b in 4..0 */
    size_t ir = (size_t)( c.r & 0xF8 ) << 7;
    size_t ig = (size_t)( c.g & 0xF8 ) << 2;
    size_t ib = (size_t)( c.b >> 3 );
    size_t idx = ir | ig | ib;
    return s_lookup[ idx ];
}

P.S. Some processors have the BMI2 instruction set extension. On these processors, you can load the color into a uint32_t value, then use `_pext_u32` with the mask 0xF8F8F8 to make the index for that table.
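
A sketch of that variant, assuming a little-endian load of the 3-byte struct (getPaletteIndexPext is an illustrative name). Note that _pext_u32 packs bits from the least significant end upward, so the index comes out with b in the high 5 bits, the reverse of the shift version above; the table has to be built in the same order:

#include <immintrin.h>
#include <string.h>

uint8_t getPaletteIndexPext( struct col c )
{
    uint32_t v = 0;
    memcpy(&v, &c, 3);                        /* v = b<<16 | g<<8 | r */
    uint32_t idx = _pext_u32(v, 0x00F8F8F8);  /* (b5<<10)|(g5<<5)|r5 */
    return s_lookup[ idx ];
}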

Soonts