When I used my simple C++ dithering on this input image (cropped from yours):

and used the distinct colors from this image as the palette:

I got this output:

The squares at the top are the palette used (it looks like more than 20 colors to me). As you can see, the output is much better than yours, but your dithered result looks too pixelated. Was it downsampled and then zoomed?
Using just 2 shades each of R,G,B,C,M,Y plus 3 grayscales gives 15-color dithering:

The more shades and combinations of RGB, the better the result. Here are all the combinations of 3 shades per channel, 3^3=27 colors:
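A palette like that can be generated instead of typed by hand. This is only a sketch: the `RGB` struct and the `make_palette` name are made up for illustration, and the evenly spaced levels (0, 127, 255 for 3 shades) are an assumption, not necessarily the exact shades used for the image above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct RGB { std::uint8_t r, g, b; };   // hypothetical pixel type

// Build every combination of `shades` evenly spaced levels per channel.
// shades=3 gives levels 0,127,255 and 3*3*3=27 colors.
std::vector<RGB> make_palette(int shades)
{
    assert(shades >= 2);                // at least black and full intensity
    auto level = [shades](int i)
    {
        return static_cast<std::uint8_t>(i * 255 / (shades - 1));
    };
    std::vector<RGB> pal;
    for (int r = 0; r < shades; r++)
        for (int g = 0; g < shades; g++)
            for (int b = 0; b < shades; b++)
                pal.push_back(RGB{ level(r), level(g), level(b) });
    return pal;
}
```

Calling `make_palette(3)` returns the 27 colors, ordered from black to white.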

[Edit1] algorithm
Handle each line of the image separately.
So process your image by horizontal or vertical lines. Before processing any line, set the signed temp variables r0,g0,b0 to zero. You also need a palette pal[] holding your allowed colors.
For each pixel of the processed line:
Extract its r,g,b and add it to r0,g0,b0, then find the closest color to (r0,g0,b0) in your palette pal[], and subtract the chosen color's r,g,b from r0,g0,b0. This way the error left over from each pixel is carried on to the next pixel in the line.
Something like this:

    for (y=0;y<height;y++)
    {
        r0=0; g0=0; b0=0;           // reset the accumulated error for each line
        for (x=0;x<width;x++)
        {
            // pick and add the wanted color
            col=pixel(x,y);
            r0+=col.r;
            g0+=col.g;
            b0+=col.b;
            // choose the closest palette color to the accumulated (r0,g0,b0)
            ix=0; col=(r0,g0,b0);   // pseudocode: col must hold signed values that can leave 0..255
            for (i=0;i<palette_size;i++)
                if (color_distance(col,pal[i])<color_distance(col,pal[ix]))
                    ix=i;
            col=pal[ix];
            // render and subtract the chosen color
            pixel(x,y)=col;
            r0-=col.r;
            g0-=col.g;
            b0-=col.b;
        }
    }
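For reference, here is how the loop above could look as compilable C++. Treat it as a sketch under a few assumptions that are not in the original: the image is a row-major `std::vector` of a made-up `RGB` struct, the `dither` name is hypothetical, and `color_distance` is taken to be the squared Euclidean distance in RGB space.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct RGB { std::uint8_t r, g, b; };   // hypothetical pixel type

// Assumed metric: squared Euclidean distance between the accumulated
// (signed, possibly out-of-range) target and a palette color.
static long color_distance(int r, int g, int b, const RGB &c)
{
    long dr = r - c.r, dg = g - c.g, db = b - c.b;
    return dr * dr + dg * dg + db * db;
}

// Dither img (row-major, width*height pixels) in place to the colors in pal.
void dither(std::vector<RGB> &img, int width, int height,
            const std::vector<RGB> &pal)
{
    assert(!pal.empty());
    assert(img.size() == std::size_t(width) * std::size_t(height));
    for (int y = 0; y < height; y++)
    {
        int r0 = 0, g0 = 0, b0 = 0;     // accumulated error, reset per line
        for (int x = 0; x < width; x++)
        {
            RGB &p = img[std::size_t(y) * width + x];
            // add the wanted color to the accumulator
            r0 += p.r; g0 += p.g; b0 += p.b;
            // linear search for the closest palette color
            std::size_t ix = 0;
            long best = color_distance(r0, g0, b0, pal[0]);
            for (std::size_t i = 1; i < pal.size(); i++)
            {
                long d = color_distance(r0, g0, b0, pal[i]);
                if (d < best) { best = d; ix = i; }
            }
            // render and subtract the chosen color
            p = pal[ix];
            r0 -= p.r; g0 -= p.g; b0 -= p.b;
        }
    }
}
```

With a black and white palette, a line of mid-gray pixels comes out as alternating white and black pixels, which is the expected behavior: the average brightness of the line is preserved.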
Choosing the closest color can be sped up significantly with a LUT[r][g][b]
table.
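A rough sketch of such a table follows. Everything in it is an assumption made for illustration: the `PaletteLUT` name, the reduction to 5 bits per channel (a full 256x256x256 table would need 16M entries), and the clamping of the accumulated values to 0..255 before the lookup. Because of the quantization and clamping, the result can differ slightly from an exact linear search.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct RGB { std::uint8_t r, g, b; };   // hypothetical pixel type

constexpr int LUT_BITS = 5;             // assumed resolution per channel
constexpr int LUT_N = 1 << LUT_BITS;    // 32 cells per channel

// Precomputed index of the closest palette color for each quantized (r,g,b).
struct PaletteLUT
{
    std::vector<std::uint8_t> ix;

    explicit PaletteLUT(const std::vector<RGB> &pal)
        : ix(std::size_t(LUT_N) * LUT_N * LUT_N)
    {
        assert(!pal.empty() && pal.size() <= 256);
        const int half = 1 << (7 - LUT_BITS);   // offset to the cell center
        for (int r = 0; r < LUT_N; r++)
         for (int g = 0; g < LUT_N; g++)
          for (int b = 0; b < LUT_N; b++)
          {
              int cr = (r << (8 - LUT_BITS)) + half;
              int cg = (g << (8 - LUT_BITS)) + half;
              int cb = (b << (8 - LUT_BITS)) + half;
              std::size_t best_i = 0;
              long best = -1;
              for (std::size_t i = 0; i < pal.size(); i++)
              {
                  long dr = cr - pal[i].r, dg = cg - pal[i].g, db = cb - pal[i].b;
                  long d = dr * dr + dg * dg + db * db;
                  if (best < 0 || d < best) { best = d; best_i = i; }
              }
              ix[(std::size_t(r) * LUT_N + g) * LUT_N + b] =
                  static_cast<std::uint8_t>(best_i);
          }
    }

    // The accumulated r0,g0,b0 may leave 0..255, so clamp before the lookup.
    std::size_t closest(int r, int g, int b) const
    {
        r = std::min(255, std::max(0, r)) >> (8 - LUT_BITS);
        g = std::min(255, std::max(0, g)) >> (8 - LUT_BITS);
        b = std::min(255, std::max(0, b)) >> (8 - LUT_BITS);
        return ix[(std::size_t(r) * LUT_N + g) * LUT_N + b];
    }
};
```

Build the table once per palette, then replace the inner palette search with `lut.closest(r0,g0,b0)`, so the per-pixel cost no longer depends on the palette size.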
This approach is fast and simple, but far from the best visually.