I would use HSV color space.
It is better suited for detecting colors (closer to human perception), which should help a lot. You can also use an HSV histogram to detect how many distinct colors you have.
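
If you want to try the histogram approach, a small sketch could look like this (the hsv8 struct, the saturation cut and the bin threshold are just my guesses here, tune them on your data):

// rough sketch: count roughly how many distinct hues are present using a hue histogram;
// the hsv8 struct, the saturation cut and the bin share threshold are assumptions to tune
#include <vector>

struct hsv8 { int h,s,v; };                                 // HSV components in <0,255>

int count_distinct_hues(const std::vector<hsv8> &pix)
    {
    int hist[256]={0},total=0;
    for (size_t i=0;i<pix.size();i++)
        if (pix[i].s>32){ hist[pix[i].h]++; total++; }      // skip near-grayscale pixels
    // count contiguous runs of "well used" hue bins ... one run ~ one pen color
    // (note: red wraps around h=0/255 so this simple scan may count it twice)
    int colors=0; bool inside=false;
    for (int h=0;h<256;h++)
        {
        bool used=(total>0)&&(hist[h]>total/500);           // bin holds > 0.2% of colored pixels
        if (used&&!inside) colors++;
        inside=used;
        }
    return colors;
    }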

If you still want RGB, then compare differently.

You have pen color0=(r0,g0,b0) and pixel color=(r,g,b), so compute the distance between them:

d = ((r-r0)*(r-r0)) + ((g-g0)*(g-g0)) + ((b-b0)*(b-b0))

No need for sqrt. Now you just compute d for every color you have (pens) and choose the smallest d ... You can also use the less precise:

d = abs(r-r0) + abs(g-g0) + abs(b-b0)
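
For completeness, here is a small sketch of that nearest-pen search (the rgb8 struct and the pens palette are just placeholders for whatever format you use):

// rough sketch: pick the closest pen by squared RGB distance; the rgb8 struct
// and the pens palette are just placeholders
#include <vector>

struct rgb8 { int r,g,b; };

int nearest_pen(rgb8 c,const std::vector<rgb8> &pens)
    {
    int best=0,bestd=0x7FFFFFFF;
    for (int i=0;i<(int)pens.size();i++)
        {
        int dr=c.r-pens[i].r,dg=c.g-pens[i].g,db=c.b-pens[i].b;
        int d=(dr*dr)+(dg*dg)+(db*db);          // no sqrt needed, ordering stays the same
        if (d<bestd){ bestd=d; best=i; }
        }
    return best;                                // index of the closest pen color
    }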
If you do not know the colors beforehand and do not want to use histograms:

- form a (re)color table (a set of distinct visible colors you will assign to each newly found pen)
- create an empty list of found colors
- process all pixels of the input image
- compute the distance d to all found colors in the list
- if d is smaller than some threshold constant, the pixel belongs to that color in the found colors list; else add it as a new found color
- recolor the pixel with the color from the recolor table

This will eliminate the shading and anti-aliasing color distortions. You can also ignore the recolor table and use the color from the found colors list instead. This process is a form of Color Quantization.
[Edit1] After using HSV color and recoloring to the found color list (no histogram) I got this result:

This shows that your image does not have uniform lighting conditions (it is not a render but a real photo), so illumination normalization should improve this even more. Also I use 2 thresholds, one for grayscales and one for colors, to distinguish the two ... Also you can detect the background color by (a rough sketch follows the list):

- pixel count (should be much bigger than for the text color)
- dispersion along the image (should cover a large area with relatively high density, uniformly dispersed ... text is localized)
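
A rough sketch of such a background test could look like this (it assumes you gathered a per-color pixel count and bounding box while scanning; the bounding box is just a cheap stand-in for the dispersion test):

// rough sketch: pick the background color from found-color statistics;
// assumes per-color pixel count and bounding box were recorded during the pixel scan
#include <vector>

struct colstat { unsigned rgb; int cnt,x0,x1,y0,y1; };      // pixel count + bounding box

int background_index(const std::vector<colstat> &col,int xs,int ys)
    {
    int best=-1;
    for (int i=0;i<(int)col.size();i++)
        {
        // background should span (almost) the whole image, text is localized
        bool spread=((col[i].x1-col[i].x0)>(xs*3)/4)&&((col[i].y1-col[i].y0)>(ys*3)/4);
        if (!spread) continue;
        if ((best<0)||(col[i].cnt>col[best].cnt)) best=i;   // most pixels wins
        }
    return best;                                            // -1 if nothing matched
    }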
Here is the C++/VCL source for the recoloring:
backbuffer bmp;                             // source and target image
struct _color { DWORD rgb; int h,s,v; };    // color entry in (re)color table

_color ld_rgb(DWORD rgb)                    // just RGB -> HSV conversion
    {
    const int _b=0;
    const int _g=1;
    const int _r=2;
    const int _a=3;
    union { DWORD dd; BYTE db[4]; } c;
    double r,g,b,min,max,del,h,s,v,dr,dg,db;
    c.dd=rgb;
    r=c.db[_r]; r/=255.0;
    g=c.db[_g]; g/=255.0;
    b=c.db[_b]; b/=255.0;
    min=r; if (min>g) min=g; if (min>b) min=b;
    max=r; if (max<g) max=g; if (max<b) max=b;
    del=max-min;
    v=max;
    if (del<=0.1) { h=0; s=0; }             // grayscale
    else{
        s=del/max;
        dr=(((max-r)/6.0)+(del/2.0))/del;
        dg=(((max-g)/6.0)+(del/2.0))/del;
        db=(((max-b)/6.0)+(del/2.0))/del;
             if (fabs(r-max)<1e-10) h=db-dg;
        else if (fabs(g-max)<1e-10) h=(1.0/3.0)+dr-db;
        else if (fabs(b-max)<1e-10) h=(2.0/3.0)+dg-dr;
        if (h<0.0) h+=1.0;
        if (h>1.0) h-=1.0;
        }
    _color ccc;
    ccc.rgb=rgb;
    ccc.h=255.0*h;
    ccc.s=255.0*s;
    ccc.v=255.0*v;
    return ccc;
    }

void recolor()                              // this is the recolor you want
    {
    // load input jpg file to bmp image
    TJPEGImage *jpg=new TJPEGImage();
    jpg->LoadFromFile("in.jpg");
    bmp.bmp->Assign(jpg);
    bmp.resize(bmp.bmp->Width,bmp.bmp->Height);
    delete jpg;
    // recolor bmp
    int i,x,y,d;
    _color c0,c1;
    List<_color> col;                       // color list
    col.num=0;                              // clear color list
    for (y=0;y<bmp.ys;y++)                  // process all pixels
     for (x=0;x<bmp.xs;x++)
        {
        c0=ld_rgb(bmp.pyx[y][x]);           // pixel color -> hsv
        if ((c0.h==0)&&(c0.s==0))           // compare it to found colors (grayscales)
            for (i=0;i<col.num;i++)
                {
                // i=-1; c1.rgb=0x00202020; break;
                c1=col[i];
                if ((c1.h!=0)||(c1.s!=0)) continue;
                d=abs(c1.v-c0.v);
                if (d<32) { i=-1; break; }  // match found ?
                }
        else                                // compare it to found colors
            for (i=0;i<col.num;i++)
                {
                // i=-1; c1.rgb=0x0000FF00; break;
                c1=col[i];
                if ((c1.h==0)&&(c1.s==0)) continue;
                d=(abs(c1.h-c0.h))+(abs(c1.s-c0.s));
                if (d<50) { i=-1; break; }  // match found ?
                }
        if (i>=0) { c1=c0; col.add(c1); }   // if not add new color
        bmp.pyx[y][x]=c1.rgb;               // recolor
        }
    // print found color count and render their swatches
    bmp.bmp->Canvas->Brush->Style=bsClear;
    bmp.bmp->Canvas->Font->Color=0x00802040;
    bmp.bmp->Canvas->TextOutA(5,0,"Found colors: "+AnsiString(col.num));
    bmp.bmp->Canvas->Brush->Style=bsSolid;
    for (d=16,i=0;i<col.num;i++)
     for (y=d;y<=d+d;y++)
      for (x=d*i+1;(x<d*i+d)&&(x<bmp.xs);x++)
       bmp.pyx[y][x]=col[i].rgb;
    }
List<T> l; is a dynamic array similar to std::vector<T> ... it represents T l[l.num]. backbuffer bmp; is my own image class ... bmp.bmp holds the GDI bitmap and bmp.xs,bmp.ys is the resolution. col holds the found colors ...
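
If you do not have similar classes, a rough stand-in covering only what recolor() uses could look like this (my real classes are VCL based and do much more; the GDI bitmap / Canvas parts are not covered here):

// rough stand-in for List<T> and backbuffer covering only what recolor() touches
#include <vector>

template<class T> struct List
    {
    std::vector<T> dat;
    int num;                                    // used item count (col.num=0 clears the list)
    List():num(0){}
    void add(const T &a)
        {
        if (num<(int)dat.size()) dat[num]=a;
        else dat.push_back(a);
        num++;
        }
    T& operator[](int i){ return dat[i]; }
    };

struct backbuffer
    {
    int xs,ys;                                  // resolution
    std::vector<unsigned> pix;                  // pixel data (0x00RRGGBB)
    std::vector<unsigned*> pyx;                 // row pointers so pyx[y][x] works
    backbuffer():xs(0),ys(0){}
    void resize(int _xs,int _ys)
        {
        xs=_xs; ys=_ys;
        pix.resize(size_t(xs)*size_t(ys)); pyx.resize(ys);
        for (int y=0;y<ys;y++) pyx[y]=&pix[size_t(y)*size_t(xs)];
        }
    };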
[Edit2] bi-cubic illumination normalization
I was recently rewriting my DIP lib, upgrading my illumination normalization, so I gave it a shot on your input image (as one of many test images) and here is the result (with forced recolor of the (detected) empty space):

As you can see the middle red-ish lighting spot is gone. You can try your algo on this so you know if applying illumination normalization helps before encoding it (it is a bit complicated if done properly). This one is done like this:
- create a grid (table) for your image

  Each cell contains the average color and the cumulative delta (noise) of the cell area, plus a single flag telling whether the cell is paper or ink. Cell size should be around <0.5 - 1.5> of the min detail size (like letter or pen width ...).

- set all cells with high delta as ink, the rest as paper

- compute the average color of all paper cells combined

- set each paper cell neighboring an ink cell as ink if its average color is too far from the global average paper color

  Be careful not to take those newly ink-set cells as the neighbor condition for this step (use a temp flag or a different flag bit and restore it after this is done ...).

- find 16 control points uniformly dispersed along the image (use only paper cells)

  They should be around coordinates 0%,33%,66%,100% of the image resolution so the bi-cubic interpolation is valid.

- for each pixel bi-cubically compute the cell color and call it c0, then apply the normalization to the pixel (in RGB space!!!):

  pixel += global_avg_color - c0

  This will equalize the paper color along the whole image to a very close match with global_avg_color, leaving the non-paper details undistorted.

- optionally recolor all paper cell areas with global_avg_color

  This is not necessary but it will eliminate much of the noise from the background, like paper texture ...
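
And here is a rough sketch of the whole normalization in code (the cell size, the noise threshold and the control point selection are simplified guesses; the paper/ink neighbor refinement and the optional background recolor are skipped for brevity):

// rough sketch of the bi-cubic illumination normalization described above
#include <vector>
#include <cmath>
#include <cstdlib>

// cubic Lagrange interpolation through 4 samples placed at t = 0, 1/3, 2/3, 1
double cubic4(const double p[4],double t)
    {
    double l0= -4.5*(t-1.0/3.0)*(t-2.0/3.0)*(t-1.0);
    double l1= 13.5* t         *(t-2.0/3.0)*(t-1.0);
    double l2=-13.5* t*(t-1.0/3.0)         *(t-1.0);
    double l3=  4.5* t*(t-1.0/3.0)*(t-2.0/3.0);
    return p[0]*l0+p[1]*l1+p[2]*l2+p[3]*l3;
    }

// pyx[y][x] holds 0x00RRGGBB pixels, xs,ys is the resolution, cell is the grid cell size
void normalize_illumination(unsigned **pyx,int xs,int ys,int cell=16)
    {
    int gx=xs/cell,gy=ys/cell; if ((gx<4)||(gy<4)) return;
    std::vector<double> avg(gx*gy*3,0.0),del(gx*gy,0.0);
    std::vector<int> paper(gx*gy,0);
    // per cell: average color + cumulative delta (noise), classify paper/ink by the delta
    for (int cy=0;cy<gy;cy++)
     for (int cx=0;cx<gx;cx++)
        {
        double a[3]={0,0,0},d=0; int n=0; unsigned last=pyx[cy*cell][cx*cell];
        for (int y=cy*cell;y<(cy+1)*cell;y++)
         for (int x=cx*cell;x<(cx+1)*cell;x++,n++)
            {
            unsigned c=pyx[y][x];
            int cr=(c>>16)&255,cg=(c>>8)&255,cb=c&255;
            int lr=(last>>16)&255,lg=(last>>8)&255,lb=last&255;
            a[0]+=cr; a[1]+=cg; a[2]+=cb;
            d+=std::abs(cr-lr)+std::abs(cg-lg)+std::abs(cb-lb);
            last=c;
            }
        for (int k=0;k<3;k++) avg[(cy*gx+cx)*3+k]=a[k]/n;
        del[cy*gx+cx]=d/n;
        paper[cy*gx+cx]=(del[cy*gx+cx]<8.0);                // low noise -> paper
        }
    // global average paper color
    double g[3]={0,0,0}; int np=0;
    for (int i=0;i<gx*gy;i++) if (paper[i]) { for (int k=0;k<3;k++) g[k]+=avg[i*3+k]; np++; }
    if (!np) return;
    for (int k=0;k<3;k++) g[k]/=np;
    // 4x4 control points near 0%,33%,66%,100% of the grid (global average if the cell is ink)
    double ctrl[4][4][3];
    for (int j=0;j<4;j++)
     for (int i=0;i<4;i++)
        {
        int ci=((i*(gx-1))/3)+(((j*(gy-1))/3)*gx);
        for (int k=0;k<3;k++) ctrl[j][i][k]=(paper[ci])?avg[ci*3+k]:g[k];
        }
    // per pixel: bi-cubic paper color c0, then pixel += global_avg_color - c0 (in RGB)
    for (int y=0;y<ys;y++)
     for (int x=0;x<xs;x++)
        {
        double tx=double(x)/double(xs-1),ty=double(y)/double(ys-1),c0[3];
        for (int k=0;k<3;k++)
            {
            double row[4];
            for (int j=0;j<4;j++)
                { double p[4]={ctrl[j][0][k],ctrl[j][1][k],ctrl[j][2][k],ctrl[j][3][k]}; row[j]=cubic4(p,tx); }
            c0[k]=cubic4(row,ty);
            }
        unsigned c=pyx[y][x],o=0;
        int ch[3]={int((c>>16)&255),int((c>>8)&255),int(c&255)};
        for (int k=0;k<3;k++)
            {
            int v=ch[k]+int(floor(g[k]-c0[k]+0.5));
            if (v<0) v=0; if (v>255) v=255;
            o=(o<<8)|unsigned(v);
            }
        pyx[y][x]=o;
        }
    }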