
I am extracting primitives from pixel-based line diagrams and wish to select them by colour. Thus in the following diagram

[image: example diagram]

I wish to extract the "blue", the "green" and the "black" primitives. (I am prepared to try to reconstruct primitives which have been split by primitives of another colour.)

However the "blues" have a varying amount of white added (similar to a gray scale for black). Thus the commonest colours (rounded to 12-bit for simplicity) with their counts might be

000   881 // black
88f   1089 // white-blue
fff   70475 // white

but there are other degrees of whiteness at lower frequency

// other white-blue 
99f   207

// other grey
ddd   196
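A frequency table like the one above could be built roughly as follows. This is only a sketch: the class and method names are hypothetical, pixels are assumed to be packed as 0xRRGGBB, and "rounding" to 12 bits is assumed to mean keeping the top 4 bits of each channel.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts how often each 12-bit colour occurs.
final class ColourHistogram {

    /** Reduces 0xRRGGBB to 0xRGB by keeping the top 4 bits of each channel. */
    static int to12Bit(int rgb) {
        int r = (rgb >> 20) & 0xF;
        int g = (rgb >> 12) & 0xF;
        int b = (rgb >> 4) & 0xF;
        return (r << 8) | (g << 4) | b;
    }

    /** Returns a map from 12-bit colour to the number of pixels with that colour. */
    static Map<Integer, Integer> count(int[] pixels) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int p : pixels) {
            counts.merge(to12Bit(p), 1, Integer::sum);
        }
        return counts;
    }
}
```

Sorting the entries by count would then give the commonest colours first.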

I believe that the authors will have used only a very limited number of pure colours (e.g. 3-6) in many diagrams and that various rendering tools will have added the white. In other words, the colours can be expressed as follows (with 0 <= x <= 1):

000 + x(FFF)
00F + x(FF0) // blue
0F0 + x(F0F) // green

However there is no requirement to use primary colours and the set could be any colour with arbitrary amounts of white.
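To make the model concrete: each channel of the observed colour is base + x * (255 - base). The sketch below assumes 8-bit channels packed as 0xRRGGBB; the class and method names are hypothetical.

```java
// Hypothetical helper illustrating the "base colour plus white" model.
final class WhiteMix {

    /** Mixes a base colour (0xRRGGBB) with a fraction x of white, 0 <= x <= 1. */
    static int whiten(int base, double x) {
        int r = mix((base >> 16) & 0xFF, x);
        int g = mix((base >> 8) & 0xFF, x);
        int b = mix(base & 0xFF, x);
        return (r << 16) | (g << 8) | b;
    }

    /** One channel: moves the value towards 255 by the fraction x. */
    private static int mix(int channel, double x) {
        return (int) Math.round(channel + x * (255 - channel));
    }
}
```

For example, `WhiteMix.whiten(0x0000FF, 8.0 / 15.0)` gives 0x8888FF, which is the "88f" white-blue in 12-bit form.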

How can I reconstruct the (small) set of different colours? If this is possible I can then select those regions, transform to grey, and binarize in the normal way.

I'd prefer source in Java but I suspect that any code will be adequate.

I have read two useful SO questions:

"Rounding" colour values to the nearest of a small set of colours

HCL color to RGB and backward

which use H-C-L and might be a way forward although they don't directly answer my requirements.

peter.murray.rust

3 Answers


You could try using region growing, adjusting the threshold that decides when two pixels count as the same color. I think it should work well here since there seems to be a big difference between any two colors that are connected as objects.
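A minimal sketch of region growing with a color tolerance, for illustration only. It assumes the image is an `int[y][x]` array of 0xRRGGBB values, uses 4-connectivity, and compares every candidate pixel against the seed color by the largest per-channel difference; all names here are hypothetical.

```java
import java.util.ArrayDeque;

// Hypothetical helper: grows a region of similar colors from a seed pixel.
final class RegionGrow {

    /** Returns a mask that is true for pixels connected to (sx, sy) and within tolerance of its color. */
    static boolean[][] grow(int[][] image, int sx, int sy, int tolerance) {
        int h = image.length, w = image[0].length;
        boolean[][] inRegion = new boolean[h][w];
        int seed = image[sy][sx];
        ArrayDeque<int[]> queue = new ArrayDeque<>();
        inRegion[sy][sx] = true;
        queue.add(new int[] {sx, sy});
        int[] dx = {1, -1, 0, 0};
        int[] dy = {0, 0, 1, -1};
        while (!queue.isEmpty()) {
            int[] p = queue.poll();
            for (int d = 0; d < 4; d++) {
                int x = p[0] + dx[d], y = p[1] + dy[d];
                if (x < 0 || y < 0 || x >= w || y >= h || inRegion[y][x]) continue;
                if (distance(image[y][x], seed) <= tolerance) {
                    inRegion[y][x] = true;
                    queue.add(new int[] {x, y});
                }
            }
        }
        return inRegion;
    }

    /** Largest absolute difference over the R, G and B channels. */
    static int distance(int a, int b) {
        int dr = Math.abs(((a >> 16) & 0xFF) - ((b >> 16) & 0xFF));
        int dg = Math.abs(((a >> 8) & 0xFF) - ((b >> 8) & 0xFF));
        int db = Math.abs((a & 0xFF) - (b & 0xFF));
        return Math.max(dr, Math.max(dg, db));
    }
}
```

The tolerance is the "same color" threshold: a larger value merges more shades of a whitened color into one region.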

wbest
  • It may be useful (and I do already grow regions), but there's a lot of variation in the amount of white in the blue - I'd really like to binarize the blue as a distinct operation – peter.murray.rust Jan 29 '14 at 16:51
  • Are you able to threshold out all the pixels you definitely don't want (for example all the fully white pixels)? You could try something like dynamic clustering on all the non-white pixels over the entire image. This would group all the similar colors together, and should work automatically without your need to decide on a threshold. – wbest Jan 29 '14 at 17:00
  • We can't rely on the colours being in blocks. It could be much finer grained - e.g. antialiased characters – peter.murray.rust Jan 29 '14 at 17:11
  • A kmeans or clustering approach would work even if the pixels were randomly distributed. You probably wouldn't even have to threshold out the white and black data. Do you know how many colors or which colors you want to segment out? – wbest Jan 29 '14 at 17:34
  • No. I know there are probably not many but I don't know what they are. Without the whitening it's easy - just compile a Set. But since they can vary I need to know how. – peter.murray.rust Jan 29 '14 at 17:54
  • Is the set of possible primitives constant? Will you ever have colors like "Dark Blue" and "Light Blue" used to represent two different things in the same image? Or can we rely on the idea that all "bluish" colors are the same? – wbest Jan 30 '14 at 15:56
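
The k-means idea raised in the comments above could be sketched as below. This is an illustration under assumptions, not a recommendation: the number of clusters and the initial centres must be supplied by the caller (which the asker says are unknown), distances are plain Euclidean in RGB, and the class name is hypothetical.

```java
// Hypothetical helper: Lloyd's k-means over RGB triples.
final class ColourKMeans {

    /**
     * Clusters pixels (each {r, g, b}) around the given centres.
     * The centres array holds the initial guesses and is updated in place.
     * Returns the cluster index assigned to each pixel.
     */
    static int[] cluster(int[][] pixels, double[][] centres, int iterations) {
        int[] label = new int[pixels.length];
        for (int it = 0; it < iterations; it++) {
            // Assignment step: attach each pixel to its nearest centre.
            for (int i = 0; i < pixels.length; i++) {
                double best = Double.MAX_VALUE;
                for (int c = 0; c < centres.length; c++) {
                    double d = 0;
                    for (int k = 0; k < 3; k++) {
                        double diff = pixels[i][k] - centres[c][k];
                        d += diff * diff;
                    }
                    if (d < best) {
                        best = d;
                        label[i] = c;
                    }
                }
            }
            // Update step: move each centre to the mean of its pixels.
            double[][] sum = new double[centres.length][3];
            int[] count = new int[centres.length];
            for (int i = 0; i < pixels.length; i++) {
                count[label[i]]++;
                for (int k = 0; k < 3; k++) sum[label[i]][k] += pixels[i][k];
            }
            for (int c = 0; c < centres.length; c++) {
                if (count[c] == 0) continue; // leave empty clusters where they are
                for (int k = 0; k < 3; k++) centres[c][k] = sum[c][k] / count[c];
            }
        }
        return label;
    }
}
```

Because pixel positions are never used, this works the same whether the colours form blocks or are scattered, as with antialiased characters.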

If your intuition is correct (all pixels being a linear mixture of some color and pure white), in the RGB cube all colors will be aligned on line segments originating from the white corner.

If you pick one representative pixel per different color (as far as possible from white, for better accuracy), you can identify the color of any other pixel by finding which representative is best aligned with that pixel and white, i.e. which one lies closest to the line from white through the pixel.

Alignment is tested by computing the cosine of the angle formed (use 3D vectors, the cosine is the dot product over the product of the norms; drop the sign). In theory the cosine should be exactly 1, but due to numerical errors it can be smaller, so just consider the representative color that maximizes the cosine.

Take special care of the white pixels (short distance to the white corner), otherwise they will be randomly assigned to some representative color.
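
A minimal sketch of this alignment test, assuming colors packed as 0xRRGGBB. The class name, the method names and the `minDistanceFromWhite` cutoff used to set near-white pixels aside are hypothetical choices for illustration.

```java
// Hypothetical helper: assigns a pixel to the representative color whose
// direction from the white corner best matches the pixel's direction.
final class WhiteLineClassifier {

    /** Vector from the white corner (255, 255, 255) to the color 0xRRGGBB. */
    static double[] fromWhite(int rgb) {
        return new double[] {
            ((rgb >> 16) & 0xFF) - 255.0,
            ((rgb >> 8) & 0xFF) - 255.0,
            (rgb & 0xFF) - 255.0
        };
    }

    static double norm(double[] v) {
        return Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    }

    /**
     * Returns the index of the representative with the largest cosine,
     * or -1 if the pixel is too close to white to be classified reliably.
     */
    static int classify(int pixel, int[] representatives, double minDistanceFromWhite) {
        double[] p = fromWhite(pixel);
        double pn = norm(p);
        if (pn < minDistanceFromWhite) return -1;
        int best = -1;
        double bestCos = -1.0;
        for (int i = 0; i < representatives.length; i++) {
            double[] r = fromWhite(representatives[i]);
            double rn = norm(r);
            if (rn == 0) continue; // a white representative has no direction
            double dot = p[0] * r[0] + p[1] * r[1] + p[2] * r[2];
            double cos = Math.abs(dot / (pn * rn)); // "drop the sign"
            if (cos > bestCos) {
                bestCos = cos;
                best = i;
            }
        }
        return best;
    }
}
```

With representatives `{0x000000, 0x0000FF, 0x00FF00}` and a cutoff of 30, the pixel 0x8888FF is assigned index 1 (blue), 0xDDDDDD index 0 (black), and 0xFFFFFF returns -1.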

  • Thanks. I've realised that without knowledge of the antialiasing algorithm there is probably no exact answer. I think that white is being added to a base colour but H and S can both vary – peter.murray.rust Jan 31 '14 at 15:05

Depending on the number of colors involved and their similarity, a simple threshold of the R, G, and B values would quickly reduce everything to one of 8 colors (black, red, green, blue, cyan, magenta, yellow, or white).
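
A sketch of such a per-channel threshold, assuming colors packed as 0xRRGGBB; the class name, method name and threshold parameter are hypothetical.

```java
// Hypothetical helper: snaps each channel to 0 or 255, giving one of 8 colors.
final class ChannelThreshold {

    /** Channels at or above the threshold become 255, the rest become 0. */
    static int snap(int rgb, int threshold) {
        int r = ((rgb >> 16) & 0xFF) >= threshold ? 0xFF : 0;
        int g = ((rgb >> 8) & 0xFF) >= threshold ? 0xFF : 0;
        int b = (rgb & 0xFF) >= threshold ? 0xFF : 0;
        return (r << 16) | (g << 8) | b;
    }
}
```

The threshold choice matters for whitened colors: with a threshold of 128 the white-blue 0x8888FF snaps to white, while a threshold of 192 snaps it to pure blue but also turns the light grey 0xDDDDDD into white.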

Dithermaster