Say I have a chart such as this one, as an image:

[image: the example chart]

I want to extract its colors and, for each one, find the closest color among the named colors returned by grDevices::colors():

head(grDevices::colors())
[1] "white"         "aliceblue"     "antiquewhite"  "antiquewhite1" "antiquewhite2" "antiquewhite3"

The simplest output would be a vector of these colors.

A fancier output would be a data.frame with the real color codes, the "rounded" color (i.e. the closest member of grDevices::colors()), the percentage of the image surface it covers, and the coordinates of the centers of gravity of the areas it covers.

A super fancy output would overlay these color names on the original chart, and/or build a new dot chart with dots placed at these center positions and color names as text labels.

An ultra fancy output would propose the closest match among existing palettes.

moodymudskipper
  • https://stackoverflow.com/a/16788397/1412059 – Roland Jul 19 '18 at 10:22
  • I suppose the downvoter won't come back, but if something in my question isn't clear I'll be happy to edit. It was possibly because I proposed several outputs, but I did so because I want to leave the possibility open for a really good general answer to the issue. I'll accept the simplest answer if it's the best provided and it works. – moodymudskipper Jul 19 '18 at 10:29

1 Answer

TL;DR: call get_named_colors("https://i.stack.imgur.com/zdyNO.png"), using the function defined at the bottom.

We load the image in R and convert it to a long RGB format (one row per pixel). We put the RGB values of the named colors in the same format, compute the distance from each image color to every named color, and keep the closest named color for each image color. That gives our output.

library(ggplot2)
library(dplyr)      
library(png)

Our candidates:

rgb_named_colors <- t(col2rgb(grDevices::colors())/255)
head(rgb_named_colors,3)
#            red     green      blue
# [1,] 1.0000000 1.0000000 1.0000000
# [2,] 0.9411765 0.9725490 1.0000000
# [3,] 0.9803922 0.9215686 0.8431373

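Our colors:

# readPNG() reads a local file (or a raw vector), not a URL, so download first
png_file <- tempfile(fileext = ".png")
download.file("https://i.stack.imgur.com/zdyNO.png", png_file, mode = "wb")
img     <- readPNG(png_file)
dim(img) # [1] 476 746   3
# it's a 3D array (height x width x channel); convert it to long format, one row per pixel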
rgb_img <- apply(img,3,c)
colnames(rgb_img) <- c("red","green","blue")
head(rgb_img,3)
#            red     green      blue
# [1,] 0.9803922 0.9803922 0.9803922
# [2,] 0.9803922 0.9803922 0.9803922
# [3,] 0.9803922 0.9803922 0.9803922

dim(unique(rgb_img)) # [1] 958   3
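
To make the long-format step concrete, here is a toy sketch on a made-up 2 x 2 x 3 array (the `toy` object is purely illustrative, not part of the image). Each channel slice is flattened column by column and becomes one column of the result, so every row is one pixel:

```r
# toy "image": 2 rows, 2 columns, 3 channels (illustrative values only)
toy <- array(1:12, dim = c(2, 2, 3))

# apply over the 3rd dimension; c() flattens each 2 x 2 slice into a vector
long <- apply(toy, 3, c)

dim(long)   # number of pixels x number of channels
long[, 1]   # the first channel, read column by column
```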

We have 958 distinct colors, which is too many. We filter out those with few occurrences, keeping only the colors that cover more than 0.5% of the image's pixels.

rgb_img_agg <-
  rgb_img %>%
  as_tibble %>%
  group_by_all %>%
  count %>%
  filter(n > dim(img)[1]* dim(img)[2] *0.5/100)

How did it work out?

dim(rgb_img_agg) # [1] 11  4

Much better: 11 colors remain.

head(rgb_img_agg,3)
# # A tibble: 3 x 4
# # Groups:   red, green, blue [3]
#          red     green      blue     n
#        <dbl>     <dbl>     <dbl> <int>
# 1 0.04705882 0.2627451 0.5137255  2381
# 2 0.27843137 0.5568627 0.7803922 29353
# 3 0.37254902 0.7450980 0.2549020  2170

For each image color, we compute the Euclidean distance (in RGB) to every named color and keep the closest one:

output <- apply(rgb_img_agg[1:3],1, function(row_img)
  grDevices::colors()[which.min(
  apply(rgb_named_colors,1,function(row_named)
    dist(rbind(row_img,row_named))))])

output
# [1] "dodgerblue4" "steelblue3"  "limegreen"   "olivedrab"   "gray80"      "olivedrab1"  "chocolate3"  "chocolate1" 
# [9] "ghostwhite"  "gray98"      "white" 

It works! Now let's display all of our colors with a legend:

ggplot(tibble(named_color=output),aes(named_color,fill=factor(named_color,levels=output))) + geom_bar() +
    scale_fill_manual(values = output)

Now we put everything in a function:

get_named_colors <- function(path, cutoff = 0.5){
  library(dplyr)
  library(ggplot2)
  library(png)
  # named colors
  rgb_named_colors <- t(col2rgb(grDevices::colors())/255)

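  # colors from path (readPNG() needs a local file, so download URLs first)
  if (grepl("^https?://", path)) {
    tmp <- tempfile(fileext = ".png")
    download.file(path, tmp, mode = "wb")
    path <- tmp
  }
  img     <- readPNG(path)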
  rgb_img <- apply(img,3,c)
  colnames(rgb_img) <- c("red","green","blue")
  rgb_img_agg <-
    rgb_img %>%
    as_tibble %>%
    group_by_all %>%
    count %>%
    filter(n > dim(img)[1]* dim(img)[2] *cutoff/100)

  # distances
  output <- apply(rgb_img_agg[1:3],1, function(row_img)
    grDevices::colors()[which.min(
      apply(rgb_named_colors,1,function(row_named)
        dist(rbind(row_img,row_named))))])

  p <- ggplot(tibble(named_color=output),aes(named_color,fill=factor(named_color,levels=output))) + geom_bar() +
    scale_fill_manual(values = output)
  print(p)

  output
}

I might update this if I find out how to implement the fancy features.
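
As a partial step towards the "fancier" output from the question, here is a rough sketch of a data.frame summary. The helper name `summarise_colors` and its column names are hypothetical. It assumes `img` is an array as returned by readPNG() (height x width x channels, values in [0, 1]). It reports one center of gravity per color over all pixels of that color, not one per connected area, and it uses the same RGB distance as above.

```r
# Hypothetical helper, a sketch only: per-color summary of an image array.
summarise_colors <- function(img, cutoff = 0.5) {
  h <- dim(img)[1]
  w <- dim(img)[2]
  # one hex code per pixel, in column-major order (any alpha channel is ignored)
  hex <- grDevices::rgb(as.vector(img[, , 1]),
                        as.vector(img[, , 2]),
                        as.vector(img[, , 3]))
  # pixel coordinates in the same order: the row index varies fastest
  y <- rep(seq_len(h), times = w)
  x <- rep(seq_len(w), each = h)

  tab <- table(hex)
  res <- data.frame(
    hex      = names(tab),
    percent  = as.vector(100 * tab / length(hex)),
    center_x = as.vector(tapply(x, hex, mean)),  # column index, from the left
    center_y = as.vector(tapply(y, hex, mean)),  # row index, from the top
    stringsAsFactors = FALSE
  )
  # keep only colors covering more than `cutoff` percent of the pixels
  res <- res[res$percent > cutoff, ]

  # closest named color by squared Euclidean distance in RGB
  named     <- grDevices::colors()
  named_rgb <- grDevices::col2rgb(named)    # 3 x n matrix
  img_rgb   <- grDevices::col2rgb(res$hex)  # 3 x k matrix
  res$named_color <- apply(img_rgb, 2, function(p)
    named[which.min(colSums((named_rgb - p)^2))])
  res
}

# tiny synthetic image: left column red, right column black
demo <- array(0, dim = c(2, 2, 3))
demo[, 1, 1] <- 1
summarise_colors(demo)
```

With the real chart, this would be called as summarise_colors(img) on the array read above.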

moodymudskipper
  • It works OK, but from my subjective observations the color found is not always the closest one visually. I suspect Euclidean distance in RGB is not the best way to measure the difference between colors; we should use a distance function tailored to the average human eye. – moodymudskipper Jul 19 '18 at 12:33
  • As explained here: https://stackoverflow.com/questions/9018016/how-to-compare-two-colors-for-similarity-difference, the RGB space is not perceptually uniform. A solution is to map these values to a perceptually uniform space and compute distances there. This package will help: https://cran.r-project.org/web/packages/colorspace/colorspace.pdf – moodymudskipper Jul 19 '18 at 12:40
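
Following up on the comments above, here is a minimal sketch of the perceptually uniform idea. It uses base R's grDevices::convertColor() instead of the colorspace package, and plain Euclidean distance in CIE Lab, which is only one simple choice among several color-difference formulas. The helper name `nearest_named_lab` is hypothetical.

```r
# Hypothetical helper (not from the answer): nearest named color in CIE Lab space.
# `rgb01` is a length-3 vector or an n x 3 matrix / data frame of
# red, green, blue values in [0, 1].
nearest_named_lab <- function(rgb01) {
  if (is.null(dim(rgb01))) {
    rgb01 <- matrix(rgb01, ncol = 3)
  } else {
    rgb01 <- as.matrix(rgb01)
  }
  named     <- grDevices::colors()
  named_rgb <- t(grDevices::col2rgb(named)) / 255

  # convert both sets of colors from sRGB to Lab
  named_lab <- matrix(grDevices::convertColor(named_rgb, from = "sRGB", to = "Lab"),
                      ncol = 3)
  img_lab   <- matrix(grDevices::convertColor(rgb01, from = "sRGB", to = "Lab"),
                      ncol = 3)

  # for each image color, keep the named color with the smallest squared Lab distance
  apply(img_lab, 1, function(p)
    named[which.min(rowSums(sweep(named_lab, 2, p)^2))])
}

nearest_named_lab(c(1, 0, 0))
```

Assuming the objects from the answer are still in the workspace, nearest_named_lab(rgb_img_agg[1:3]) could replace the nested apply() call, and the results may differ from the RGB-based matches.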