I have an image, master.png, and more than 10,000 other images (slave_1.png, slave_2.png, ...). They all have:

  • The same dimensions (e.g. 100x50 pixels)
  • The same format (png)
  • The same image background

98% of the slaves are identical to the master, but 2% of the slaves have a slightly different content:

  • New colors appear
  • New small shapes appear in the middle of the image

I need to spot those different slaves. I'm using Ruby, but I have no problem using a different technology.

I tried reading both images with File.binread and comparing them with ==. It worked for 80% of the slaves. For the rest, it reported differences even though the images were visually identical, so it doesn't work.

Alternatives are:

  1. Count the number of colors present in each slave and compare it with the master. It would work 100% of the time, but I don't know how to do it in Ruby in a "light" way.
  2. Use an image processing library such as RMagick or ruby-vips8 to compare histograms. This should also work, but I need to consume as little CPU/memory as possible.
  3. Write a C++/Go/Crystal program that reads pixel by pixel and returns the number of colors. I think we can get good performance out of it this way, but it is certainly the hard way.
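
As a minimal sketch of alternative 1, assuming the pixels have already been decoded into a raw byte string of tightly packed RGB values (PNG decoding is not shown here and would need an image library), counting distinct colors could look like this. The name `count_colors` and the tiny sample buffers are hypothetical and only illustrate the idea.

```ruby
require 'set'

# Count the distinct colors in a buffer of raw, already decoded pixel data.
# ASSUMPTION: `raw` is a binary String of tightly packed pixels,
# `bytes_per_pixel` bytes each (3 for RGB, 4 for RGBA).
def count_colors(raw, bytes_per_pixel = 3)
  unless (raw.bytesize % bytes_per_pixel).zero?
    raise ArgumentError, 'buffer length is not a multiple of the pixel size'
  end

  colors = Set.new
  offset = 0
  while offset < raw.bytesize
    # Each pixel is stored as its raw byte sequence; the Set keeps one of each.
    colors << raw.byteslice(offset, bytes_per_pixel)
    offset += bytes_per_pixel
  end
  colors.size
end

# Hypothetical 2x2 RGB images standing in for decoded PNG data.
master = [255, 255, 255,  255, 255, 255,  0, 0, 0,  0, 0, 0].pack('C*')
slave  = [255, 255, 255,  255, 0, 0,      0, 0, 0,  0, 0, 0].pack('C*')

puts count_colors(master) # 2 (white, black)
puts count_colors(slave)  # 3 (white, red, black)
```

Note that an equal color count does not prove two images are identical: a new shape drawn in a color the master already contains would go unnoticed.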

Any enlightenment? Suggestions?

sawa
fschuindt
  • Look into [this question](http://stackoverflow.com/questions/4196453/simple-and-fast-method-to-compare-images-for-similarity). Many options have been discussed there. – Uzbekjon Apr 14 '16 at 17:18
  • Another note about comparing with `File.binread`. Since you are simply comparing file contents, and resources and performance are important, it'd be better to use bash for that. Look into: `diff`, `cmp` or `md5`. – Uzbekjon Apr 14 '16 at 17:50
  • Could be a job for [Tensor Flow](https://www.tensorflow.org) if you need a classifier. – tadman Apr 14 '16 at 19:08
  • Do you really mean you don't want to use much CPU when you say you want to do it in a light way? Or do you mean you want the answer fast - which may mean using all the CPU for a time? – Mark Setchell Apr 17 '16 at 20:53
  • @MarkSetchell By "light" I mean using as little CPU/RAM as possible. – fschuindt Apr 18 '16 at 13:47
  • How about showing a master and a couple of slaves plus a *different* slave? – Mark Setchell Apr 22 '16 at 11:37
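
The file-level comparison suggested in the comments can also be done from Ruby with the standard `digest` library. Because the question reports that byte comparison flags some visually identical files, a sketch like the one below is only useful as a cheap pre-filter: files whose digest matches the master are certainly identical, and only the remainder needs a pixel-level check. The name `partition_by_digest` is hypothetical.

```ruby
require 'digest/md5'

# Split candidate files into those that are byte-for-byte identical to the
# master and those that still need a pixel-level comparison.
# Returns [identical_paths, unresolved_paths].
def partition_by_digest(master_path, candidate_paths)
  master_digest = Digest::MD5.file(master_path).hexdigest
  candidate_paths.partition do |path|
    Digest::MD5.file(path).hexdigest == master_digest
  end
end

# Command-line use (hypothetical script name):
#   ruby prefilter.rb master.png slave_*.png
if __FILE__ == $PROGRAM_NAME && ARGV.size >= 2
  master, *candidates = ARGV
  identical, unresolved = partition_by_digest(master, candidates)
  warn "#{identical.size} byte-identical, #{unresolved.size} left to inspect"
  puts unresolved
end
```

A differing digest does not mean the pixels differ, since the same image can be encoded as PNG in more than one way.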

1 Answer

In ruby-vips, you could do it like this:

```ruby
#!/usr/bin/ruby

require 'vips'

# find normalised histogram of reference image
ref = Vips::Image.new_from_file ARGV[0], access: :sequential
ref_hist = ref.hist_find.hist_norm

ARGV[1..-1].each do |filename|
  # find sample hist
  sample = Vips::Image.new_from_file filename, access: :sequential
  sample_hist = sample.hist_find.hist_norm

  # calculate sum of squares of differences; if it's over a threshold, print
  # the filename
  diff_hist = (ref_hist - sample_hist) ** 2
  diff = diff_hist.avg * diff_hist.width * diff_hist.height

  if diff > 100
    puts "#{filename}, #{diff}"
  end
end
```

If I make some test data:

```
$ vips crop ~/pics/k2.jpg ref.png 0 0 100 50
$ for i in {1..10000}; do cp ref.png $i.png; done
```

I can run it like this:

```
$ time ../similarity.rb ref.png *.png
real    0m55.974s
user    1m31.921s
sys     0m54.433s
```

It runs in a steady ~80 MB of memory.
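
To make the metric in the script easier to follow, here is a plain-Ruby sketch of the same idea (histogram, normalisation, sum of squared differences) on a single channel of 8-bit samples. It is a simplified illustration, not a reimplementation of the libvips operations: the helper names and the synthetic 100x50 sample data are hypothetical, and the normalisation (fullest bin scaled to the largest bin index) is an assumption about what a comparable normalisation looks like.

```ruby
# Build a histogram of 8-bit sample values, one bin per possible value.
def histogram(samples, bins = 256)
  hist = Array.new(bins, 0)
  samples.each { |value| hist[value] += 1 }
  hist
end

# Rescale the histogram so that its fullest bin equals bins - 1.
# ASSUMPTION: this is a stand-in normalisation chosen for the sketch.
def normalise(hist)
  peak = hist.max
  return hist.map(&:to_f) if peak.zero?

  scale = (hist.size - 1).to_f / peak
  hist.map { |count| count * scale }
end

# Sum of squared differences between two histograms of equal length.
def hist_distance(hist_a, hist_b)
  hist_a.zip(hist_b).sum { |a, b| (a - b)**2 }
end

# Hypothetical single-channel 100x50 images: a flat background, an exact
# copy, and a copy where 50 pixels take on a new value.
ref     = Array.new(100 * 50, 200)
same    = ref.dup
changed = ref.dup
changed.fill(30, 0, 50)

ref_hist = normalise(histogram(ref))
puts hist_distance(ref_hist, normalise(histogram(same)))    # 0.0
puts hist_distance(ref_hist, normalise(histogram(changed))) # small positive value
```

A change that touches only a few pixels produces a small distance, so any threshold is best calibrated against a few slaves that are known to differ.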

jcupitt