If I understood the requirement correctly, we need to:
- find for each grayscale image named XYZ that is in folder gray/...
- ...the matching color image named ABC that is in folder color/ and...
- ...copy ABC to folder results/ under the new name XYZ
So the basic algorithm I suggest is this:
Step 1: Convert all images in folder color/ to grayscale and store the results in folder gray-reference/, keeping the original names:
mkdir gray-reference
convert color/img123.jpg -colorspace gray gray-reference/img123.jpg
Step 2: Compare each grayscale image in gray-reference/ with each grayscale image in folder gray/. If you find a match, copy the color image of the same name from color/ to results/, giving it the name of the matching gray image. One possible comparison command, which creates a visual representation of the differences, is this:
compare gray-reference/img123.jpg gray/imgABC.jpg -compose src delta.jpg
The real trick is the comparison of the two grayscale images in step 2. ImageMagick has a handy command to compare two (similar) images pixel by pixel and write the result into a 'delta' image:
compare reference.png test.png -compose src delta.png
If the comparison is for color images, in the delta image...
- ...each pixel that was equal appears in white, while...
- ...each pixel that was different appears in a highlight color (defaults to red).
See also my answer "ImageMagick: 'Diff' an Image" for an illustrated example of this technique.
If we directly compared a gray image with a color image pixel by pixel, we would of course find that almost every single pixel is different (resulting in an all-red "delta" picture). Hence my proposal in step 1 above to first convert the color images to grayscale.
If we compare two grayscale images, the resulting delta image is grayscale too, so the default highlight color can't be red. We had better set it to 'black' so the differences are easier to see.
Now, if our own grayscale conversion of the color images produced a 'different' sort of gray than the existing gray images have (our grays could be slightly lighter or darker than the existing ones, for example because different color profiles were applied), our delta picture could still end up all-"red", or rather all-highlight-color. However, I tested this with your sample images, and the results are good:
convert color/image1.jpg -colorspace gray image1-gray.jpg
compare \
gray/file324.jpg \
image1-gray.jpg \
-highlight-color black \
-compose src \
delta.jpg
delta.jpg consists of 98% white pixels. I'm not sure whether all of your thousands of other grayscale images were created with the same settings when they were derived from the color originals. Therefore we add a small fuzz factor when running the compare command, which allows for some deviation in color when two pixels are compared:
compare -fuzz 3% reference.png test.png -compose src delta.png
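To measure a figure like the "98% white pixels" quoted above for your own delta images, you can let ImageMagick compute it. This is a minimal sketch; the function name `white_percent` is hypothetical. It assumes the delta image was written with `-compose src`, so equal pixels are white and differences are in the highlight color. Thresholding at 50% first maps JPEG compression noise to pure black or white, so the mean pixel value equals the fraction of white pixels.

```shell
#!/bin/sh
# white_percent FILE (hypothetical helper): print the percentage of white
# pixels in a delta image made by "compare ... -compose src".
# -colorspace gray : reduce to a single gray channel
# -threshold 50%   : force every pixel to pure black or pure white
# %[fx:mean*100]   : mean of the 0..1 pixel values, i.e. the white fraction, in %
white_percent() {
  convert "$1" -colorspace gray -threshold 50% -format '%[fx:mean*100]' info:
}
```

Usage: `white_percent delta.jpg` prints a single number such as 98 (the exact value depends on your images).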
Since this algorithm is to be executed many thousands of times (maybe several millions of times, given the number of images you talk about), we should think about performance and time the duration of the compare command. This is especially a concern since your sample images are rather large (3072x2048 pixels, about 6 megapixels), and the comparison could take a while.
My timing results on a MacBook Pro were these:
time (convert color/image1.jpg -colorspace gray image1-gray.jpg ;
compare \
gray/file324.jpg \
image1-gray.jpg \
-highlight-color black \
-fuzz 3% \
-compose src \
delta100-fuzz.jpg)
real 0m6.085s
user 0m2.616s
sys 0m0.598s
6 seconds for: 1 conversion of a large color image to grayscale, plus 1 comparison of two large grayscale images.
You talked about 'thousands of images'. Assuming 3000 images, and based on this timing, processing all the images would require (3000*3000)/2 comparisons (4.5 million) and (3000*3000*6)/2 seconds (27 million seconds). That's a total of 312 days to complete all comparisons. Too long, if you ask me.
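The estimate above can be reproduced with a small hypothetical helper, `estimate_days`. Like the text, it assumes that each of the N*N/2 comparisons costs the same fixed number of seconds. awk does the arithmetic because POSIX shell arithmetic is integer-only.

```shell
#!/bin/sh
# estimate_days N SECONDS (hypothetical helper): print the number of pairwise
# comparisons (N*N/2) and the total running time in days, assuming every
# comparison takes SECONDS (may be fractional).
estimate_days() {
  awk -v n="$1" -v s="$2" 'BEGIN {
    c = n * n / 2                 # each pair compared once on average
    printf "%d comparisons, %.1f days\n", c, c * s / 86400
  }'
}

estimate_days 3000 6   # prints: 4500000 comparisons, 312.5 days
```

Plugging in a smaller per-comparison time shows directly how much any speed-up is worth.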
What could we do to improve the performance?
Well, my first idea is to reduce the size of the images. If we compare smaller images instead of 3072x2048 ones, the comparison should return its result faster. (However, we also spend additional time scaling down the test images first, hopefully much less time than we later save when comparing the smaller images.) Here are the timings:
time (convert color/image1.jpg -colorspace gray -scale 6.25% image1-gray.jpg ;
convert gray/file324.jpg -scale 6.25% file324-gray.jpg ;
compare \
file324-gray.jpg \
image1-gray.jpg \
-highlight-color black \
-fuzz 3% \
-compose src \
delta6.25-fuzz.jpg)
real 0m0.670s
user 0m0.584s
sys 0m0.074s
That's much better! We shaved off almost 90% of the processing time, which gives hope of completing the job in 35 days on a MacBook Pro.
The improvement is only logical: by reducing each image dimension to 6.25% of the original, the resulting images are only 192x128 pixels, a reduction from roughly 6.3 million pixels to 24,576 pixels, a ratio of 256:1.
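The pixel counts follow from the fact that 6.25% is 1/16 of each side, so the area shrinks by 16*16 = 256. A quick check with plain shell arithmetic, using the sample image size from above:

```shell
#!/bin/sh
# Pixel counts before and after "-scale 6.25%" (= 1/16 of each dimension).
w=3072
h=2048
small_w=$(( w / 16 ))
small_h=$(( h / 16 ))
full=$(( w * h ))
small=$(( small_w * small_h ))
echo "${small_w}x${small_h}: $full -> $small pixels, ratio $(( full / small )):1"
# prints: 192x128: 6291456 -> 24576 pixels, ratio 256:1
```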
(NOTE: The -thumbnail and -resize parameters would work a little faster than -scale does. However, this speed increase is a trade-off against quality loss, and that quality loss would probably make the comparison much less reliable...)
Instead of creating a visually inspectable delta image from the compared images, we can tell ImageMagick to print some statistics. To get the number of differing pixels, we can use the AE metric. Here is the command with its result:
time (convert color/image1.jpg -colorspace gray -scale 6.25% image1-gray.jpg ;
convert gray/file324.jpg -scale 6.25% file324-gray.jpg ;
compare -metric AE file324-gray.jpg image1-gray.jpg -fuzz 3% null: 2>&1 )
0
real 0m0.640s
user 0m0.574s
sys 0m0.073s
This means we have 0 differing pixels, a result that we could use directly inside a shell script!
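One way to use that number in a script is to wrap the compare call in a function whose exit status says "match" or "no match". This is a minimal sketch; the name `images_match` and the 3% default are illustrative choices, not part of ImageMagick. compare writes the metric to stderr, hence the `2>&1`. Only the first field is kept because some ImageMagick versions append a normalized value in parentheses after the pixel count.

```shell
#!/bin/sh
# images_match A B [FUZZ] (hypothetical helper): succeed (exit status 0) if
# the AE metric reports zero differing pixels between A and B at the given
# fuzz factor (default 3%). Any error message from compare counts as no match.
images_match() {
  fuzz=${3:-3%}
  ae=$(compare -metric AE -fuzz "$fuzz" "$1" "$2" null: 2>&1)
  # keep only the first whitespace-separated field, e.g. "0 (0)" -> "0"
  [ "${ae%% *}" = "0" ]
}
```

Usage: `if images_match file324-gray.jpg image1-gray.jpg; then echo match; fi`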
Building blocks for a Shell script
So here are the building blocks for a shell script to do the automatic comparison:
Convert the color images from the 'color/' directory to grayscale, scale them down to 6.25%, and save the results in the 'reference-color/' directory:
# Estimated time required to convert 1000 images of size 3072x2048:
# 500 seconds
mkdir -p reference-color
for i in color/*.jpg; do
  convert "${i}" -colorspace gray -scale 6.25% "reference-color/$(basename "${i}")"
done
Scale down the images from the 'gray/' directory and save the results in the 'reference-gray/' directory:
# Estimated time required to convert 1000 images of size 3072x2048:
# 250 seconds
mkdir -p reference-gray
for i in gray/*.jpg; do
  convert "${i}" -scale 6.25% "reference-gray/$(basename "${i}")"
done
Compare each image from the 'reference-gray/' directory with the images from the 'reference-color/' directory until a match is found:
# Estimated time required to compare 1 image with 1000 images:
# 300 seconds
# If we have 1000 images, we need to conduct a total of 1000*1000/2
# comparisons to find all matches;
# that is, we need about 2 days to accomplish all.
# If we have 3000 images, we need a total of 3000*3000/2 comparisons
# to find all matches;
# this requires about 16 days.
#
mkdir -p results
for i in reference-gray/*.jpg ; do
  for j in reference-color/*.jpg ; do
    # skip if the glob matched nothing (all color references already used up)
    [ -e "${j}" ] || continue
    # compare the two grayscale reference images; some ImageMagick versions
    # print "0 (0)" instead of "0", so keep only the first field
    ae=$(compare -metric AE -fuzz 3% "${i}" "${j}" null: 2>&1)
    if [ "${ae%% *}" = "0" ]; then
      # if we found a match, copy the color original under the gray image's name
      cp "color/$(basename "${j}")" "results/$(basename "${i}")"
      # remove the matched color reference so that later gray images
      # are not compared against it again
      rm -f "${j}"
      # stop searching for this gray image and continue with the next one
      break
    fi
  done
done
Caveat: Do not blindly rely on these building blocks. They are untested. I do not have a directory of multiple suitable images available to test this, and I do not want to create one myself just for this exercise. Proceed with caution!
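Since the building blocks are untested, it is worth checking afterwards which gray images did not get a match. This is a minimal sketch, with a hypothetical function name, assuming the directory layout used above (a gray image counts as matched once a file of the same name exists in results/):

```shell
#!/bin/sh
# report_unmatched GRAY_DIR RESULTS_DIR (hypothetical helper): list every
# gray image that has no file of the same name in RESULTS_DIR, i.e. for
# which no matching color image was copied.
report_unmatched() {
  for g in "$1"/*.jpg; do
    [ -e "$g" ] || continue          # the glob matched nothing
    name=$(basename "$g")
    [ -e "$2/$name" ] || echo "no match: $name"
  done
}
```

Usage: `report_unmatched gray results`. An empty output means every gray image found its color original; any listed names are candidates for a larger -fuzz value or a manual look.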